[REBOL] Re: On list! inconsistency [was Re: WYSIWYG programming]
From: joel:neely:fedex at: 2-Nov-2000 7:44
Hi, Rishi,
[rebol-bounce--rebol--com] wrote:
> Speaking of lists, I was wondering why it is even considered a
> series datatype.
Because it fulfills the defining concepts of a series:
1) a definite position in
2) an ordered collection of values
There's a reason for having block!, hash!, and list! all in the
language -- performance.
[Disclaimer: I don't work at RT, nor do I play one on TV!
Speculation, educated guesses, and unindicted co-rumors ahead.]
There are many ways to implement a "collection" structure; three
that are widely used are:
1) an array -- a consecutive set of memory "cells" that contain
the values of interest (or references to them -- we don't need
to worry about that distinction here) in the proper order.
2) a linked list -- a bunch of individual "nodes", each of which
contains a single datum and references to its predecessor
(except the first) and successor (except the last) in the
collection.
3) a hash table -- some combination of the two previous physical
strategies, along with an index (in the back-of-the-book sense)
that is calculated from the content data and tells where the
values are located.
Keeping to the "economic model" of software, each of these has some
costs, and each has some benefits:
Costs:
Array - Insertions and deletions require copying/moving chunks
of content around. If insufficient space is reserved,
insertion may require allocation of new memory areas, garbage
collection, mass copying, handle managment, etc. Searching
(in general) requires a linear algorithm whose time is
proportional to the length of the array.
Linked - Extra memory is required for the forward/backward
List references. Searching requires traversing the links
via a linear algorithm. Random access (i.e., to a
specific position of the list) requires stepping through the
links as well, with a time proportional to the "distance"
between the prior position and the new position.
Hash - More storage is required for the index, and computation
Table time is required to keep it current with each insertion
or deletion. Other issues vary, depending on the way
the content data (and index) are physically structured.
Benefits:
Array - Simplicity. Random access is constant time. If the
space can be pre-allocated, and the array is "grown"
only by appending, the shuffling of data can be minimized.
Linked - Insertion and deletion (at the current position!) can
List be done in constant time. If the primary usage is
scanning sequentially, the time overhead of accessing
consecutive positions is be minimized.
Hash - Searching can be done in nearly constant time. If
Table the physical storage strategy is "tuned" for it,
consecutive and random access may not be too costly.
So... it's A Good Thing that REBOL gives us the ability to choose
the "flavor" of container that is appropriate for each task. As
for the rest of your comments...
> About half of the series functions (such as 'reverse, etc..)
> either don't work, or work differently than other series datatypes.
> I would think since lists are different from other series datatypes,
> it makes more sense for it to be it's own datatype with it's own
> special functions such as...
If the differences create costs with no apparent benefit (which is
exactly the question I have been raising), then I suggest that the
Right Thing To Do is not further fragmentation of concepts, but
rather the rationalization of existing concepts into a consistent
interface.
Believe me, I know what a linked list is and how to build one (and
have done so many times...), and there's no reason why a linked
list can't be reversed. But...
>> boo: [3 1 1 2 6 5 3 5] == [3 1 1 2 6 5 3 5]
>> loo: to-list boo == make list! [3 1 1 2 6 5 3 5]
>> insert skip boo 2 4 == [1 2 6 5 3 5]
>> boo == [3 1 4 1 2 6 5 3 5]
>> insert skip loo 2 4 == make list! [1 2 6 5 3 5]
>> loo == make list! [3 1 4 1 2 6 5 3 5]
>> head reverse copy boo == [5 3 5 6 2 1 4 1 3]
>> head reverse copy loo == make list! [3 1 4 1 2 6 5 3 5]
Hence my question of why REBOL can't simply give us a consistent
interface to specialized collections that still have their own
special strengths. (It's called polymorphism...)
-jn-