Mailing List Archive: Re: On list! inconsistency [was Re: WYSIWYG programming]

[REBOL] Re: On list! inconsistency [was Re: WYSIWYG programming]

From: joel:neely:fedex at: 2-Nov-2000 7:44


Hi, Rishi,

[rebol-bounce--rebol--com] wrote:
> Speaking of lists, I was wondering why it is even considered a
> series datatype.

Because it fulfills the defining concepts of a series:

1)  a definite position in
2)  an ordered collection of values

There's a reason for having block!, hash!, and list! all in the
language -- performance.

[Disclaimer:  I don't work at RT, nor do I play one on TV!
 Speculation, educated guesses, and unindicted co-rumors ahead.]

There are many ways to implement a "collection" structure; three
that are widely used are:

1)  an array -- a consecutive set of memory "cells" that contain
    the values of interest (or references to them -- we don't need
    to worry about that distinction here) in the proper order.

2)  a linked list -- a bunch of individual "nodes", each of which
    contains a single datum and references to its predecessor
    (except the first) and successor (except the last) in the
    collection.

3)  a hash table -- some combination of the two previous physical
    strategies, along with an index (in the back-of-the-book sense)
    that is calculated from the content data and tells where the
    values are located.

Keeping to the "economic model" of software, each of these has some
costs, and each has some benefits:

Costs:

    Array - Insertions and deletions require copying/moving chunks
            of content around.  If insufficient space is reserved,
    insertion may require allocation of new memory areas, garbage
    collection, mass copying, handle managment, etc.  Searching
    (in general) requires a linear algorithm whose time is
    proportional to the length of the array.

    Linked - Extra memory is required for the forward/backward
    List     references.  Searching requires traversing the links
             via a linear algorithm.  Random access (i.e., to a
    specific position of the list) requires stepping through the
    links as well, with a time proportional to the "distance"
    between the prior position and the new position.

    Hash  - More storage is required for the index, and computation
    Table   time is required to keep it current with each insertion
            or deletion.  Other issues vary, depending on the way
    the content data (and index) are physically structured.

Benefits:

    Array - Simplicity.  Random access is constant time.  If the
            space can be pre-allocated, and the array is "grown"
    only by appending, the shuffling of data can be minimized.

    Linked - Insertion and deletion (at the current position!) can
    List     be done in constant time.  If the primary usage is
             scanning sequentially, the time overhead of accessing
    consecutive positions is be minimized.

    Hash  -  Searching can be done in nearly constant time.  If
    Table    the physical storage strategy is "tuned" for it,
             consecutive and random access may not be too costly.

So...  it's A Good Thing that REBOL gives us the ability to choose
the "flavor" of container that is appropriate for each task.  As
for the rest of your comments...

> About half of the series functions (such as 'reverse, etc..)
> either don't work, or work differently than other series datatypes.
> I would think since lists are different from other series datatypes,
> it makes more sense for it to be it's own datatype with it's own
> special functions such as...

If the differences create costs with no apparent benefit (which is
exactly the question I have been raising), then I suggest that the
Right Thing To Do is not further fragmentation of concepts, but
rather the rationalization of existing concepts into a consistent
interface.

Believe me, I know what a linked list is and how to build one (and
have done so many times...), and there's no reason why a linked
list can't be reversed.  But...

    >> boo: [3 1 1 2 6 5 3 5]   == [3 1 1 2 6 5 3 5]
    >> loo: to-list boo         == make list! [3 1 1 2 6 5 3 5]
    >> insert skip boo 2 4      == [1 2 6 5 3 5]
    >> boo                      == [3 1 4 1 2 6 5 3 5]
    >> insert skip loo 2 4      == make list! [1 2 6 5 3 5]
    >> loo                      == make list! [3 1 4 1 2 6 5 3 5]
    >> head reverse copy boo    == [5 3 5 6 2 1 4 1 3]
    >> head reverse copy loo    == make list! [3 1 4 1 2 6 5 3 5]

Hence my question of why REBOL can't simply give us a consistent
interface to specialized collections that still have their own
special strengths.  (It's called polymorphism...)

-jn-