Mailing List Archive: Associative Data Store Re:(3)

[REBOL] Associative Data Store Re:(3)

From: joel:neely:fedex at: 18-Sep-2000 18:27


[rebol--keithdevens--com] wrote:
> I'm probably going to get beaten up for this :), but if my question is
> totally absurd, please excuse my ignorance...
>

The only absurd questions are those that are asked out of malice or not
asked out of timidity.  ;-)  I may have strong opinions on what I'd like
to be able to do in my own programs, but certainly don't want that to
come across as negative value judgements on anyone who wishes to write
in a different style.

> You seem to put a lot of emphasis on being able to use block!s as keys. Is
> this really necessary or desired?
>

Well, in terms of strictly "necessary", we could just write everything in
Tcl, where all data are strings (even data structures).  However, REBOL
gives us this really nice collection of various data types.  Why NOT be
able to use them?  As the only thing really necessary for looking up a
key is to do an equality test, why shouldn't keys be able to come from
any data type for which the concept of "equal" makes sense?

Also, as one of my co-workers pointed out to me, a  block!  is a generic
container into which I'm allowed to put any kind of REBOL data I wish.
If  find  (for example) is defined to work on blocks, then why shouldn't
it work on blocks that contain anything blocks are allowed to contain
(such as other blocks)?

In terms of "desired", I certainly desire that capability!
Let me give an example:

Suppose I have a file of data on school children, which includes age and
grade level for each student.  Now suppose I'm asked to cross-tabulate
age and grade level for some report.  This means I need a set of counters,
where each one corresponds to a specific combination of age and grade.  It
would be very pleasant, IMHO, to be able to write something like the
following sketch:

    age-grade-tab: assoc/new
    foreach studentrecord read/lines %studentdemographics.data [
        age:   get-age studentrecord
        grade: get-grade studentrecord
        a-g:   reduce [age grade]     ; the key IS the combination!
        age-grade-tab/put  a-g  1 + any [age-grade-tab/get a-g 0]
    ]
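
The sketch above presumes an  assoc  object that offers  new ,  put ,
get  and  keys .  One minimal way such an object might look is shown
here.  It is only a hypothetical stand-in, not a definitive
implementation: the names  k-store  and  v-store  are made up for
illustration, lookup is a plain linear search, and it assumes that
find/only  matches a block key as a single value in the REBOL version
at hand.

    assoc: make object! [
        k-store: copy []    ; keys, in insertion order (hypothetical name)
        v-store: copy []    ; values, parallel to k-store (hypothetical name)

        ; return a fresh, empty store
        new: does [make self [k-store: copy [] v-store: copy []]]

        ; return the value stored under key, or none if the key is absent
        get: func [key /local pos] [
            if pos: find/only k-store key [pick v-store index? pos]
        ]

        ; store value under key, replacing any earlier value
        put: func [key value /local pos] [
            either pos: find/only k-store key [
                poke v-store index? pos value
            ][
                append/only k-store key
                append/only v-store value
            ]
            value
        ]

        ; return a copy of the keys, so callers may sort it freely
        keys: does [copy k-store]
    ]

Because  get  answers  none  for a missing key, the idiom
any [age-grade-tab/get a-g 0]  supplies the starting count of zero.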

Now someone may ask why I didn't just use a two-dimensional array, with
age and grade as subscripts.  A couple of reasons immediately come to
mind:

0) A counter is identified by a combination of two facts: age and
   grade.  REBOL provides me with a handy datatype for representing
   compound values: the block!  Why shouldn't I use it?

1) Such an array would be very sparse!  As most of the values are
   clustered along a narrow diagonal band, it would be nice to limit
   my storage requirements to age/grade combinations that actually
   exist.  (Please remember that this is a small example of a more
   general concern.  I'm only illustrating the fact that arrays are
   often wildly space-inefficient; I'm sure you can all come up with
   many examples that are even more extreme, such as weight in grams
   versus height in centimeters, etc.)

2) Who says that grade is an integer?  Perhaps the coding scheme is
   that "K" is kindergarten, 1-8 are grade numbers, "Fr" "So" "Jr"
   "Sr" are the last four levels of high school, "GED" high-school
   equivalent coursework for adults, and "AE" is adult education
   courses for high school diploma holders.  In this case we not only
   can't use the grade as a numerical index, but now our ranges of
   values for both age and grade have become much larger, greatly
   aggravating the sparse-array problem above.

3) Notice that the code sketch above is unaffected by changes in
   these issues.  If I originally wrote the code above only for
   an elementary school (ages 6-11, grades 1-6) and was suddenly
   asked to produce the same report for an entire traditional system
   (ages 4-18, grades 1-12) or a non-traditional school (with the
   non-numeric grade codings and ages 4-90), the above code wouldn't
   break.  Isn't writing generalized code that doesn't break a good
   thing to do?

Turning to the output side of the report, I can start with something as
simple as

    foreach k age-grade-tab/keys [
        print [k ":" age-grade-tab/get k]
    ]

which works for all of the above cases.  However, if I need to provide
a more orderly report, I'll probably write something like

    foreach k sort/compare age-grade-tab/keys :age-grade-sorter [
        ; with some nice layout tricks not relevant to this point
    ]

If there is simply a range expansion (grades 1-12 instead of 1-6,
with ages 4-18 instead of 6-11) the above code still works (ignoring
layout issues).  If we go to the mixed coding of (2) above, all I
have to do is go to one function  age-grade-sorter  and adjust it
appropriately.  All of the rest of the code still works.
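
To make that concrete, here is one way  age-grade-sorter  might be
written for the mixed coding of (2).  Everything in it is an assumption
made for illustration: the helper names  grade-order  and  grade-rank ,
the decision to order by age first and grade second, and the sample
keys.

    ; hypothetical ordering of the grade codes from point (2)
    grade-order: ["K" 1 2 3 4 5 6 7 8 "Fr" "So" "Jr" "Sr" "GED" "AE"]

    ; position of a grade code in grade-order; unknown codes sort last
    grade-rank: func [grade /local pos] [
        either pos: find grade-order grade [
            index? pos
        ][
            1 + length? grade-order
        ]
    ]

    ; true when key a (an [age grade] block) belongs before key b
    age-grade-sorter: func [a b] [
        either a/1 = b/1 [
            (grade-rank a/2) < (grade-rank b/2)
        ][
            a/1 < b/1
        ]
    ]

    ; made-up sample keys; the get-word passes the function itself
    ; to sort instead of calling it
    sample-keys: copy [[17 "Jr"] [5 "K"] [6 1] [17 "So"]]
    foreach k sort/compare sample-keys :age-grade-sorter [
        print mold k
    ]

If the coding scheme changes again, only  grade-order  needs to be
edited; the tallying and reporting loops stay as they are.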

I VERY much like the idea of encapsulating each policy decision in
exactly one place in the code.  The idea of having to transform data
back and forth between a natural representation and an agglomerated
string throughout my program (just because some language feature has
a bias in favor of strings) strikes me as very ugly, as well as a
fruitful source of coding and maintenance errors.

I could make up more examples, but I hope the single one above is
adequate to explain the "flavor" of the concept without more belaboring.

> It seems like an implementation would be easier and/or more efficient if we
> didn't have to worry about that case. Same for logic! values :)
>

What worry?  REBOL already knows how to take two values and compare
them for equality.  The  block!  and  logic!  types just happen to
be two examples of that more general principle.

What if I wanted to tally up the birthdates of the students?  Isn't
it meaningful to use a  date!  key for a count of how many students
had a birthday on that date?  REBOL already can do that, and equality
comparison on  date!  values is much more complex than testing
 logic!  values for equality.
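
As a rough illustration of  date!  values serving as keys, the tally
can be sketched with nothing more than a flat block of date/count
pairs.  The sample dates and the names  birthdates  and  tally  are
made up; a plain  find  is enough here because a date never compares
equal to one of the integer counts.

    birthdates: [12-Mar-1990 4-Jul-1991 12-Mar-1990]  ; made-up sample data
    tally: copy []                                    ; date/count pairs

    foreach d birthdates [
        either pos: find tally d [
            ; date already seen: bump the count that follows it
            change next pos 1 + second pos
        ][
            ; first time this date appears: start its count at 1
            append tally reduce [d 1]
        ]
    ]

    foreach [d n] tally [print [d ":" n]]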

> Many other languages get by with just strings for keys, ...
>

Many other languages "get by" without MOST of the nicer features of
REBOL.  If all I wanted was a language which could "get by", I'd
keep writing in C.  REBOL has been presented as a high-level language
which can make use of a wide range of user-friendly data types.

I think it is unfriendly to have arbitrary restrictions on which of
them can be used in some situations.

> ...why do we have to be able to use every Rebol data type as a key?
>

Anyone who wants to use only strings as keys, and can get his/her
work done effectively under that restriction, has my blessing.  I'm
not trying to force anyone to use any feature they don't see a need
for.

On the other hand, I'll be sad if the hypothetical person in the
previous paragraph tells me that I CAN'T use any other type of key,
even when I can see good reason to do so in my own code.

-jn-