[REBOL] Associative Data Store Re:(3)

From: joel:neely:fedex at: 18-Sep-2000 18:27

[rebol--keithdevens--com] wrote:
> I'm probably going to get beaten up for this :), but if my question is
> totally absurd, please excuse my ignorance...
The only absurd questions are those that are asked out of malice or not asked out of timidity. ;-) I may have strong opinions on what I'd like to be able to do in my own programs, but certainly don't want that to come across as negative value judgements on anyone who wishes to write in a different style.
> You seem to put a lot of emphasis on being able to use block!s as keys. Is
> this really necessary or desired?
Well, in terms of strictly "necessary", we could just write everything in Tcl, where all data are strings (even data structures). However, REBOL gives us this really nice collection of various data types. Why NOT be able to use them? As the only thing really necessary for looking up a key is to do an equality test, why shouldn't keys be able to come from any data type for which the concept of "equal" makes sense?

Also, as one of my co-workers pointed out to me, a block! is a generic container into which I'm allowed to put any kind of REBOL data I wish. If find (for example) is defined to work on blocks, then why shouldn't it work on blocks that contain anything blocks are allowed to contain (such as other blocks)?

In terms of "desired", I certainly desire that capability! Let me give an example:

Suppose I have a file of data on school children, which includes age and grade level for each student. Now suppose I'm asked to cross-tabulate age and grade level for some report. This means I need a set of counters, where each one corresponds to a specific combination of age and grade. It would be very pleasant, IMHO, to be able to write something like the following sketch:

    age-grade-tab: assoc/new
    foreach studentrecord read/lines %studentdemographics.data [
        age: get-age studentrecord
        grade: get-grade studentrecord
        a-g: reduce [age grade]    ; the key IS the combination!
        age-grade-tab/put a-g 1 + any [age-grade-tab/get a-g 0]
    ]

Now someone may ask why I didn't just use a two-dimensional array, with age and grade as subscripts. A couple of reasons immediately come to mind:

0) A counter is identified by a combination of two facts: age and grade. REBOL provides me with a handy datatype for representing compound values: the block! Why shouldn't I use it?

1) Such an array would be very sparse! As most of the values are clustered along a narrow diagonal band, it would be nice to limit my storage requirements to age/grade combinations that actually exist.

   (Please remember that this is a small example of a more general concern. I'm only illustrating the fact that arrays are often wildly space-inefficient; I'm sure you can all come up with many examples that are even more extreme, such as weight in grams versus height in centimeters, etc.)

2) Who says that grade is an integer? Perhaps the coding scheme is that "K" is kindergarten, 1-8 are grade numbers, "Fr" "So" "Jr" "Sr" are the last four levels of high school, "GED" is high-school equivalent coursework for adults, and "AE" is adult education courses for high school diploma holders. In this case we not only can't use the grade as a numerical index, but now our ranges of values for both age and grade have become much larger, greatly aggravating the sparse-array problem above.

3) Notice that the code sketch above is unaffected by changes in these issues. If I originally wrote the code above only for an elementary school (ages 6-11, grades 1-6) and was suddenly asked to produce the same report for an entire traditional system (ages 4-18, grades 1-12) or a non-traditional school (with the non-numeric grade codings and ages 4-90), the above code wouldn't break. Isn't writing generalized code that doesn't break a good thing to do?

Turning to the output side of the report, I can start with something as simple as

    foreach k age-grade-tab/keys [
        print [k ":" age-grade-tab/get k]
    ]

which works for all of the above cases. However, if I need to provide a more orderly report, I'll probably write something like

    foreach k sort/compare age-grade-tab/keys :age-grade-sorter [
        ; with some nice layout tricks not relevant to this point
    ]

(the get-word :age-grade-sorter passes the function itself to sort instead of calling it). If there is simply a range expansion (grades 1-12 instead of 1-6, with ages 4-18 instead of 6-11) the above code still works (ignoring layout issues). If we go to the mixed coding of (2) above, all I have to do is go to one function, age-grade-sorter, and adjust it appropriately. All of the rest of the code still works.
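
As a minimal sketch of what an age-grade-sorter for the mixed coding in (2) might look like: the names grade-order, grade-rank and sample-keys are hypothetical, the relative order of the grade codes is an assumption, and the comparator assumes REBOL 2, that every key is an [age grade] block, and that every grade code appears in grade-order. It orders keys by age first and, for equal ages, by the position of the grade code in the list.

```rebol
; Hypothetical ordering of the grade codes from (2), lowest level first.
grade-order: ["K" 1 2 3 4 5 6 7 8 "Fr" "So" "Jr" "Sr" "GED" "AE"]

; Position of a grade code within grade-order.
; Assumes the code is present; an unknown code is not handled here.
grade-rank: func [g] [index? find grade-order g]

; Comparator for sort/compare: true when key a should come before key b.
age-grade-sorter: func [a b] [
    either a/1 = b/1 [
        (grade-rank a/2) < (grade-rank b/2)
    ][
        a/1 < b/1
    ]
]

; Usage with a few made-up keys (illustration data only).
; The get-word :age-grade-sorter passes the function instead of calling it.
sample-keys: copy [[7 "K"] [6 2] [6 1] [15 "Fr"]]
foreach k sort/compare sample-keys :age-grade-sorter [print mold k]
```

Changing the coding scheme then only means editing grade-order (or the comparator); the loops that build and print the table stay as they are.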
I VERY much like the idea of encapsulating each policy decision in exactly one place in the code. The idea of having to transform data back and forth between a natural representation and an agglomerated string throughout my program (just because some language feature has a bias in favor of strings) strikes me as very ugly, as well as a fruitful source of coding and maintenance errors. I could make up more examples, but I hope the single one above is adequate to explain the "flavor" of the concept without more belaboring.
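
The sketches above rely on an assoc object offering new, put, get and keys, which is not a REBOL built-in. One minimal way to provide that interface, assuming REBOL 2 object semantics (functions are copied and rebound when an object is cloned with make), is a pair of parallel blocks searched linearly with the ordinary equality test. The internal names _keys, _vals and _index are hypothetical, and the linear scan is chosen for clarity, not speed.

```rebol
; Hypothetical stand-in for the assoc object used in the sketch earlier.
assoc: make object! [
    _keys: copy []    ; keys, in insertion order
    _vals: copy []    ; value stored for the key at the same position

    ; Return a fresh, empty store that shares no data with its parent.
    new: func [] [
        make self [_keys: copy [] _vals: copy []]
    ]

    ; Position of key k in _keys, or none if it is absent.
    ; Uses = so that blocks, dates, logic values etc. compare by value.
    _index: func [k] [
        repeat i length? _keys [
            if k = pick _keys i [return i]
        ]
        none
    ]

    ; Store v under k, replacing any earlier value; returns v.
    put: func [k v /local i] [
        either i: _index k [
            poke _vals i v
        ][
            insert/only tail _keys k    ; /only keeps a block key as ONE item
            insert/only tail _vals v
        ]
        v
    ]

    ; Value stored under k, or none if k was never stored.
    get: func [k /local i] [
        if i: _index k [pick _vals i]
    ]

    ; A copy of the key list, so callers may sort it freely.
    keys: func [] [copy _keys]
]

; Usage: tally made-up [age grade] pairs (illustration data only).
tab: assoc/new
foreach rec [[6 1] [7 2] [6 1] [7 "K"]] [
    tab/put rec 1 + any [tab/get rec 0]
]
foreach k tab/keys [print [mold k ":" tab/get k]]
```

Because lookup goes through = and never converts keys to strings, the same store accepts block!, date! or logic! keys without any change.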
> It seems like an implementation would be easier and/or more efficient if we
> didn't have to worry about that case. Same for logic! values :)
What worry? REBOL already knows how to take two values and compare them for equality. The block! and logic! types just happen to be two examples of that more general principle. What if I wanted to tally up the birthdates of the students? Isn't it meaningful to use a date! key for a count of how many students had a birthday on that date? REBOL already can do that, and equality comparison on date! values is much more complex than testing logic! values for equality.
> Many other languages get by with just strings for keys, ...
Many other languages "get by" without MOST of the nicer features of REBOL. If all I wanted was a language which could "get by", I'd keep writing in C. REBOL has been presented as a high-level language which can make use of a wide range of user-friendly data types. I think it is unfriendly to have arbitrary restrictions on which of them can be used in some situations.
> ...why do we have to be able to use every Rebol data type as a key?
Anyone who wants to use only strings as keys, and can get his/her work done effectively under that restriction, has my blessing. I'm not trying to force anyone to use any feature they don't see a need for.

On the other hand, I'll be sad if the hypothetical person in the previous paragraph tells me that I CAN'T use any other type of key, even when I can see good reason to do so in my own code.

-jn-