hash questions

[1/16] from: dlhawley:attbi at: 7-Jan-2002 21:16

I come from the perl, tcl world where hashes aremoe like array indicies. In rebol, it seems that I can build a hash list, of some arbitray values, but cannot associate keys with data. Am I missing soemthing? is in perl $hash{ "joe" } = $joesData and $data = $hash{ " joe" } can I do something like this in REBOL?

[2/16] from: joel::neely::fedex::com at: 8-Jan-2002 6:27

Hi, David, David Hawley wrote:

> I come from the perl, tcl world where hashes are more like > array indicies. In rebol, it seems that I can build a hash

<<quoted lines omitted: 3>>

> and $data = $hash{ " joe" } > can I do something like this in REBOL?

You can do this in REBOL: favoriteLanguage: [ "Larry" "Perl" "Carl" "REBOL" "Guido" "Python" "Matz" "Ruby" "Richard" "LISP" ] and then

>> print select favoriteLanguage"Guido"

Python However, if you want to maintain a dynamic set of associations it's a bit more work: assoc: make object! [ data: [] default: 0 clear: func [] [data: copy []] store: func [k [string!] v [string!] /local pos] [ pos: find/skip data k 2 either found? pos [ change next pos v] [ append append data k v] ] fetch: func [k [string!] /local r] [ either found? r: select/skip data 2 k [r] [default]] ] used as follows:

>> favlang: make assoc [default: "INTERCAL"] >> favlang/fetch "David"

== "INTERCAL"

>> favlang/store "David" "REBOL"

== ["David" "REBOL"]

>> favlang/store "Bjarne" "c++"

== ["David" "REBOL" "Bjarne" "c++"]

>> favlang/store "Edsger" "Mathematics"

== ["David" "REBOL" "Bjarne" "c++" "Edsger" "Mathematics"]

>> favlang/fetch "David"

== "REBOL"

>> favlang/fetch "Edsger"

== "Mathematics" Both blocks and hashes can be used with SELECT and FIND (and other access/manipulation functions as well). Hashes use a hash-coding scheme (internally) to speed the searching. You may be wondering about the /SKIP above. The reason is that REBOL by default treats blocks and hashes as ordered collection of *single*values* which means that one could also say

>> select favlang/data "REBOL"

== "Bjarne" to conclude that REBOL's favorite programming language is Bjarne! The /SKIP refinement tells FIND, SELECT, et al, to treat the first argument as a collection of n-tuples, where n is the last argument. When we say SELECT/SKIP block val 2 we're asking SELECT to consider only the first value of every *pair* of entries, so that we get

>> select/skip favlang/data "Edsger" 2

== ["Mathematics"]

>> select/skip favlang/data "REBOL" 2

== none which is more likely what you'd expect, coming from Perl. You can, of course, use other tuple sizes than 2, as in guruData: [ "Edsger" "Dijkstra" "UT Austin" "Don" "Knuth" "Stanford" "Tony" "Hoare" "Oxford" "David" "Gries" "Cornell" ] which then can be treated as triples of data via

>> select/skip guruData "Tony" 3

== ["Hoare" "Oxford"] (Notice that the remainder of the triple is returned as a block.) Even though

>> find/skip guruData "Don" 3

== [ "Don" "Knuth" "Stanford" "Tony" "Hoare" "Oxford" "David" "Gries" "Cornell" ] succeeds, notice that

>> find/skip guruData "Oxford" 3

== none fails, since "Oxford" doesn't *begin* any triple of values. HTH! -jn- -- ; sub REBOL {}; sub head ($) {@_[0]} REBOL [] # despam: func [e] [replace replace/all e ":" "." "#" "@"] ; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"} print head reverse despam "moc:xedef#yleen:leoj" ;

[3/16] from: joel:neely:fedex at: 8-Jan-2002 6:32

Arrrrghhhh! CUT/PASTE TYPO WARNING! Joel Neely wrote:

> assoc: make object! > [ data: []

<<quoted lines omitted: 8>>

> fetch: func [k [string!] /local r] > [ either found? r: select/skip data 2 k [r] [default]]

[ either found? r: select/skip data k 2 [r] [default]]

> ] >

My apologies for the error! -JN- -- ; sub REBOL {}; sub head ($) {@_[0]} REBOL [] # despam: func [e] [replace replace/all e ":" "." "#" "@"] ; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"} print head reverse despam "moc:xedef#yleen:leoj" ;

[4/16] from: brett:codeconscious at: 8-Jan-2002 23:48

> You can do this in REBOL:

Thanks Joel. Another gem. You just demonstrated to me how woefully inadequate my knowledge */skip was. Brett.

[5/16] from: robert:muench:robertmuench at: 8-Jan-2002 13:58

> -----Original Message----- > From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of

<<quoted lines omitted: 3>>

> Subject: [REBOL] Re: hash questions > I come from the perl, tcl world where hashes aremoe like array indicies.

Hi, that's what I have in mind too when thinking about hash datastructure.

> In rebol, it seems that I can build a hash list, of some arbitray > values, but cannot associate keys with data. Am I missing soemthing?

If I understand the hash! in Rebol correct, it's a special datastructure supporting fast searched. Everything you insert can be searched for with find. I assume that find uses the best seach-algorithm depending on the datastructure it searches on (can someone from RT validate this?). So now to use your idea about (key,value). This can be done and it's only a thing you have to change in mind. Just insert your key value pait into the hash. If you now search for the key your value will be the next entry in the hash! ;-)) myhash: hash! [key1 value1 key2 value2] value1 can be found with: next find myhash 'key1 -- Robert M. Munch IT & Management Freelancer Mobile: +49 (0)177 2452 802 Fax : +49 (0)721 8408 9112 Web : http://www.robertmuench.de

[6/16] from: hallvard:ystad:helpinhand at: 8-Jan-2002 14:06

Dixit Joel Neely (13.27 08.01.2002):

>The /SKIP refinement tells FIND, SELECT, et al, to treat the >first argument as a collection of n-tuples, where n is the last >argument.

Hey! That's interesting!

>You >can, of course, use other tuple sizes than 2,

[...]

>Even though > >> find/skip guruData "Don" 3

<<quoted lines omitted: 7>>

> == none >fails, since "Oxford" doesn't *begin* any triple of values.

Thanks, Joel! I wasn't aware of this... effect of /skip, but it'll help a lot. ~H

[7/16] from: robert:muench:robertmuench at: 8-Jan-2002 15:15

> -----Original Message----- > From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of

<<quoted lines omitted: 4>>

> Thanks Joel. Another gem. > You just demonstrated to me how woefully inadequate my knowledge */skip was.

Hi, same with me didn't knew about it too as you can see in my other post :-| Robert

[8/16] from: sunandadh:aol at: 8-Jan-2002 10:12

Hi Robert,

> myhash: hash! [key1 value1 key2 value2] > value1 can be found with: next find myhash 'key1

A quick correction, and then a possibly useful tweak. Correction: You need Make Hash! Tweak. Consider using Select. Find positions in the series. Select returns the item itself, not a block position myhash: MAKE hash! [key1 value1 key2 value2] Got-it: next find myhash 'key1 print [type? got-it mold Got-it] Got-it: select myhash 'key1 print [type? got-it mold Got-it] And, as Joel rightly points out, remember the /Skip refinement works on Select too. Sunanda.

[9/16] from: rotenca:telvia:it at: 8-Jan-2002 17:08

Hi David, Joel and all

> You can do this in REBOL: > favoriteLanguage:

<<quoted lines omitted: 6>>

> and then > >> print select favoriteLanguage"Guido"

You can do also: name: "Guido" print favoriteLanguage/:name The path evaluator make an implict select on block and hash. But the problem here is that if a language called "Guido" appears before the user "Guido" then fails, because the path evaluator doesn't skip anything so it can be used if datatype are different, for example: word! - string! or string! - file! or file! - block! string! integer! and so on. A fetch based on this: fetch: func [k [string!]] [data/:k ] but if k is not found fetch will fire an error (Invalid path value). It is possible to do also a change, if the index is a word (it seems to me a strange thing - what do you think?): data: [guido "Rebol"] change: func [k [word!] value] [k: to-set-word k data/:k :value] change 'guido "What else?"

> However, if you want to maintain a dynamic set of associations > it's a bit more work:

<<quoted lines omitted: 10>>

> fetch: func [k [string!] /local r] > [ either found? r: select/skip data 2 k [r] [default]]

--- Ciao Romano

[10/16] from: joel:neely:fedex at: 8-Jan-2002 12:37

Hi, Romano, Romano Paolo Tenca wrote:

> You can do also: > > name: "Guido" > print favoriteLanguage/:name > > The path evaluator make an implict select on block and hash. > But the problem here is that if a language called "Guido" appears > before the user "Guido" then fails... >

Given

>> name: "Perl"

== "Perl"

>> favoriteLanguage/:name

== "Carl" I think I'd say "gives an unexpected result" rather than "fails", since there's not explicit error involved.

> A fetch based on this: > fetch: func [k [string!]] [data/:k ]

<<quoted lines omitted: 4>>

> change: func [k [word!] value] [k: to-set-word k data/:k :value] > change 'guido "What else?"

There are a number of "strange things" in the use of paths; using classSize: [ "freshman" 166 "sophomore" 141 "junior" 122 "senior" 131 ] we can say

>> select classSize "junior"

== 122

>> select classSize "graduate"

== none compared with

>> x: "junior" classSize/:x

== 122

>> x: "graduate" classSize/:x

** Script Error: Invalid path value: graduate ** Where: halt-view ** Near: classSize/:x But when using classSize: [ 9 166 10 141 11 122 12 131 ] we find

>> select classSize 7

== none

>> select classSize 9

== 166 compared with

>> x: 7 classSize/:x

== 12

>> x: 9 classSize/:x

== none and

>> pick classSize 7

== 12

>> pick classSize 9

== none I don't understand the purpose of the inconsistencies: 1) when to generate an error vs. returning NONE to indicate data-not-found, and 2) the fact that /: performs access-by-position or access-by-key depending on the type of the second word's value. Multiple ways to do not-quite-the-same-things seem to be an invitation to confusion. -jn-

[11/16] from: rotenca:telvia:it at: 8-Jan-2002 21:00

Hi, Joel

> There are a number of "strange things" in the use of paths; using > I don't understand the purpose of the inconsistencies: > > 1) when to generate an error vs. returning NONE to indicate > data-not-found, and

I agree. It is documented, but i do not like it.

> 2) the fact that /: performs access-by-position or access-by-key > depending on the type of the second word's value.

So we can end with:

>> x: 4

== 4

>> a/x

** Script Error: Invalid path value: x ** Where: do-boot ** Near: a/x

>> a/:x

== none but neither 'x neither 4 are in block. It is like the value was re-evaluated, like it was a refinement of the path itself. Something of recursive evalutation of the get-word in the path:

>>a/:s

is re-evaluated like it was

>>a/value-of-s

if value-of-s is integer or decimal (!) -> numeric offset word -> word for select BTW now i understand why the set-word value change the block:

>> a: [b 2]

== [b 2]

>> x: first [b:]

== b:

>> a/:x 3

== [b 3] is like:

>> a/b: 3

== [b 3] --- Ciao Romano

[12/16] from: larry:ecotope at: 8-Jan-2002 12:26

Hi David, Joel has already given an excellent overview of using the hash! datatype. I just want to mention an additional issue with the hash! type. AFAIK, Only strings and integers are hashed to provide fast search. So for a block which contains consecutive key-value pairs, converting the block to a hash will be a benefit only when the keys are either integers or strings. Does anyone know if other REBOL datatypes can be hashed? -Larry

[13/16] from: holger:rebol at: 8-Jan-2002 12:47

On Tue, Jan 08, 2002 at 12:26:24PM -0800, Larry Palmiter wrote:

> Hi David, > Joel has already given an excellent overview of using the hash! datatype. I

<<quoted lines omitted: 3>>

> hash will be a benefit only when the keys are either integers or strings. > Does anyone know if other REBOL datatypes can be hashed?

string, issue, binary, image, file, email, url, tag, word, set-word, get-word, lit-word, refinement, logic, integer, char, decimal, money, time, date, tuple, pair, object. A lot more than just integers and strings :). -- Holger Kruse [holger--rebol--com]

[14/16] from: larry:ecotope at: 8-Jan-2002 20:30

Hi Holger, Thanks for your response. It is good that hash works for so many REBOL datatypes. I note that with a few exceptions the hashable values are of type any-string, any-word, or the so-called simple types in which the data is included in the REBOL value. I have not had a chance to test all of these possiblities. However I question whether decimal keys in a block of consecutive key-value pairs are hashable, or more precisely, the benefit of using a hash with decimal keys. I ran the following test script: REBOL [] clk: func [][now/time/precise] ;set up blocks and hashes with block values ;and either integer or decimal keys ;measure the time to find the last key in the block b1: make block! 10000 b2: make block! 10000 repeat j 10000 [ append b1 j append/only b1 reduce [j] ] repeat j 10000 [ append b2 to-decimal j append/only b2 reduce [j] ] h1: to-hash b1 h2: to-hash b2 t1: clk loop 1000 [find b1 10000] t1: clk - t1 t2: clk loop 1000 [find b2 10000.0] t2: clk - t2 print ["Find integer key in block:" t1] print ["Find decimal key in block:" t2] t1: clk loop 1000 [find h1 10000] t1: clk - t1 t2: clk loop 1000 [find h2 10000.0] t2: clk - t2 print ["Find integer key in hash:" t1] print ["Find decimal key in hash:" t2] ;end of code Here are my results for the current versions of core and view (450 MHz PII): system/version 2.5.0.3.1 Find integer key in block: 0:00:02.04 Find decimal key in block: 0:00:13.4 Find integer key in hash: 0:00 Find decimal key in hash: 0:00:14.66 system/version 1.2.1.3.1 Find integer key in block: 0:00:01.92 Find decimal key in block: 0:00:15.27 Find integer key in hash: 0:00 Find decimal key in hash: 0:00:16.59 It is interesting that the search for a decimal key takes 6 to 8 times longer than for an integer key. I noticed that if I change the code so that the decimal search keys are of the form j / 3 instead of to-decimal j, and then search for 10000 / 3 the times are much better, especially for the hash: Find integer key in block: 0:00:02.03 Find decimal key in block: 0:00:09.23 Find integer key in hash: 0:00 Find decimal key in hash: 0:00:04.78 This suggests that comparing decimals which are equal to integers (1000 and 1000.0) requires some extra time. But this does not seem to be confirmed by a direct test:

>> a: 10000

== 10000

>> b: 10000.0

== 10000

>> c: 10000 / 3

== 3333.33333333333

>> t: clk loop 5'000'000 [equal? a a] clk - t

== 0:00:06.26

>> t: clk loop 5'000'000 [equal? b b] clk - t

== 0:00:09.89

>> t: clk loop 5'000'000 [equal? c c] clk - t

== 0:00:10.11

>From the original example with decimal keys equal to integers, it appears

that using hash with decimal keys actually takes more time for FIND than when using a block. But with integer keys the hash makes FIND so fast that the loop iterations have to go up by a factor of 100 inorder to get a measurable time. I also see this:

>> t: clk loop 1000 [find h2 10000.0] clk - t

== 0:00:14.01

>> t: clk loop 1000 [find b2 10000.0] clk - t

== 0:00:14.83

>> t: clk loop 1000 [find h2 5000.0] clk - t

== 0:00:14.11

>> t: clk loop 1000 [find b2 5000.0] clk - t

== 0:00:07.36 ;twice as fast as hash It does seem that the hashes with decimal keys take a time independent of the location of the search key in the block while FIND on a block searches linearly from the beginning and thus results in half the time to find a key half-way through the block. Constant time is certainly a hallmark of a hash, but the hash time for FIND is so slow for decimal keys equal to integers that searching for a random key will take on average almost twice as long as when using a block. For the more representative j / 3 case, the times for block and hash will be comparable for a random search key, while the worst case will be a factor of 2 better for the hash. I guess I don't fully understand REBOL hashes, particularly with respect to decimal keys. Of the datatypes you mention, which will fail to display the large speed increase we expect from a hash (and which we see for string and integer keys) as opposed to a block ? Any comments are welcome. -Larry

[15/16] from: dlhawley:attbi at: 8-Jan-2002 22:02

Thanks Joel and others. I modified Joel's code (below). changes: data: make hash![] -- also needed in clear append only appends one value - so I append the key, then the value Also I added a simple to/from-file persistance. Does anyone know whyREBOL has the asymmetry in in read/write? - fortunatly reduce fixes things up nicely. Here's what I'm looking at now: assoc: make object! [ data: make hash! [] default: none ; return for no key match file: none ; to/from-file clear: does [ data: make hash! [] ] store: func [ k v /loc pos] [ pos: find/skip data k 2 either found? pos [ change next pos v] [ append data k append/only data v ] ] fetch: func [ k /local r] [ either found? r: select/skip data k 2 [first r] [default]] to-file: does [ write file mold data ] from-file: func [][ data: reduce do read file ] ] assoc-test: func [] [ hd: make assoc [ file: %test.rdb ] hd/store "ob1" [ 1 2 3] hd/store "ob2" [ "text data" ] hd/store "ob3" make object! [ f1: "text data" f2: 24 ] print mold hd/fetch "ob1" hd/to-file hd/from-file print mold hd/fetch "ob1" print mold hd/fetch "ob2" print mold hd/fetch "ob3" ]

>> assoc-test

[1 2 3] [1 2 3] ["text data"] make object! [ f1: "text data" f2: 24 ]

[16/16] from: arolls:idatam:au at: 10-Jan-2002 15:05

Didn't read the thread, but possibly you want read/binary & write/binary ? Or save and load. Anton.

Notes

Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted