hash questions
[1/16] from: dlhawley:attbi at: 7-Jan-2002 21:16
I come from the perl, tcl world where hashes aremoe like array indicies.
In rebol, it seems that I can build a hash list, of some arbitray
values, but cannot associate keys with data. Am I missing soemthing?
is in perl $hash{ "joe" } = $joesData
and $data = $hash{ " joe" }
can I do something like this in REBOL?
[2/16] from: joel::neely::fedex::com at: 8-Jan-2002 6:27
Hi, David,
David Hawley wrote:
> I come from the perl, tcl world where hashes are more like
> array indicies. In rebol, it seems that I can build a hash
<<quoted lines omitted: 3>>
> and $data = $hash{ " joe" }
> can I do something like this in REBOL?
You can do this in REBOL:
favoriteLanguage:
[ "Larry" "Perl"
"Carl" "REBOL"
"Guido" "Python"
"Matz" "Ruby"
"Richard" "LISP"
]
and then
>> print select favoriteLanguage"Guido"
Python
However, if you want to maintain a dynamic set of associations
it's a bit more work:
assoc: make object!
[ data: []
default: 0
clear: func [] [data: copy []]
store: func [k [string!] v [string!] /local pos]
[ pos: find/skip data k 2
either found? pos
[ change next pos v]
[ append append data k v]
]
fetch: func [k [string!] /local r]
[ either found? r: select/skip data 2 k [r] [default]]
]
used as follows:
>> favlang: make assoc [default: "INTERCAL"]
>> favlang/fetch "David"
== "INTERCAL"
>> favlang/store "David" "REBOL"
== ["David" "REBOL"]
>> favlang/store "Bjarne" "c++"
== ["David" "REBOL" "Bjarne" "c++"]
>> favlang/store "Edsger" "Mathematics"
== ["David" "REBOL" "Bjarne" "c++" "Edsger" "Mathematics"]
>> favlang/fetch "David"
== "REBOL"
>> favlang/fetch "Edsger"
== "Mathematics"
Both blocks and hashes can be used with SELECT and FIND (and other
access/manipulation functions as well). Hashes use a hash-coding
scheme (internally) to speed the searching.
You may be wondering about the /SKIP above. The reason is that
REBOL by default treats blocks and hashes as ordered collection
of *single*values* which means that one could also say
>> select favlang/data "REBOL"
== "Bjarne"
to conclude that REBOL's favorite programming language is Bjarne!
The /SKIP refinement tells FIND, SELECT, et al, to treat the
first argument as a collection of n-tuples, where n is the last
argument. When we say SELECT/SKIP block val 2 we're asking
SELECT to consider only the first value of every *pair* of
entries, so that we get
>> select/skip favlang/data "Edsger" 2
== ["Mathematics"]
>> select/skip favlang/data "REBOL" 2
== none
which is more likely what you'd expect, coming from Perl. You
can, of course, use other tuple sizes than 2, as in
guruData:
[ "Edsger" "Dijkstra" "UT Austin"
"Don" "Knuth" "Stanford"
"Tony" "Hoare" "Oxford"
"David" "Gries" "Cornell"
]
which then can be treated as triples of data via
>> select/skip guruData "Tony" 3
== ["Hoare" "Oxford"]
(Notice that the remainder of the triple is returned as a block.)
Even though
>> find/skip guruData "Don" 3
== [
"Don" "Knuth" "Stanford"
"Tony" "Hoare" "Oxford"
"David" "Gries" "Cornell"
]
succeeds, notice that
>> find/skip guruData "Oxford" 3
== none
fails, since "Oxford" doesn't *begin* any triple of values.
HTH!
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[3/16] from: joel:neely:fedex at: 8-Jan-2002 6:32
Arrrrghhhh! CUT/PASTE TYPO WARNING!
Joel Neely wrote:
> assoc: make object!
> [ data: []
<<quoted lines omitted: 8>>
> fetch: func [k [string!] /local r]
> [ either found? r: select/skip data 2 k [r] [default]]
[ either found? r: select/skip data k 2 [r] [default]]
> ]
>
My apologies for the error!
-JN-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[4/16] from: brett:codeconscious at: 8-Jan-2002 23:48
> You can do this in REBOL:
Thanks Joel. Another gem.
You just demonstrated to me how woefully inadequate my knowledge */skip was.
Brett.
[5/16] from: robert:muench:robertmuench at: 8-Jan-2002 13:58
> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 3>>
> Subject: [REBOL] Re: hash questions
> I come from the perl, tcl world where hashes aremoe like array indicies.
Hi, that's what I have in mind too when thinking about hash datastructure.
> In rebol, it seems that I can build a hash list, of some arbitray
> values, but cannot associate keys with data. Am I missing soemthing?
If I understand the hash! in Rebol correct, it's a special datastructure
supporting fast searched. Everything you insert can be searched for with find. I
assume that find uses the best seach-algorithm depending on the datastructure it
searches on (can someone from RT validate this?).
So now to use your idea about (key,value). This can be done and it's only a
thing you have to change in mind. Just insert your key value pait into the hash.
If you now search for the key your value will be the next entry in the hash!
;-))
myhash: hash! [key1 value1 key2 value2]
value1 can be found with: next find myhash 'key1
--
Robert M. Munch
IT & Management Freelancer
Mobile: +49 (0)177 2452 802
Fax : +49 (0)721 8408 9112
Web : http://www.robertmuench.de
[6/16] from: hallvard:ystad:helpinhand at: 8-Jan-2002 14:06
Dixit Joel Neely (13.27 08.01.2002):
>The /SKIP refinement tells FIND, SELECT, et al, to treat the
>first argument as a collection of n-tuples, where n is the last
>argument.
Hey! That's interesting!
>You
>can, of course, use other tuple sizes than 2,
[...]
>Even though
> >> find/skip guruData "Don" 3
<<quoted lines omitted: 7>>
> == none
>fails, since "Oxford" doesn't *begin* any triple of values.
Thanks, Joel! I wasn't aware of this... effect of /skip, but it'll help a lot.
~H
[7/16] from: robert:muench:robertmuench at: 8-Jan-2002 15:15
> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 4>>
> Thanks Joel. Another gem.
> You just demonstrated to me how woefully inadequate my knowledge */skip was.
Hi, same with me didn't knew about it too as you can see in my other post :-|
Robert
[8/16] from: sunandadh:aol at: 8-Jan-2002 10:12
Hi Robert,
> myhash: hash! [key1 value1 key2 value2]
> value1 can be found with: next find myhash 'key1
A quick correction, and then a possibly useful tweak.
Correction: You need Make Hash!
Tweak. Consider using Select. Find positions in the series. Select returns
the item itself, not a block position
myhash: MAKE hash! [key1 value1 key2 value2]
Got-it: next find myhash 'key1
print [type? got-it mold Got-it]
Got-it: select myhash 'key1
print [type? got-it mold Got-it]
And, as Joel rightly points out, remember the /Skip refinement works on
Select too.
Sunanda.
[9/16] from: rotenca:telvia:it at: 8-Jan-2002 17:08
Hi David, Joel and all
> You can do this in REBOL:
> favoriteLanguage:
<<quoted lines omitted: 6>>
> and then
> >> print select favoriteLanguage"Guido"
You can do also:
name: "Guido"
print favoriteLanguage/:name
The path evaluator make an implict select on block and hash.
But the problem here is that if a language called "Guido" appears before the
user "Guido" then fails, because the path evaluator doesn't skip anything so
it can be used if datatype are different, for example: word! - string! or
string! - file! or file! - block! string! integer! and so on.
A fetch based on this:
fetch: func [k [string!]] [data/:k ]
but if k is not found fetch will fire an error (Invalid path value).
It is possible to do also a change, if the index is a word (it seems to me a
strange thing - what do you think?):
data: [guido "Rebol"]
change: func [k [word!] value] [k: to-set-word k data/:k :value]
change 'guido "What else?"
> However, if you want to maintain a dynamic set of associations
> it's a bit more work:
<<quoted lines omitted: 10>>
> fetch: func [k [string!] /local r]
> [ either found? r: select/skip data 2 k [r] [default]]
---
Ciao
Romano
[10/16] from: joel:neely:fedex at: 8-Jan-2002 12:37
Hi, Romano,
Romano Paolo Tenca wrote:
> You can do also:
>
> name: "Guido"
> print favoriteLanguage/:name
>
> The path evaluator make an implict select on block and hash.
> But the problem here is that if a language called "Guido" appears
> before the user "Guido" then fails...
>
Given
>> name: "Perl"
== "Perl"
>> favoriteLanguage/:name
== "Carl"
I think I'd say "gives an unexpected result" rather than "fails",
since there's not explicit error involved.
> A fetch based on this:
> fetch: func [k [string!]] [data/:k ]
<<quoted lines omitted: 4>>
> change: func [k [word!] value] [k: to-set-word k data/:k :value]
> change 'guido "What else?"
There are a number of "strange things" in the use of paths; using
classSize:
[ "freshman" 166
"sophomore" 141
"junior" 122
"senior" 131
]
we can say
>> select classSize "junior"
== 122
>> select classSize "graduate"
== none
compared with
>> x: "junior" classSize/:x
== 122
>> x: "graduate" classSize/:x
** Script Error: Invalid path value: graduate
** Where: halt-view
** Near: classSize/:x
But when using
classSize:
[ 9 166
10 141
11 122
12 131
]
we find
>> select classSize 7
== none
>> select classSize 9
== 166
compared with
>> x: 7 classSize/:x
== 12
>> x: 9 classSize/:x
== none
and
>> pick classSize 7
== 12
>> pick classSize 9
== none
I don't understand the purpose of the inconsistencies:
1) when to generate an error vs. returning NONE to indicate
data-not-found, and
2) the fact that /: performs access-by-position or access-by-key
depending on the type of the second word's value.
Multiple ways to do not-quite-the-same-things seem to be an
invitation to confusion.
-jn-
[11/16] from: rotenca:telvia:it at: 8-Jan-2002 21:00
Hi, Joel
> There are a number of "strange things" in the use of paths; using
> I don't understand the purpose of the inconsistencies:
>
> 1) when to generate an error vs. returning NONE to indicate
> data-not-found, and
I agree. It is documented, but i do not like it.
> 2) the fact that /: performs access-by-position or access-by-key
> depending on the type of the second word's value.
So we can end with:
>> x: 4
== 4
>> a/x
** Script Error: Invalid path value: x
** Where: do-boot
** Near: a/x
>> a/:x
== none
but neither 'x neither 4 are in block.
It is like the value was re-evaluated, like it was a refinement of the path
itself. Something of recursive evalutation of the get-word in the path:
>>a/:s
is re-evaluated like it was
>>a/value-of-s
if value-of-s is
integer or decimal (!) -> numeric offset
word -> word for select
BTW now i understand why the set-word value change the block:
>> a: [b 2]
== [b 2]
>> x: first [b:]
== b:
>> a/:x 3
== [b 3]
is like:
>> a/b: 3
== [b 3]
---
Ciao
Romano
[12/16] from: larry:ecotope at: 8-Jan-2002 12:26
Hi David,
Joel has already given an excellent overview of using the hash! datatype. I
just want to mention an additional issue with the hash! type.
AFAIK, Only strings and integers are hashed to provide fast search. So for a
block which contains consecutive key-value pairs, converting the block to a
hash will be a benefit only when the keys are either integers or strings.
Does anyone know if other REBOL datatypes can be hashed?
-Larry
[13/16] from: holger:rebol at: 8-Jan-2002 12:47
On Tue, Jan 08, 2002 at 12:26:24PM -0800, Larry Palmiter wrote:
> Hi David,
> Joel has already given an excellent overview of using the hash! datatype. I
<<quoted lines omitted: 3>>
> hash will be a benefit only when the keys are either integers or strings.
> Does anyone know if other REBOL datatypes can be hashed?
string, issue, binary, image, file, email, url, tag, word, set-word,
get-word, lit-word, refinement, logic, integer, char, decimal, money,
time, date, tuple, pair, object.
A lot more than just integers and strings :).
--
Holger Kruse
[holger--rebol--com]
[14/16] from: larry:ecotope at: 8-Jan-2002 20:30
Hi Holger,
Thanks for your response. It is good that hash works for so many REBOL
datatypes. I note that with a few exceptions the hashable values are of type
any-string, any-word, or the so-called simple types in which the data is
included in the REBOL value.
I have not had a chance to test all of these possiblities. However I
question whether decimal keys in a block of consecutive key-value pairs are
hashable, or more precisely, the benefit of using a hash with decimal keys.
I ran the following test script:
REBOL []
clk: func [][now/time/precise]
;set up blocks and hashes with block values
;and either integer or decimal keys
;measure the time to find the last key in the block
b1: make block! 10000
b2: make block! 10000
repeat j 10000 [
append b1 j append/only b1 reduce [j]
]
repeat j 10000 [
append b2 to-decimal j append/only b2 reduce [j]
]
h1: to-hash b1
h2: to-hash b2
t1: clk loop 1000 [find b1 10000] t1: clk - t1
t2: clk loop 1000 [find b2 10000.0] t2: clk - t2
print ["Find integer key in block:" t1]
print ["Find decimal key in block:" t2]
t1: clk loop 1000 [find h1 10000] t1: clk - t1
t2: clk loop 1000 [find h2 10000.0] t2: clk - t2
print ["Find integer key in hash:" t1]
print ["Find decimal key in hash:" t2]
;end of code
Here are my results for the current versions of core and view (450 MHz PII):
system/version 2.5.0.3.1
Find integer key in block: 0:00:02.04
Find decimal key in block: 0:00:13.4
Find integer key in hash: 0:00
Find decimal key in hash: 0:00:14.66
system/version 1.2.1.3.1
Find integer key in block: 0:00:01.92
Find decimal key in block: 0:00:15.27
Find integer key in hash: 0:00
Find decimal key in hash: 0:00:16.59
It is interesting that the search for a decimal key takes 6 to 8 times
longer than for an integer key.
I noticed that if I change the code so that the decimal search keys are of
the form j / 3 instead of to-decimal j, and then search for 10000 / 3 the
times are much better, especially for the hash:
Find integer key in block: 0:00:02.03
Find decimal key in block: 0:00:09.23
Find integer key in hash: 0:00
Find decimal key in hash: 0:00:04.78
This suggests that comparing decimals which are equal to integers (1000 and
1000.0) requires some extra time. But this does not seem to be confirmed by
a direct test:
>> a: 10000
== 10000
>> b: 10000.0
== 10000
>> c: 10000 / 3
== 3333.33333333333
>> t: clk loop 5'000'000 [equal? a a] clk - t
== 0:00:06.26
>> t: clk loop 5'000'000 [equal? b b] clk - t
== 0:00:09.89
>> t: clk loop 5'000'000 [equal? c c] clk - t
== 0:00:10.11
>From the original example with decimal keys equal to integers, it appears
that using hash with decimal keys actually takes more time for FIND than
when using a block. But with integer keys the hash makes FIND so fast that
the loop iterations have to go up by a factor of 100 inorder to get a
measurable time.
I also see this:
>> t: clk loop 1000 [find h2 10000.0] clk - t
== 0:00:14.01
>> t: clk loop 1000 [find b2 10000.0] clk - t
== 0:00:14.83
>> t: clk loop 1000 [find h2 5000.0] clk - t
== 0:00:14.11
>> t: clk loop 1000 [find b2 5000.0] clk - t
== 0:00:07.36 ;twice as fast as hash
It does seem that the hashes with decimal keys take a time independent of
the location of the search key in the block while FIND on a block searches
linearly from the beginning and thus results in half the time to find a key
half-way through the block. Constant time is certainly a hallmark of a hash,
but the hash time for FIND is so slow for decimal keys equal to integers
that searching for a random key will take on average almost twice as long as
when using a block. For the more representative j / 3 case, the times for
block and hash will be comparable for a random search key, while the worst
case will be a factor of 2 better for the hash.
I guess I don't fully understand REBOL hashes, particularly with respect to
decimal keys. Of the datatypes you mention, which will fail to display the
large speed increase we expect from a hash (and which we see for string and
integer keys) as opposed to a block ?
Any comments are welcome.
-Larry
[15/16] from: dlhawley:attbi at: 8-Jan-2002 22:02
Thanks Joel and others. I modified Joel's code (below).
changes:
data: make hash![] -- also needed in clear
append only appends one value - so I append the key, then the value
Also I added a simple to/from-file persistance. Does anyone know
whyREBOL
has the asymmetry in in read/write? - fortunatly reduce fixes things
up nicely.
Here's what I'm looking at now:
assoc: make object!
[ data: make hash! []
default: none ; return for no key match
file: none ; to/from-file
clear: does [ data: make hash! [] ]
store: func [ k v /loc pos]
[ pos: find/skip data k 2
either found? pos
[ change next pos v]
[ append data k
append/only data v
]
]
fetch: func [ k /local r]
[ either found? r: select/skip data k 2 [first r] [default]]
to-file: does [ write file mold data ]
from-file: func [][ data: reduce do read file ]
]
assoc-test: func []
[
hd: make assoc [ file: %test.rdb ]
hd/store "ob1" [ 1 2 3]
hd/store "ob2" [ "text data" ]
hd/store "ob3" make object! [ f1: "text data" f2: 24 ]
print mold hd/fetch "ob1"
hd/to-file
hd/from-file
print mold hd/fetch "ob1"
print mold hd/fetch "ob2"
print mold hd/fetch "ob3"
]
>> assoc-test
[1 2 3]
[1 2 3]
["text data"]
make object! [
f1: "text data"
f2: 24
]
[16/16] from: arolls:idatam:au at: 10-Jan-2002 15:05
Didn't read the thread, but possibly
you want read/binary & write/binary ?
Or save and load.
Anton.
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted