Edit Distance Function?
[1/6] from: garymiller:starband at: 17-Jul-2002 15:08
Has anyone out there written an edit distance function in
Rebol they would be willing to share.
Edit Distance returns an integer when passed two string
that counts the number on insertions, deletions or
substitution necessary to turn string1 into string2.
The algorithm is documented pretty well on the web but
since it's recursive and is always described in terms of
mathematical dynamic programming, it's a nontrivial
exercise, at least for me!
For example.
string1: doog
string2: dog
edit distance would be 1
string1: gaot
string2: goat
edit distance would be 2
These algorithm are usually used along with soundex in
spelling checkers and the like.
_____________________________________
Think you can't get high-speed Internet? Now you can!
StarBand offers high-speed Internet via satellite.
No phone lines needed. New lower prices. For all 50 U.S. States. See for yourself at:
http://www.StarBand.com
[2/6] from: greggirwin:mindspring at: 17-Jul-2002 14:44
Hi Gary,
<< Has anyone out there written an edit distance function in Rebol they
would be willing to share. >>
I just hacked this up, and tested it only very lightly. You can use it, in
trade for testing it for me. :) I could write a much better version given
some time, but going off the references I found, this came together quickly.
It's going to be slow though, and has high space complexity. A better
version will take a little more time, which I don't have much of at the
moment. If you can verify that this works correctly, then I can use it as a
reference for future development of an improved version.
edit-distance: func [s1 s2 /local x y z] [
c: array/initial reduce [1 + length? s1 1 + length? s2] 0
x: y: z: 0
repeat i 1 + length? s1 [c/:i/1: i]
repeat j 1 + length? s2 [poke c/1 j j]
for i 2 1 + length? s1 1 [
for j 2 1 + length? s2 1 [
x: 1 + pick pick c i - 1 j
y: 1 + pick pick c i j - 1
either s1/:i = s2/:j [
z: pick pick c i - 1 j - 1
][
z: 1 + pick pick c i - 1 j - 1
]
poke c/:i j min min x y z
]
]
subtract last last c 1
]
edit-distance "Gregg" "Greg"
edit-distance "Gregg" "Gary"
edit-distance "Gregg" "Hirschberg"
--Gregg
[3/6] from: garymiller:starband at: 19-Jul-2002 11:39
> I just hacked this up, and tested it only very lightly.
> You can use it, in trade for testing it for me. :)
<<quoted lines omitted: 6>>
> as a reference for future development of an improved
> version.
Greg, fantastic job on the edit-distance function! And so
quickly. I was amazed!
Support on this mailing list is nothing short of phenomenal!
I am including my test script and results. The program
appears to work great with everything I threw at it! I
threw around 40 strings at it some quite large and didn't
exceed a second of run time on a 1Ghz Dell Pentium.
I have an ascii text dictionary with 100,000 words that I'm
going to use edit-distance with to find the closest words
to any misspelled words in the user's input sentence.
Thanks again Greg, and if any optimizations come to you on
the speed I would much appreciate hearing from you!
etest: func [arg1 arg2] [ print [arg1 " / " arg2 " / "
(edit-distance arg1 arg2)]]
print now/time
etest "a" "b"
etest "am" "at"
etest "i" "I"
etest "Gregg" "Greg"
etest "Gregg" "Gary"
etest "Gregg" "Hirschberg"
etest "Do you enjoy fishing" "Do you like to fish"
etest "Do you enjoy fishing" "Do you like to hunt"
etest "how are you" "how are you today"
etest "how have you been" "how've you been"
etest "how much was it" "how much did it cost"
etest "how much was it" "how much did you pay for it"
etest "how old are you" "how old're you now"
etest "I'm broke" "I'm busted"
etest "I'm broke" "I'm out of money"
etest "I'm trying to quit" "I am trying to quit!"
etest "I'm trying to quit" "I'm attempting to quit"
etest "I'm trying to quit" "I'd like to quit"
etest "I'm trying to quit" "I want to stop"
etest "Let's take a break" "Let's take a breather"
etest "Let's take a walk" "Let's go for a walk"
etest "That's good!" "That's bad!"
etest "That's good!" "That fantastic"
etest "That's stupid" "That sucks!"
etest "the" "teh"
etest "what're you having for lunch today" "what are you
eating for lunch"
etest "what're you having for lunch today" "what're you
going to have for lunch today"
etest "what day is it" "hat day is it"
etest "what is you name" "what's your name"
etest "what time is it" "what is the time"
etest "what time is it" "what time is it now"
etest "what time is it" "whattimeisit"
etest "what time is it" "what time was it"
etest "where do you live?" "Where do yo reside"
etest "you're crazy" "you are crazy"
etest "you're crazy" "you must be crazy"
print now/time
11:29:13
a / b / 0
am / at / 1
i / I / 0
Gregg / Greg / 1
Gregg / Gary / 4
Gregg / Hirschberg / 6
Do you enjoy fishing / Do you like to fish / 9
Do you enjoy fishing / Do you like to hunt / 10
how are you / how are you today / 6
how have you been / how've you been / 3
how much was it / how much did it cost / 8
how much was it / how much did you pay for it / 14
how old are you / how old're you now / 6
I'm broke / I'm busted / 4
I'm broke / I'm out of money / 10
I'm trying to quit / I am trying to quit! / 3
I'm trying to quit / I'm attempting to quit / 6
I'm trying to quit / I'd like to quit / 6
I'm trying to quit / I want to stop / 11
Let's take a break / Let's take a breather / 4
Let's take a walk / Let's go for a walk / 6
That's good! / That's bad! / 3
That's good! / That fantastic / 10
That's stupid / That sucks! / 7
the / teh / 2
what're you having for lunch today / what are you eating
for lunch / 10
what're you having for lunch today / what're you going to
have for lunch today / 11
what day is it / hat day is it / 1
what is you name / what's your name / 3
what time is it / what is the time / 9
what time is it / what time is it now / 4
what time is it / whattimeisit / 3
what time is it / what time was it / 2
where do you live? / Where do yo reside / 6
you're crazy / you are crazy / 2
you're crazy / you must be crazy / 7
11:29:13
11:29:13 >>
_____________________________________
Think you can't get high-speed Internet? Now you can!
StarBand offers high-speed Internet via satellite.
No phone lines needed. New lower prices. For all 50 U.S. States. See for yourself at:
http://www.StarBand.com
[4/6] from: greggirwin:mindspring at: 19-Jul-2002 10:41
Thanks for testing it Gary!
Glad to hear it worked OK. I know Volker has done some spell checking stuff
in REBOL, so maybe I'll pass it along to him as well. (Though he's on this
list as well so maybe he's already seen it :)
--Gregg
[5/6] from: greggirwin:mindspring at: 19-Jul-2002 10:48
Hi Gary,
ACK! Responded too quickly. It looks like it's buggy. The result for "a" "b"
should be 1, not 0, right? I'll have to look at it again. I'm pretty sure I
know what the problem is, but I've got a deadline coming up, so send me a
direct reminder next week if I don't post a new version.
--Gregg
[6/6] from: greggirwin:mindspring at: 24-Jul-2002 14:30
Hi Gary,
This version should work correctly, but I don't like the solution. The block
of count values is offset, by -1, from the index of the elements you're
comparing. Since REBOL is good for humans, and doesn't use 0 to n-1 series
indexes, I started checking the series at element 2, so the first char was
skipped. By adding a dummy element to the head of each series, we can trick
it. Not a great solution, but until I have more time...
--Gregg
REBOL []
edit-distance: func [s1 s2 /local x y z] [
; Insert matching dummy elements at the head of each series
s1: head insert copy s1 " "
s2: head insert copy s2 " "
c: array/initial reduce [1 + length? s1 1 + length? s2] 0
x: y: z: 0
repeat i 1 + length? s1 [
c/:i/1: i
]
repeat j 1 + length? s2 [
poke c/1 j j
]
for i 2 1 + length? s1 1 [
for j 2 1 + length? s2 1 [
x: 1 + pick pick c i - 1 j
y: 1 + pick pick c i j - 1
either s1/:i = s2/:j [
z: pick pick c i - 1 j - 1
][
z: 1 + pick pick c i - 1 j - 1
]
poke c/:i j min min x y z
]
]
subtract last last c 1
]
if not system/script/args [
;edit-distance "Gregg" "Greg"
;edit-distance "Gregg" "Gary"
;edit-distance "Gregg" "Hirschberg"
;edit-distance "abcdefghijkl" "bcdeffghixkl"
etest: func [arg1 arg2] [
print [arg1 " / " arg2 " / " (edit-distance arg1 arg2)]
]
print now/time
etest "a" "b"
etest "am" "at"
etest "I" "I"
etest "i" "I"
etest "Gregg" "Greg"
etest "Gregg" "Gary"
etest "Gregg" "Hirschberg"
etest "Do you enjoy fishing" "Do you like to fish"
etest "Do you enjoy fishing" "Do you like to hunt"
etest "how are you" "how are you today"
etest "how have you been" "how've you been"
etest "how much was it" "how much did it cost"
etest "how much was it" "how much did you pay for it"
etest "how old are you" "how old're you now"
etest "I'm broke" "I'm busted"
etest "I'm broke" "I'm out of money"
etest "I'm trying to quit" "I am trying to quit!"
etest "I'm trying to quit" "I'm attempting to quit"
etest "I'm trying to quit" "I'd like to quit"
etest "I'm trying to quit" "I want to stop"
etest "Let's take a break" "Let's take a breather"
etest "Let's take a walk" "Let's go for a walk"
etest "That's good!" "That's bad!"
etest "That's good!" "That fantastic"
etest "That's stupid" "That sucks!"
etest "the" "teh"
etest "what're you having for lunch today" "what are you eating for
lunch"
etest "what're you having for lunch today" "what're you going to have
for lunch today"
etest "what day is it" "hat day is it"
etest "what is you name" "what's your name"
etest "what time is it" "what is the time"
etest "what time is it" "what time is it now"
etest "what time is it" "whattimeisit"
etest "what time is it" "what time was it"
etest "where do you live?" "Where do yo reside"
etest "you're crazy" "you are crazy"
etest "you're crazy" "you must be crazy"
etest "abcdefghijkl" "bcdeffghixkl"
print now/time
halt
]
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted