Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

Edit Distance Function?

 [1/6] from: garymiller:starband at: 17-Jul-2002 15:08


Has anyone out there written an edit distance function in Rebol they would be willing to share. Edit Distance returns an integer when passed two string that counts the number on insertions, deletions or substitution necessary to turn string1 into string2. The algorithm is documented pretty well on the web but since it's recursive and is always described in terms of mathematical dynamic programming, it's a nontrivial exercise, at least for me! For example. string1: doog string2: dog edit distance would be 1 string1: gaot string2: goat edit distance would be 2 These algorithm are usually used along with soundex in spelling checkers and the like. _____________________________________ Think you can't get high-speed Internet? Now you can! StarBand offers high-speed Internet via satellite. No phone lines needed. New lower prices. For all 50 U.S. States. See for yourself at: http://www.StarBand.com

 [2/6] from: greggirwin:mindspring at: 17-Jul-2002 14:44


Hi Gary, << Has anyone out there written an edit distance function in Rebol they would be willing to share. >> I just hacked this up, and tested it only very lightly. You can use it, in trade for testing it for me. :) I could write a much better version given some time, but going off the references I found, this came together quickly. It's going to be slow though, and has high space complexity. A better version will take a little more time, which I don't have much of at the moment. If you can verify that this works correctly, then I can use it as a reference for future development of an improved version. edit-distance: func [s1 s2 /local x y z] [ c: array/initial reduce [1 + length? s1 1 + length? s2] 0 x: y: z: 0 repeat i 1 + length? s1 [c/:i/1: i] repeat j 1 + length? s2 [poke c/1 j j] for i 2 1 + length? s1 1 [ for j 2 1 + length? s2 1 [ x: 1 + pick pick c i - 1 j y: 1 + pick pick c i j - 1 either s1/:i = s2/:j [ z: pick pick c i - 1 j - 1 ][ z: 1 + pick pick c i - 1 j - 1 ] poke c/:i j min min x y z ] ] subtract last last c 1 ] edit-distance "Gregg" "Greg" edit-distance "Gregg" "Gary" edit-distance "Gregg" "Hirschberg" --Gregg

 [3/6] from: garymiller:starband at: 19-Jul-2002 11:39


> I just hacked this up, and tested it only very lightly. > You can use it, in trade for testing it for me. :)
<<quoted lines omitted: 6>>
> as a reference for future development of an improved > version.
Greg, fantastic job on the edit-distance function! And so quickly. I was amazed! Support on this mailing list is nothing short of phenomenal! I am including my test script and results. The program appears to work great with everything I threw at it! I threw around 40 strings at it some quite large and didn't exceed a second of run time on a 1Ghz Dell Pentium. I have an ascii text dictionary with 100,000 words that I'm going to use edit-distance with to find the closest words to any misspelled words in the user's input sentence. Thanks again Greg, and if any optimizations come to you on the speed I would much appreciate hearing from you! etest: func [arg1 arg2] [ print [arg1 " / " arg2 " / " (edit-distance arg1 arg2)]] print now/time etest "a" "b" etest "am" "at" etest "i" "I" etest "Gregg" "Greg" etest "Gregg" "Gary" etest "Gregg" "Hirschberg" etest "Do you enjoy fishing" "Do you like to fish" etest "Do you enjoy fishing" "Do you like to hunt" etest "how are you" "how are you today" etest "how have you been" "how've you been" etest "how much was it" "how much did it cost" etest "how much was it" "how much did you pay for it" etest "how old are you" "how old're you now" etest "I'm broke" "I'm busted" etest "I'm broke" "I'm out of money" etest "I'm trying to quit" "I am trying to quit!" etest "I'm trying to quit" "I'm attempting to quit" etest "I'm trying to quit" "I'd like to quit" etest "I'm trying to quit" "I want to stop" etest "Let's take a break" "Let's take a breather" etest "Let's take a walk" "Let's go for a walk" etest "That's good!" "That's bad!" etest "That's good!" "That fantastic" etest "That's stupid" "That sucks!" etest "the" "teh" etest "what're you having for lunch today" "what are you eating for lunch" etest "what're you having for lunch today" "what're you going to have for lunch today" etest "what day is it" "hat day is it" etest "what is you name" "what's your name" etest "what time is it" "what is the time" etest "what time is it" "what time is it now" etest "what time is it" "whattimeisit" etest "what time is it" "what time was it" etest "where do you live?" "Where do yo reside" etest "you're crazy" "you are crazy" etest "you're crazy" "you must be crazy" print now/time 11:29:13 a / b / 0 am / at / 1 i / I / 0 Gregg / Greg / 1 Gregg / Gary / 4 Gregg / Hirschberg / 6 Do you enjoy fishing / Do you like to fish / 9 Do you enjoy fishing / Do you like to hunt / 10 how are you / how are you today / 6 how have you been / how've you been / 3 how much was it / how much did it cost / 8 how much was it / how much did you pay for it / 14 how old are you / how old're you now / 6 I'm broke / I'm busted / 4 I'm broke / I'm out of money / 10 I'm trying to quit / I am trying to quit! / 3 I'm trying to quit / I'm attempting to quit / 6 I'm trying to quit / I'd like to quit / 6 I'm trying to quit / I want to stop / 11 Let's take a break / Let's take a breather / 4 Let's take a walk / Let's go for a walk / 6 That's good! / That's bad! / 3 That's good! / That fantastic / 10 That's stupid / That sucks! / 7 the / teh / 2 what're you having for lunch today / what are you eating for lunch / 10 what're you having for lunch today / what're you going to have for lunch today / 11 what day is it / hat day is it / 1 what is you name / what's your name / 3 what time is it / what is the time / 9 what time is it / what time is it now / 4 what time is it / whattimeisit / 3 what time is it / what time was it / 2 where do you live? / Where do yo reside / 6 you're crazy / you are crazy / 2 you're crazy / you must be crazy / 7 11:29:13 11:29:13 >> _____________________________________ Think you can't get high-speed Internet? Now you can! StarBand offers high-speed Internet via satellite. No phone lines needed. New lower prices. For all 50 U.S. States. See for yourself at: http://www.StarBand.com

 [4/6] from: greggirwin:mindspring at: 19-Jul-2002 10:41


Thanks for testing it Gary! Glad to hear it worked OK. I know Volker has done some spell checking stuff in REBOL, so maybe I'll pass it along to him as well. (Though he's on this list as well so maybe he's already seen it :) --Gregg

 [5/6] from: greggirwin:mindspring at: 19-Jul-2002 10:48


Hi Gary, ACK! Responded too quickly. It looks like it's buggy. The result for "a" "b" should be 1, not 0, right? I'll have to look at it again. I'm pretty sure I know what the problem is, but I've got a deadline coming up, so send me a direct reminder next week if I don't post a new version. --Gregg

 [6/6] from: greggirwin:mindspring at: 24-Jul-2002 14:30


Hi Gary, This version should work correctly, but I don't like the solution. The block of count values is offset, by -1, from the index of the elements you're comparing. Since REBOL is good for humans, and doesn't use 0 to n-1 series indexes, I started checking the series at element 2, so the first char was skipped. By adding a dummy element to the head of each series, we can trick it. Not a great solution, but until I have more time... --Gregg REBOL [] edit-distance: func [s1 s2 /local x y z] [ ; Insert matching dummy elements at the head of each series s1: head insert copy s1 " " s2: head insert copy s2 " " c: array/initial reduce [1 + length? s1 1 + length? s2] 0 x: y: z: 0 repeat i 1 + length? s1 [ c/:i/1: i ] repeat j 1 + length? s2 [ poke c/1 j j ] for i 2 1 + length? s1 1 [ for j 2 1 + length? s2 1 [ x: 1 + pick pick c i - 1 j y: 1 + pick pick c i j - 1 either s1/:i = s2/:j [ z: pick pick c i - 1 j - 1 ][ z: 1 + pick pick c i - 1 j - 1 ] poke c/:i j min min x y z ] ] subtract last last c 1 ] if not system/script/args [ ;edit-distance "Gregg" "Greg" ;edit-distance "Gregg" "Gary" ;edit-distance "Gregg" "Hirschberg" ;edit-distance "abcdefghijkl" "bcdeffghixkl" etest: func [arg1 arg2] [ print [arg1 " / " arg2 " / " (edit-distance arg1 arg2)] ] print now/time etest "a" "b" etest "am" "at" etest "I" "I" etest "i" "I" etest "Gregg" "Greg" etest "Gregg" "Gary" etest "Gregg" "Hirschberg" etest "Do you enjoy fishing" "Do you like to fish" etest "Do you enjoy fishing" "Do you like to hunt" etest "how are you" "how are you today" etest "how have you been" "how've you been" etest "how much was it" "how much did it cost" etest "how much was it" "how much did you pay for it" etest "how old are you" "how old're you now" etest "I'm broke" "I'm busted" etest "I'm broke" "I'm out of money" etest "I'm trying to quit" "I am trying to quit!" etest "I'm trying to quit" "I'm attempting to quit" etest "I'm trying to quit" "I'd like to quit" etest "I'm trying to quit" "I want to stop" etest "Let's take a break" "Let's take a breather" etest "Let's take a walk" "Let's go for a walk" etest "That's good!" "That's bad!" etest "That's good!" "That fantastic" etest "That's stupid" "That sucks!" etest "the" "teh" etest "what're you having for lunch today" "what are you eating for lunch" etest "what're you having for lunch today" "what're you going to have for lunch today" etest "what day is it" "hat day is it" etest "what is you name" "what's your name" etest "what time is it" "what is the time" etest "what time is it" "what time is it now" etest "what time is it" "whattimeisit" etest "what time is it" "what time was it" etest "where do you live?" "Where do yo reside" etest "you're crazy" "you are crazy" etest "you're crazy" "you must be crazy" etest "abcdefghijkl" "bcdeffghixkl" print now/time halt ]

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted