## [REBOL] Re: Function to compare two scripts?

### From: greggirwin:mindspring at: 21-Aug-2003 11:40

Hi Sunanda, Sac> I'd like to be able to compare two scripts (or any pair of newline-separated Sac> text items) and get a block of the differences. Volker and I have each tackled this, but in different ways. I'm not sure where his latest stuff is (Volker?), mine is included below. His has a GUI front end; mine is just code-based. Got it working and it's collected dust since then. Not polished, lots of "idea" stuff in there cluttering things up, but maybe something you can use. -- Gregg Basic use of DIFF engine. (No, these are not to be considered "real" docs. :) 1) Load %rdiff-0.r, which uses %lcs-0.r do %rdiff-0.r 2) Run diff/create against two series values to find the differences between them. a: "empty_bottle" b: "nematode_knowledge" diff-blk: diff/create a b diff-blk would now contain this: [ remove/part at result 1 1 change/part at result 3 ["p"] 1 insert at result 5 ["y"] remove/part at result 6 3 insert at result 7 ["b"] remove/part at result 8 2 insert at result 9 ["tt"] remove/part at result 11 1 append clear at result 13 [""] ] 3) The result of diff/create is a block of REBOL commands that can be DOne against the second series (b) you passed to diff/create to turn it into the first series (a). I.e. it's a patch you can DO to apply it. aa: diff/apply copy b diff-blk REBOL [ Title: "lcs-0" File: %lcs-0.r Author: "Gregg Irwin" EMail:[gregg--pointillistic--com]Version: 0.0.1 Copyright: "Copyright © 2002 Pointillistic Software. All Rights Reserved." history: [ 0.0.1 [ {Initial test version.} ] ] comment: { Provides longest-common-subsequence differencing mechanism. This version (level 0) uses a simple LCS matrix, which is *very* inefficient in its use of space (high space complexity). It takes about 4.5 second on my P900 to spin through two identical files of ~700 lines (~24K) and find their LCS's. In addition to the base build-matrix function, there are a number of other routines, which are highly redundant and are here for testing, to see how I think this thing should work in practice. Once you have an LCS matrix built, you can convert it to a list of pairs or offsets and then find the actual subsequence. This is all handled by the functions here. } ] lcs: make object! [ build-matrix: func [ {Returns a matrix of values that can be "walked" to find the longest common sub-sequence of the two series' specified. It is horribly space inefficient, and pretty slow to boot, for large series'.} a [series!] "First series" b [series!] "Second series" ;/notify ; notify-fn [function!] /local m n result r-i a-i i_+_1 ii ][ ; Should we work on copies of the series, rather than the originals? ; It means more overhead, but what if someone modifies them while ; we're working on them? E.g. we've stored their original lengths, ; and are looping on those values. What if they shrink? i: m: length? a j: n: length? b result: array/initial reduce [m + 1 n + 1] 0 ; We shouldn't need the zero checks for our i/j counters, but ; they're nice to have around for peace of mind. They add 10% ; or so to our runtime though. loop m [ ; Cache these values so we don't have to do them ; in the inner loop r-i: result/:i a-i: a/:i i_+_1: i + 1 ;if i = 0 [return result] j: n loop n [ ;if j = 0 [return result] either a-i = b/:j [ poke r-i j (1 + (pick pick result i_+_1 j + 1)) ; notify-fn i j ][ poke r-i j (max (pick pick result i_+_1 j) (pick pick result i j + 1) ) ] j: j - 1 ] i: i - 1 ; Will this make us more system friendly, or is it dangerous? wait 0 ] result ] ; Should we have a common guy with refinements here? The routines ; are nearly identical, but if we added a number of other refinements ; the single routine could get ugly. matrix-as: func [ {Take a matrix built by lcs-matrix and find the offsets for the matching items in each series, returned as pair values or offsets, depending on which refinement you use.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /pairs {Return a block of pair values} /offsets {Return a block with two sub-blocks identified as 'a and 'b} /local i j m n result ][ if all [(not pairs) (not offsets)] [ print "You must specify either /pairs or /offsets for matrix-as" halt ] m: length? a n: length? b either pairs [ result: make block! min m n ][ result: copy [] append result reduce ['a make block! m] append result reduce ['b make block! n] ] i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ either pairs [ append result to-pair reduce [i j] ][ append result/a i append result/b j ] i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] matrix-as-pairs: func [ {Take a matrix built by lcs-matrix and find the offsets for the matching items in each series, returned as pair values.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /local i j m n result ][ m: length? a n: length? b result: make block! min m n i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ append result to-pair reduce [i j] i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] matrix-as-offsets: func [ {Take a matrix built by lcs-matrix and find the offsets for the matching items in both series as two blocks identified as 'a and 'b.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /local i j m n result ][ m: length? a n: length? b result: copy [] append result reduce ['a make block! m] append result reduce ['b make block! n] i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ append result/a i append result/b j i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] get-from-matrix: func [ {Take a matrix built by lcs-matrix and return the longest common subsequence as a series of the same type as 'a.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /local i j m n result ][ m: length? a n: length? b result: make a min m n i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ append result a/:i i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] get-from-pairs: func [ {Take a block built by get-lcs-pairs and return the longest common subsequence as a series of the same type as 'a.} a [series!] "First series" b [series!] "Second series" block [any-block!] "Pair values for the LCS offsets" /use-b "Walk series b instead of series a, which is the default." /local pair series value result ][ set [series value] reduce either use-b [[:b 2]][[:a 1]] result: make series (length? series) foreach pair block [ append result pick series pair/:value ] result ] get-from-offsets: func [ {Take a block built by get-lcs-pairs and return the longest common subsequence as a series of the same type as 'a.} a [series!] "First series" b [series!] "Second series" block [any-block!] "Block containing offsets from get-lcs-offsets." /use-b "Walk series b instead of series a, which is the default." /local pair series offsets result ][ set [series offsets] reduce either use-b [[:b block/b]][[:a block/a]] result: make series (length? series) foreach offset offsets [ append result pick series offset ] result ] ] REBOL [ Title: "rdiff-0" File: %rdiff-0.r Author: "Gregg Irwin" EMail:[gregg--pointillistic--com]Version: 0.0.1 Copyright: "Copyright © 2002 Pointillistic Software. All Rights Reserved." history: [ 0.0.1 [ {Initial test version.} ] ] ] do %lcs-0.r diff: make object! [ build-change-command: func [ {Returns a diff CHANGE command that can be DOne to apply it.} series [series!] origin [pair!] diff [pair!] ][ compose/deep [ change/part at result (origin/1 + 1) [(copy/part at series (origin/1 + 1) (diff/2 - 1))] (diff/2 - 1) ] ] build-insert-command: func [ {Returns a diff INSERT command that can be DOne to apply it.} series [series!] origin [pair!] diff [pair!] ][ compose/deep [ insert at result (origin/1 + 1) [(copy/part at series (origin/1 + 1) (diff/1 - 1))] ] ] build-remove-command: func [ {Returns a diff REMOVE command that can be DOne to apply it.} origin [pair!] diff [pair!] ][ compose/deep [ remove/part at result ; OK, now this guy is kind of a special case. We check to see ; if (diff-from-last/1 > 1) which would mean that we just did ; an insert. If so, add the length of that insert to the offset ; where we want to do our remove, otherwise, add 0. (origin/1 + 1 + either diff/1 > 1 [diff/1 - 1] [0]) (diff/2 - 1) ] ] create: func [ {The diff we create can be used to create 'a from 'b just by DOing it.} a [series!] b [series!] /local xy last-pair diff-from-last result ][ if strict-equal? a b [return copy [result]] xy: lcs/matrix-as-pairs a b lcs/build-matrix a b result: make block! length? xy if empty? xy [return result] last-pair: 0x0 foreach pair xy [ diff-from-last: pair - last-pair ;print ["pair" pair "last" last-pair "diff" diff-from-last] either diff-from-last/1 = diff-from-last/2 [ if diff-from-last <> 1x1 [ ;print [tab "diff/1 = diff/2" "CHANGE" "diff=" diff-from-last "last=" last-pair] append result build-change-command a last-pair diff-from-last ] ][ if diff-from-last/1 > 1 [ ;print [tab "diff/1 > diff/2" "INSERT" "diff=" diff-from-last "last=" last-pair] append result build-insert-command a last-pair diff-from-last ] if diff-from-last/2 > 1 [ ;print [tab "diff/1 < diff/2" "REMOVE" "diff=" diff-from-last "last=" last-pair] append result build-remove-command last-pair diff-from-last ] ] last-pair: pair first-pair: false ] ; Now we've looped through all the internal items but we still ; have to add and remove trailing items from the last pair out. last-pair: last xy append result compose/deep [ append clear at result (last-pair/1 + 1) [(copy at a (last-pair/1 + 1))] ] clear xy result ] apply: func [ {Applies a set of commands, built by CREATE, to the data series and returns the result.} data [series!] {The data to which you want to apply the diff patch.} diff-blk [any-block!] {The block of change commands you want applied to the data. Generally this is the result of a diff/create call.} /local result ][ ; If we can make our copy of the data a list!, the insert and remove ; ops should be much more efficient. Gotta figure out a general ; solution though. ; make list! hoses us up here. result: copy data ;either any-block? data [make list! data][copy data] do bind/copy diff-blk 'result ;head result result ] ]