[REBOL] Re: Function to compare two scripts?
From: greggirwin:mindspring at: 21-Aug-2003 11:40
Hi Sunanda,
Sac> I'd like to be able to compare two scripts (or any pair of newline-separated
Sac> text items) and get a block of the differences.
Volker and I have each tackled this, but in different ways. I'm not
sure where his latest stuff is (Volker?), mine is included below. His
has a GUI front end; mine is just code-based. Got it working and it's
collected dust since then. Not polished, lots of "idea" stuff in there
cluttering things up, but maybe something you can use.
-- Gregg
Basic use of DIFF engine. (No, these are not to be considered "real"
docs. :)
1) Load %rdiff-0.r, which uses %lcs-0.r
do %rdiff-0.r
2) Run diff/create against two series values to find the differences
between them.
a: "empty_bottle"
b: "nematode_knowledge"
diff-blk: diff/create a b
diff-blk would now contain this:
[
remove/part at result 1 1
change/part at result 3 ["p"] 1
insert at result 5 ["y"]
remove/part at result 6 3
insert at result 7 ["b"]
remove/part at result 8 2
insert at result 9 ["tt"]
remove/part at result 11 1
append clear at result 13 [""]
]
3) The result of diff/create is a block of REBOL commands that can
be DOne against the second series (b) you passed to diff/create
to turn it into the first series (a). I.e. it's a patch you can
DO to apply it.
aa: diff/apply copy b diff-blk
REBOL [
Title: "lcs-0"
File: %lcs-0.r
Author: "Gregg Irwin"
EMail: [gregg--pointillistic--com]
Version: 0.0.1
Copyright: "Copyright © 2002 Pointillistic Software. All Rights Reserved."
history: [
0.0.1 [
{Initial test version.}
]
]
comment: {
Provides longest-common-subsequence differencing mechanism. This version
(level 0) uses a simple LCS matrix, which is *very* inefficient in its
use of space (high space complexity). It takes about 4.5 second on my P900
to spin through two identical files of ~700 lines (~24K) and find their
LCS's.
In addition to the base build-matrix function, there are a number of other
routines, which are highly redundant and are here for testing, to see how
I think this thing should work in practice.
Once you have an LCS matrix built, you can convert it to a list of pairs
or offsets and then find the actual subsequence. This is all handled by
the functions here.
}
]
lcs: make object! [
build-matrix: func [
{Returns a matrix of values that can be "walked" to find the
longest common sub-sequence of the two series' specified. It
is horribly space inefficient, and pretty slow to boot, for
large series'.}
a [series!] "First series"
b [series!] "Second series"
;/notify
; notify-fn [function!]
/local m n result r-i a-i i_+_1 ii
][
; Should we work on copies of the series, rather than the originals?
; It means more overhead, but what if someone modifies them while
; we're working on them? E.g. we've stored their original lengths,
; and are looping on those values. What if they shrink?
i: m: length? a
j: n: length? b
result: array/initial reduce [m + 1 n + 1] 0
; We shouldn't need the zero checks for our i/j counters, but
; they're nice to have around for peace of mind. They add 10%
; or so to our runtime though.
loop m [
; Cache these values so we don't have to do them
; in the inner loop
r-i: result/:i
a-i: a/:i
i_+_1: i + 1
;if i = 0 [return result]
j: n
loop n [
;if j = 0 [return result]
either a-i = b/:j [
poke r-i j (1 + (pick pick result i_+_1 j + 1))
; notify-fn i j
][
poke r-i j (max
(pick pick result i_+_1 j)
(pick pick result i j + 1)
)
]
j: j - 1
]
i: i - 1
; Will this make us more system friendly, or is it dangerous?
wait 0
]
result
]
; Should we have a common guy with refinements here? The routines
; are nearly identical, but if we added a number of other refinements
; the single routine could get ugly.
matrix-as: func [
{Take a matrix built by lcs-matrix and find the offsets for the
matching items in each series, returned as pair values or offsets,
depending on which refinement you use.}
a [series!] "First series"
b [series!] "Second series"
L "Matrix of values from lcs-matrix"
/pairs {Return a block of pair values}
/offsets {Return a block with two sub-blocks identified as 'a and 'b}
/local i j m n result
][
if all [(not pairs) (not offsets)] [
print "You must specify either /pairs or /offsets for matrix-as"
halt
]
m: length? a
n: length? b
either pairs [
result: make block! min m n
][
result: copy []
append result reduce ['a make block! m]
append result reduce ['b make block! n]
]
i: j: 1
while [all [(i <= m) (j <= n)]] [
either (a/:i = b/:j) [
either pairs [
append result to-pair reduce [i j]
][
append result/a i
append result/b j
]
i: i + 1
j: j + 1
][
either (pick pick L i + 1 j) >= (pick pick L i j + 1)
[i: i + 1]
[j: j + 1]
]
]
result
]
matrix-as-pairs: func [
{Take a matrix built by lcs-matrix and find the offsets for the
matching items in each series, returned as pair values.}
a [series!] "First series"
b [series!] "Second series"
L "Matrix of values from lcs-matrix"
/local i j m n result
][
m: length? a
n: length? b
result: make block! min m n
i: j: 1
while [all [(i <= m) (j <= n)]] [
either (a/:i = b/:j) [
append result to-pair reduce [i j]
i: i + 1
j: j + 1
][
either (pick pick L i + 1 j) >= (pick pick L i j + 1)
[i: i + 1]
[j: j + 1]
]
]
result
]
matrix-as-offsets: func [
{Take a matrix built by lcs-matrix and find the offsets for the
matching items in both series as two blocks identified as 'a and 'b.}
a [series!] "First series"
b [series!] "Second series"
L "Matrix of values from lcs-matrix"
/local i j m n result
][
m: length? a
n: length? b
result: copy []
append result reduce ['a make block! m]
append result reduce ['b make block! n]
i: j: 1
while [all [(i <= m) (j <= n)]] [
either (a/:i = b/:j) [
append result/a i
append result/b j
i: i + 1
j: j + 1
][
either (pick pick L i + 1 j) >= (pick pick L i j + 1)
[i: i + 1]
[j: j + 1]
]
]
result
]
get-from-matrix: func [
{Take a matrix built by lcs-matrix and return the longest common
subsequence as a series of the same type as 'a.}
a [series!] "First series"
b [series!] "Second series"
L "Matrix of values from lcs-matrix"
/local i j m n result
][
m: length? a
n: length? b
result: make a min m n
i: j: 1
while [all [(i <= m) (j <= n)]] [
either (a/:i = b/:j) [
append result a/:i
i: i + 1
j: j + 1
][
either (pick pick L i + 1 j) >= (pick pick L i j + 1)
[i: i + 1]
[j: j + 1]
]
]
result
]
get-from-pairs: func [
{Take a block built by get-lcs-pairs and return the longest common
subsequence as a series of the same type as 'a.}
a [series!] "First series"
b [series!] "Second series"
block [any-block!] "Pair values for the LCS offsets"
/use-b "Walk series b instead of series a, which is the default."
/local pair series value result
][
set [series value] reduce either use-b [[:b 2]][[:a 1]]
result: make series (length? series)
foreach pair block [
append result pick series pair/:value
]
result
]
get-from-offsets: func [
{Take a block built by get-lcs-pairs and return the longest common
subsequence as a series of the same type as 'a.}
a [series!] "First series"
b [series!] "Second series"
block [any-block!] "Block containing offsets from get-lcs-offsets."
/use-b "Walk series b instead of series a, which is the default."
/local pair series offsets result
][
set [series offsets] reduce either use-b [[:b block/b]][[:a block/a]]
result: make series (length? series)
foreach offset offsets [
append result pick series offset
]
result
]
]
REBOL [
Title: "rdiff-0"
File: %rdiff-0.r
Author: "Gregg Irwin"
EMail: [gregg--pointillistic--com]
Version: 0.0.1
Copyright: "Copyright © 2002 Pointillistic Software. All Rights Reserved."
history: [
0.0.1 [
{Initial test version.}
]
]
]
do %lcs-0.r
diff: make object! [
build-change-command: func [
{Returns a diff CHANGE command that can be DOne to apply it.}
series [series!]
origin [pair!]
diff [pair!]
][
compose/deep [
change/part at result (origin/1 + 1)
[(copy/part at series (origin/1 + 1)
(diff/2 - 1))]
(diff/2 - 1)
]
]
build-insert-command: func [
{Returns a diff INSERT command that can be DOne to apply it.}
series [series!]
origin [pair!]
diff [pair!]
][
compose/deep [
insert at result (origin/1 + 1)
[(copy/part at series (origin/1 + 1)
(diff/1 - 1))]
]
]
build-remove-command: func [
{Returns a diff REMOVE command that can be DOne to apply it.}
origin [pair!]
diff [pair!]
][
compose/deep [
remove/part at result
; OK, now this guy is kind of a special case. We check to see
; if (diff-from-last/1 > 1) which would mean that we just did
; an insert. If so, add the length of that insert to the offset
; where we want to do our remove, otherwise, add 0.
(origin/1 + 1 + either diff/1 > 1 [diff/1 - 1] [0])
(diff/2 - 1)
]
]
create: func [
{The diff we create can be used to create 'a from 'b just by DOing it.}
a [series!]
b [series!]
/local xy last-pair diff-from-last result
][
if strict-equal? a b [return copy [result]]
xy: lcs/matrix-as-pairs a b lcs/build-matrix a b
result: make block! length? xy
if empty? xy [return result]
last-pair: 0x0
foreach pair xy [
diff-from-last: pair - last-pair
;print ["pair" pair "last" last-pair "diff" diff-from-last]
either diff-from-last/1 = diff-from-last/2 [
if diff-from-last <> 1x1 [
;print [tab "diff/1 = diff/2" "CHANGE" "diff=" diff-from-last "last=" last-pair]
append result build-change-command a last-pair diff-from-last
]
][
if diff-from-last/1 > 1 [
;print [tab "diff/1 > diff/2" "INSERT" "diff=" diff-from-last "last=" last-pair]
append result build-insert-command a last-pair diff-from-last
]
if diff-from-last/2 > 1 [
;print [tab "diff/1 < diff/2" "REMOVE" "diff=" diff-from-last "last=" last-pair]
append result build-remove-command last-pair diff-from-last
]
]
last-pair: pair
first-pair: false
]
; Now we've looped through all the internal items but we still
; have to add and remove trailing items from the last pair out.
last-pair: last xy
append result compose/deep [
append clear at result (last-pair/1 + 1) [(copy at a (last-pair/1 + 1))]
]
clear xy
result
]
apply: func [
{Applies a set of commands, built by CREATE, to the data series and
returns the result.}
data [series!] {The data to which you want to apply the diff patch.}
diff-blk [any-block!] {The block of change commands you want applied
to the data. Generally this is the result of a diff/create call.}
/local result
][
; If we can make our copy of the data a list!, the insert and remove
; ops should be much more efficient. Gotta figure out a general
; solution though.
; make list! hoses us up here.
result: copy data ;either any-block? data [make list! data][copy data]
do bind/copy diff-blk 'result
;head result
result
]
]