Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

Function to compare two scripts?

 [1/9] from: SunandaDH::aol::com at: 21-Aug-2003 11:49


Hi there, I think something like this has been posted before, but I can't find it. I'd like to be able to compare two scripts (or any pair of newline-separated text items) and get a block of the differences. Can anyone help? As an example of the sort of thing I'm looking for: version-1: { rebol [ name: "fred" title: "test script" ] print "test" haltt} version-2: { REBOL [ name: "fred" title: "test script" history: [21-aug-2003 "Fixed exit problem"] halt} differences: delta-script version-1 version-2 ;; I'd like output something like: foreach diff differences [print diff] line 1 replace "rebol [" "REBOL [" line 4 insert {history: [21-aug-2003 "Fixed exit problem"]} line 6 delete {print "test"} line 7 replace "hallt" "halt" Thanks! Sunanda.

 [2/9] from: tomc:darkwing:uoregon at: 21-Aug-2003 10:22


not that a rebol version isn't a good idea but the diff command already does part of this pretty well and is apt to ba available on most platforms % diff vers1.r vers2.r 1c1 < rebol [ ---
> REBOL [
4c4 < ] ---
> history: [21-aug-2003 "Fixed exit problem"]
6,7c6 < print "test" < haltt ---
> halt
parsing the diff command output and changeing it to rebol friendly blocks of statements may be a shortcut for you On Thu, 21 Aug 2003 [SunandaDH--aol--com] wrote:

 [3/9] from: andrew:martin:colenso:school at: 22-Aug-2003 8:38


Tom wrote:
> Not that a rebol version [of diff] isn't a good idea...
It would be nice if the Rebol version of diff also worked on 'load-ed values as well as plain text. Like this: Diff: func [Version1 [string! file! block!] Version2 [string! file! block!]] [ ; Magic here! :) ] Andrew J Martin Attendance Officer & Information Systems Trouble Shooter Colenso High School Arnold Street, Napier. Tel: 64-6-8310180 ext 826 Fax: 64-6-8336759 http://colenso.net/scripts/Wiki.r?AJM http://www.colenso.school.nz/ DISCLAIMER: Colenso High School and its Board of Trustees is not responsible (or legally liable) for materials distributed to or acquired from user e-mail accounts. You can report any misuse of an e-mail account to our ICT Manager and the complaint will be investigated. (Misuse can come in many forms, but can be viewed as any material sent/received that indicate or suggest pornography, unethical or illegal solicitation, racism, sexism, inappropriate language and/or other issues described in our Acceptable Use Policy.) All outgoing messages are certified virus-free by McAfee GroupShield Exchange 5.10.285.0 Phone: +64 6 843 5095 or Fax: +64 6 833 6759 or E-mail: [postmaster--colenso--school--nz]

 [4/9] from: greggirwin:mindspring at: 21-Aug-2003 16:00


Hi Andrew, (my post containing my diff stuff hasn't appeared yet) AM> It would be nice if the Rebol version of diff also worked on 'load-ed AM> values as well as plain text. That was one of my goals. My DIFF operates on series values, so you should be able to use it at any lexical level. The higher you go, the less items you have to compare, and the faster it will be. I think a handy diff would be to compare objects and functions, maybe in a nested manner, so you can see what elements changed and compare them, rather than just lines. A more meaningful diff if you will. -- Gregg

 [5/9] from: andrew:martin:colenso:school at: 22-Aug-2003 9:59


Hi, Gregg. That sounds really nifty! Can you post what you've got so far, please? Andrew J Martin Attendance Officer & Information Systems Trouble Shooter Colenso High School Arnold Street, Napier. Tel: 64-6-8310180 ext 826 Fax: 64-6-8336759 http://colenso.net/scripts/Wiki.r?AJM http://www.colenso.school.nz/

 [6/9] from: greggirwin:mindspring at: 21-Aug-2003 11:40


Hi Sunanda, Sac> I'd like to be able to compare two scripts (or any pair of newline-separated Sac> text items) and get a block of the differences. Volker and I have each tackled this, but in different ways. I'm not sure where his latest stuff is (Volker?), mine is included below. His has a GUI front end; mine is just code-based. Got it working and it's collected dust since then. Not polished, lots of "idea" stuff in there cluttering things up, but maybe something you can use. -- Gregg Basic use of DIFF engine. (No, these are not to be considered "real" docs. :) 1) Load %rdiff-0.r, which uses %lcs-0.r do %rdiff-0.r 2) Run diff/create against two series values to find the differences between them. a: "empty_bottle" b: "nematode_knowledge" diff-blk: diff/create a b diff-blk would now contain this: [ remove/part at result 1 1 change/part at result 3 ["p"] 1 insert at result 5 ["y"] remove/part at result 6 3 insert at result 7 ["b"] remove/part at result 8 2 insert at result 9 ["tt"] remove/part at result 11 1 append clear at result 13 [""] ] 3) The result of diff/create is a block of REBOL commands that can be DOne against the second series (b) you passed to diff/create to turn it into the first series (a). I.e. it's a patch you can DO to apply it. aa: diff/apply copy b diff-blk REBOL [ Title: "lcs-0" File: %lcs-0.r Author: "Gregg Irwin" EMail: [gregg--pointillistic--com] Version: 0.0.1 Copyright: "Copyright © 2002 Pointillistic Software. All Rights Reserved." history: [ 0.0.1 [ {Initial test version.} ] ] comment: { Provides longest-common-subsequence differencing mechanism. This version (level 0) uses a simple LCS matrix, which is *very* inefficient in its use of space (high space complexity). It takes about 4.5 second on my P900 to spin through two identical files of ~700 lines (~24K) and find their LCS's. In addition to the base build-matrix function, there are a number of other routines, which are highly redundant and are here for testing, to see how I think this thing should work in practice. Once you have an LCS matrix built, you can convert it to a list of pairs or offsets and then find the actual subsequence. This is all handled by the functions here. } ] lcs: make object! [ build-matrix: func [ {Returns a matrix of values that can be "walked" to find the longest common sub-sequence of the two series' specified. It is horribly space inefficient, and pretty slow to boot, for large series'.} a [series!] "First series" b [series!] "Second series" ;/notify ; notify-fn [function!] /local m n result r-i a-i i_+_1 ii ][ ; Should we work on copies of the series, rather than the originals? ; It means more overhead, but what if someone modifies them while ; we're working on them? E.g. we've stored their original lengths, ; and are looping on those values. What if they shrink? i: m: length? a j: n: length? b result: array/initial reduce [m + 1 n + 1] 0 ; We shouldn't need the zero checks for our i/j counters, but ; they're nice to have around for peace of mind. They add 10% ; or so to our runtime though. loop m [ ; Cache these values so we don't have to do them ; in the inner loop r-i: result/:i a-i: a/:i i_+_1: i + 1 ;if i = 0 [return result] j: n loop n [ ;if j = 0 [return result] either a-i = b/:j [ poke r-i j (1 + (pick pick result i_+_1 j + 1)) ; notify-fn i j ][ poke r-i j (max (pick pick result i_+_1 j) (pick pick result i j + 1) ) ] j: j - 1 ] i: i - 1 ; Will this make us more system friendly, or is it dangerous? wait 0 ] result ] ; Should we have a common guy with refinements here? The routines ; are nearly identical, but if we added a number of other refinements ; the single routine could get ugly. matrix-as: func [ {Take a matrix built by lcs-matrix and find the offsets for the matching items in each series, returned as pair values or offsets, depending on which refinement you use.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /pairs {Return a block of pair values} /offsets {Return a block with two sub-blocks identified as 'a and 'b} /local i j m n result ][ if all [(not pairs) (not offsets)] [ print "You must specify either /pairs or /offsets for matrix-as" halt ] m: length? a n: length? b either pairs [ result: make block! min m n ][ result: copy [] append result reduce ['a make block! m] append result reduce ['b make block! n] ] i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ either pairs [ append result to-pair reduce [i j] ][ append result/a i append result/b j ] i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] matrix-as-pairs: func [ {Take a matrix built by lcs-matrix and find the offsets for the matching items in each series, returned as pair values.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /local i j m n result ][ m: length? a n: length? b result: make block! min m n i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ append result to-pair reduce [i j] i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] matrix-as-offsets: func [ {Take a matrix built by lcs-matrix and find the offsets for the matching items in both series as two blocks identified as 'a and 'b.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /local i j m n result ][ m: length? a n: length? b result: copy [] append result reduce ['a make block! m] append result reduce ['b make block! n] i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ append result/a i append result/b j i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] get-from-matrix: func [ {Take a matrix built by lcs-matrix and return the longest common subsequence as a series of the same type as 'a.} a [series!] "First series" b [series!] "Second series" L "Matrix of values from lcs-matrix" /local i j m n result ][ m: length? a n: length? b result: make a min m n i: j: 1 while [all [(i <= m) (j <= n)]] [ either (a/:i = b/:j) [ append result a/:i i: i + 1 j: j + 1 ][ either (pick pick L i + 1 j) >= (pick pick L i j + 1) [i: i + 1] [j: j + 1] ] ] result ] get-from-pairs: func [ {Take a block built by get-lcs-pairs and return the longest common subsequence as a series of the same type as 'a.} a [series!] "First series" b [series!] "Second series" block [any-block!] "Pair values for the LCS offsets" /use-b "Walk series b instead of series a, which is the default." /local pair series value result ][ set [series value] reduce either use-b [[:b 2]][[:a 1]] result: make series (length? series) foreach pair block [ append result pick series pair/:value ] result ] get-from-offsets: func [ {Take a block built by get-lcs-pairs and return the longest common subsequence as a series of the same type as 'a.} a [series!] "First series" b [series!] "Second series" block [any-block!] "Block containing offsets from get-lcs-offsets." /use-b "Walk series b instead of series a, which is the default." /local pair series offsets result ][ set [series offsets] reduce either use-b [[:b block/b]][[:a block/a]] result: make series (length? series) foreach offset offsets [ append result pick series offset ] result ] ] REBOL [ Title: "rdiff-0" File: %rdiff-0.r Author: "Gregg Irwin" EMail: [gregg--pointillistic--com] Version: 0.0.1 Copyright: "Copyright © 2002 Pointillistic Software. All Rights Reserved." history: [ 0.0.1 [ {Initial test version.} ] ] ] do %lcs-0.r diff: make object! [ build-change-command: func [ {Returns a diff CHANGE command that can be DOne to apply it.} series [series!] origin [pair!] diff [pair!] ][ compose/deep [ change/part at result (origin/1 + 1) [(copy/part at series (origin/1 + 1) (diff/2 - 1))] (diff/2 - 1) ] ] build-insert-command: func [ {Returns a diff INSERT command that can be DOne to apply it.} series [series!] origin [pair!] diff [pair!] ][ compose/deep [ insert at result (origin/1 + 1) [(copy/part at series (origin/1 + 1) (diff/1 - 1))] ] ] build-remove-command: func [ {Returns a diff REMOVE command that can be DOne to apply it.} origin [pair!] diff [pair!] ][ compose/deep [ remove/part at result ; OK, now this guy is kind of a special case. We check to see ; if (diff-from-last/1 > 1) which would mean that we just did ; an insert. If so, add the length of that insert to the offset ; where we want to do our remove, otherwise, add 0. (origin/1 + 1 + either diff/1 > 1 [diff/1 - 1] [0]) (diff/2 - 1) ] ] create: func [ {The diff we create can be used to create 'a from 'b just by DOing it.} a [series!] b [series!] /local xy last-pair diff-from-last result ][ if strict-equal? a b [return copy [result]] xy: lcs/matrix-as-pairs a b lcs/build-matrix a b result: make block! length? xy if empty? xy [return result] last-pair: 0x0 foreach pair xy [ diff-from-last: pair - last-pair ;print ["pair" pair "last" last-pair "diff" diff-from-last] either diff-from-last/1 = diff-from-last/2 [ if diff-from-last <> 1x1 [ ;print [tab "diff/1 = diff/2" "CHANGE" "diff=" diff-from-last "last=" last-pair] append result build-change-command a last-pair diff-from-last ] ][ if diff-from-last/1 > 1 [ ;print [tab "diff/1 > diff/2" "INSERT" "diff=" diff-from-last "last=" last-pair] append result build-insert-command a last-pair diff-from-last ] if diff-from-last/2 > 1 [ ;print [tab "diff/1 < diff/2" "REMOVE" "diff=" diff-from-last "last=" last-pair] append result build-remove-command last-pair diff-from-last ] ] last-pair: pair first-pair: false ] ; Now we've looped through all the internal items but we still ; have to add and remove trailing items from the last pair out. last-pair: last xy append result compose/deep [ append clear at result (last-pair/1 + 1) [(copy at a (last-pair/1 + 1))] ] clear xy result ] apply: func [ {Applies a set of commands, built by CREATE, to the data series and returns the result.} data [series!] {The data to which you want to apply the diff patch.} diff-blk [any-block!] {The block of change commands you want applied to the data. Generally this is the result of a diff/create call.} /local result ][ ; If we can make our copy of the data a list!, the insert and remove ; ops should be much more efficient. Gotta figure out a general ; solution though. ; make list! hoses us up here. result: copy data ;either any-block? data [make list! data][copy data] do bind/copy diff-blk 'result ;head result result ] ]

 [7/9] from: SunandaDH:aol at: 22-Aug-2003 7:36


Thanks Gregg:
> Not polished, lots of "idea" stuff in there > cluttering things up, but maybe something you can use.
Very clever code, but a couple of problems: First, it gets confused by completely different strings, or case differences:
>> print mold diff/create "aaa" "bbb"
[]
>> print mold diff/create "aaa" "bbbAAA"
[]
>> print mold diff/create "AAA" "bbbAAA"
[ remove/part at result 1 3 append clear at result 4 [""]] Second, more important for my purposes, the output isn't very human readable. Take my example: version-1: { rebol [ name: "fred" title: "test script" ] print "test" haltt} version-2: { REBOL [ name: "fred" title: "test script" history: [21-aug-2003 "Fixed exit problem"] halt} print mold diff/create version-2 version-1 [ change/part at result 2 ["REBOL"] 5 insert at result 56 ["h"] remove/part at result 57 5 change/part at result 58 ["s"] 1 insert at result 60 ["ory:"] insert at result 65 ["[21-aug-2003 "] insert at result 79 ["Fixed exi"] insert at result 89 [" probl"] insert at result 96 ["m"] remove/part at result 97 2 insert at result 98 ["]"] insert at result 100 ["^/"] append clear at result 105 [""]]
>>
It's not at all obvious what the edits are that a *human* did to go from one version to the next. I've taken a look at your code, but I can't see any obvious way to morph it into something that reports differences by lines. Sunanda.

 [8/9] from: greggirwin:mindspring at: 22-Aug-2003 8:35


Hi Sunanda, Sac> Very clever code, but a couple of problems: Sac> First, it gets confused by completely different strings, Guess I need to revisit that. :) Thanks. Sac> ...or case differences: If you compare strings, by byte, case will matter because each item in the series is char that it's comparing. I figured mine wouldn't work out of the box for you, since there's no UI and the output format was designed so you could just DO it to apply diff patches. You should be able to build a front end for it though. Check out Volker's stuff (if he's published it); that may work for you. Sac> I've taken a look at your code, but I can't see any obvious way Sac> to morph it into something that reports differences by lines. All you have to do is give it blocks of lines to compare. It compares two series element by element, it doesn't parse strings into lines for you automatically. version-1: parse/all { rebol [ name: "fred" title: "test script" ] print "test" haltt} "^/" version-2: parse/all { REBOL [ name: "fred" title: "test script" history: [21-aug-2003 "Fixed exit problem"] halt} "^/"
>> diff/create version-2 version-1
== [ change/part at result 5 [{ history: [21-aug-2003 "Fixed exit problem"]}] 1 append clear at result 7 ["halt"]] So, it's still not human-friendly output or, even better, a side-by-side display, but not entirely useless in some cases. -- Gregg

 [9/9] from: nitsch-lists:netcologne at: 23-Aug-2003 6:05


Gregg Irwin wrote:
>Hi Sunanda, >Sac> I'd like to be able to compare two scripts (or any pair of newline-separated
<<quoted lines omitted: 4>>
>collected dust since then. Not polished, lots of "idea" stuff in there >cluttering things up, but maybe something you can use.
Now on /developer in my folder. failed to understand it again :) Basically it works like this: while[lines left][ take the first line in the left text. find it in the right text. either found [ collect it and all following same lines. save it somewhere and remove both blocks. ][ collect lines until you find one in the right text save them somewhere ] ] do something usefull with the collected blocks.
>-- Gregg >
-Volker

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted