Mailing List Archive: 49091 messages

[REBOL] Re: Large file compare

From: SunandaDH:aol at: 8-Jun-2005 11:18

Thorsten:
> As the file content is very trivial like "2348246864;PCINIT2" and can be
> sorted, I think of something like stepping through the files line by line
> and compare the line content
REBOL.org uses a compare utility to let script owners see changes between different versions of a script they've contributed to the Library. It is not currently publicly available.

I wrote it, and I was concerned not to use recursion, as REBOL has a small stack for that. That limited the possible approaches.

As scripts are line-oriented things, I used the approach you suggest: sort both files, then use a merge/compare to remove matching lines. On the rest, apply some heuristics to distinguish inserts and deletions from block moves.

It works best where there are not a large number of identical lines. That is why, by default, it ignores blank lines when comparing, which, as a happy side-effect, is usually what you'd want when comparing source code versions.

It wouldn't work on files of the size you have targeted, as it acts on blocks in memory. But as most of its processes involve running up and down blocks flagging things, there is no reason in principle why the same logic couldn't work on files, given that read/direct works.

Is that enough hints for now?

Sunanda.
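
For illustration only, the sort-then-merge step described above might look roughly like the sketch below. It is written in Python rather than REBOL, it is not the REBOL.org utility (which is not public), and the function name `unmatched_lines` and the `ignore_blank` parameter are made up for this example. It covers only the removal of matching lines, not the heuristics for telling inserts and deletions from block moves.

```python
def unmatched_lines(old_lines, new_lines, ignore_blank=True):
    """Return (only_in_old, only_in_new) via a sort plus an iterative merge.

    Hypothetical helper for illustration; no recursion is used.
    """
    def prepare(lines):
        # Drop trailing newlines so lines read from files compare cleanly.
        cleaned = (line.rstrip("\n") for line in lines)
        if ignore_blank:
            # Blank lines tend to match everywhere, so skip them by default.
            cleaned = (line for line in cleaned if line.strip())
        return sorted(cleaned)

    old_sorted = prepare(old_lines)
    new_sorted = prepare(new_lines)

    only_old, only_new = [], []
    i = j = 0
    # Walk both sorted lists in step, moving forward only.
    while i < len(old_sorted) and j < len(new_sorted):
        if old_sorted[i] == new_sorted[j]:
            # Matching pair: discard both.
            i += 1
            j += 1
        elif old_sorted[i] < new_sorted[j]:
            # Present only in the old version.
            only_old.append(old_sorted[i])
            i += 1
        else:
            # Present only in the new version.
            only_new.append(new_sorted[j])
            j += 1

    # Whatever is left on either side has no partner.
    only_old.extend(old_sorted[i:])
    only_new.extend(new_sorted[j:])
    return only_old, only_new


old = ["2348246864;PCINIT2", "", "1111111111;PCINIT1", "3333333333;PCINIT3"]
new = ["3333333333;PCINIT3", "2348246864;PCINIT2", "4444444444;PCINIT4"]
removed, added = unmatched_lines(old, new)
print(removed)
print(added)
```

Because the merge loop only ever moves forward through each list, the same comparison could in principle read two already-sorted files one line at a time instead of holding both in memory.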