[REBOL] Re: Large file compare
From: SunandaDH:aol at: 8-Jun-2005 11:18
Thorsten:
> As the file content is very trivial, like "2348246864;PCINIT2", and can be
> sorted, I think of something like stepping through the files line by line
> and comparing the line content.
REBOL.org uses a compare utility to let script owners see the changes between
different versions of a script they've contributed to the Library. It is not currently
publicly available.
I wrote it. I was careful to avoid recursion, as REBOL has only a small
stack for it, and that limited the possible approaches.
As scripts are line-oriented things, I used the approach you suggest: sort
both files, then use a merge/compare pass to remove matching lines. On the remaining
lines, apply some heuristics to distinguish insertions and deletions from block moves.
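For anyone wanting to try the sort-then-merge step, here is a minimal sketch in Python. It is only an illustration, not the REBOL.org utility (which isn't public). The function name merge_compare and all sample records except "2348246864;PCINIT2" are made up. The sketch covers removing matching lines and nothing else: the insert/delete/move heuristics are left out.

```python
def merge_compare(old_lines, new_lines):
    """Sort both inputs, then walk them together in one merge pass.

    Returns (only_in_old, only_in_new). Lines found in both inputs are
    dropped; duplicate lines are matched one-for-one.
    """
    old_sorted = sorted(old_lines)
    new_sorted = sorted(new_lines)
    only_old, only_new = [], []
    i = j = 0
    while i < len(old_sorted) and j < len(new_sorted):
        a, b = old_sorted[i], new_sorted[j]
        if a == b:
            # Matching line: remove it from further consideration.
            i += 1
            j += 1
        elif a < b:
            # a cannot appear later in new_sorted, so it is old-only.
            only_old.append(a)
            i += 1
        else:
            # b cannot appear later in old_sorted, so it is new-only.
            only_new.append(b)
            j += 1
    # Whatever is left in either list has no partner.
    only_old.extend(old_sorted[i:])
    only_new.extend(new_sorted[j:])
    return only_old, only_new


old = ["2348246864;PCINIT2", "1111;A", "2222;B"]  # hypothetical sample data
new = ["2222;B", "3333;C", "1111;A"]
print(merge_compare(old, new))
# (['2348246864;PCINIT2'], ['3333;C'])
```

Each step advances at least one index, so after sorting, the comparison is a single linear pass with no recursion. That fits the small-stack constraint mentioned above.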
It works best when there are not many identical lines, which is
why, by default, it ignores blank lines when comparing. As a
happy side effect, that is usually what you'd want when comparing versions of source code.
It wouldn't work on files the size you are targeting, as it operates on blocks held in
memory. But since most of its processing involves running up and down blocks
flagging things, there is no reason in principle why the same logic couldn't work on
files, given that read/direct works.
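To suggest how the merge pass could run over files too big to hold in memory, here is a rough Python sketch. It keeps only one current line from each input. In REBOL the file reading would presumably go through read/direct as suggested above; that part is not shown here. The sketch makes several assumptions. First, both files are already sorted, for example by an external sort, in the same order Python's string comparison uses: for plain ASCII records like the one in the question, a byte-order sort such as `LC_ALL=C sort` gives that order. Second, the function names stream_compare and _next_line are hypothetical. Third, the sketch only moves forward, so it covers the match-removal pass and not any "up and down" flagging. The optional blank-line skipping mirrors the default described earlier.

```python
def _next_line(it, skip_blank):
    """Return the next line from iterator `it` without its line ending,
    or None at end of input. Blank lines are skipped if skip_blank is set."""
    for raw in it:
        line = raw.rstrip("\r\n")
        if skip_blank and not line.strip():
            continue
        return line
    return None


def stream_compare(old_lines, new_lines, skip_blank=True):
    """Merge-compare two pre-sorted line sources, holding one line of each.

    Yields ("-", line) for lines only in the old source and ("+", line)
    for lines only in the new one; matching lines are skipped.
    """
    old_it, new_it = iter(old_lines), iter(new_lines)
    a = _next_line(old_it, skip_blank)
    b = _next_line(new_it, skip_blank)
    while a is not None and b is not None:
        if a == b:
            a = _next_line(old_it, skip_blank)
            b = _next_line(new_it, skip_blank)
        elif a < b:
            yield ("-", a)
            a = _next_line(old_it, skip_blank)
        else:
            yield ("+", b)
            b = _next_line(new_it, skip_blank)
    # Drain whichever source still has lines.
    while a is not None:
        yield ("-", a)
        a = _next_line(old_it, skip_blank)
    while b is not None:
        yield ("+", b)
        b = _next_line(new_it, skip_blank)


# Usage with real files (hypothetical names); file objects are line iterators:
# with open("old_sorted.txt", encoding="utf-8") as fo, \
#      open("new_sorted.txt", encoding="utf-8") as fn:
#     for tag, line in stream_compare(fo, fn):
#         print(tag, line)
```

Because the function accepts any iterable of lines, the same code can be tried on small in-memory lists before pointing it at the real files. The lines it yields are the ones that would then go to the heuristics stage.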
Are those enough hints for now?
Sunanda.