Mailing List Archive: 49091 messages

[REBOL] Re: Large file compare

From: tom:conlin:g:mail at: 8-Jun-2005 13:17

I don't have time to try or test any of this, so some of the logic may be reversed, but might a single-pass approach help?

    ;;; a and b are sorted blocks; in-both, only_a and only_b start as empty blocks
    while [all [not tail? a  not tail? b]] [
        either equal? first a first b [
            ; same item in both blocks: record it and advance both
            insert/only tail in-both first a
            a: next a
            b: next b
        ][
            ; the smaller item cannot occur later in the other block
            either lesser? first a first b [
                insert/only tail only_a first a
                a: next a
            ][
                insert/only tail only_b first b
                b: next b
            ]
        ]
    ]
    ;;; in case one finishes before the other
    while [not tail? a] [
        insert/only tail only_a first a
        a: next a
    ]
    while [not tail? b] [
        insert/only tail only_b first b
        b: next b
    ]

On 6/8/05, Thorsten Moeller <[valleyroad--gmx--de]> wrote:
> Hi Gabriele,
>
> good hints. So, I now use read/lines instead of read and write out the
> result from the difference operation immediately and remove it from
> memory. This drops the actual memory consumption to 140 MB during
> intersect and 120 MB during difference.
>
> But I still think it will become too big when operating on the whole
> set.
>
> As the file content is very trivial, like "2348246864;PCINIT2", and can be
> sorted, I think of something like stepping through the files line by line
> and comparing the line content. The file which has the lead in the
> comparison will alternate, depending on whether there is a difference in
> the first or second column. This only works when both files are sorted
> identically.
>
> There will be a minimum memory consumption. But what I don't know is
> what commands to use, as they must remember the positions in the files.
>
> I will think on this further. Perhaps you have a good idea how this could
> be implemented. What I don't know is if this will be fast enough.
>
> Thanks
>
> Thorsten
>
> On Wed, 8 Jun 2005 12:31:56 +0200, "Gabriele Santilli"
> <[gabriele--colellachiara--com]> said:
> >
> > Hi Thorsten,
> >
> > On Wednesday, June 8, 2005, 11:53:08 AM, you wrote:
> >
> > TM> a: read %testfile1.txt
> > TM> b: read %testfile2.txt
> >
> > Did you mean READ/LINES?
> >
> > TM> inboth: intersect a b
> > TM> only_a: difference inboth a
> > TM> only_b: difference inboth b
> >
> > TM> My question is, if there are better ways in rebol to achieve the same
> > TM> with lesser memory consumption??
> >
> > Yes - don't load the whole files in memory. :-)
> >
> > Is the difference going to be big too? If so you may want to avoid
> > keeping it in memory too.
> >
> > OTOH, if you have enough memory for the operation, doing it all in
> > memory is going to be much faster.
> >
> > Regards,
> > Gabriele.
> > --
> > Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
> > Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/
> >
> > --
> > To unsubscribe from the list, just send an email to
> > lists at rebol.com with unsubscribe as the subject.
>
> --
> Melian Solutions
> Thorsten Moeller
>
> Mail: [tmoeller--fastmail--fm]
-- ... nice weather eh
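
For anyone who wants to try the single-pass idea on a small sample first, here is a minimal sketch that wraps it in a function. The name compare-sorted and the sample lines are made up for illustration and do not come from the thread. It assumes both blocks are already sorted the same way, for example by SORT on the result of READ/LINES.

```rebol
REBOL [Title: "Single-pass compare of two sorted blocks (sketch)"]

; compare-sorted is a hypothetical helper name, not part of REBOL.
; Both inputs must already be sorted in the same order.
compare-sorted: func [
    a [block!] "first sorted block"
    b [block!] "second sorted block"
    /local in-both only-a only-b
][
    in-both: copy []
    only-a: copy []
    only-b: copy []
    while [all [not tail? a  not tail? b]] [
        either equal? first a first b [
            ; same item in both: record it once, advance both
            append/only in-both first a
            a: next a
            b: next b
        ][
            ; the smaller item cannot show up later in the other block
            either lesser? first a first b [
                append/only only-a first a
                a: next a
            ][
                append/only only-b first b
                b: next b
            ]
        ]
    ]
    ; whatever is left in one block has no partner in the other
    append only-a a
    append only-b b
    reduce [in-both only-a only-b]
]

; made-up sample data in the "number;name" shape from the thread
set [in-both only-a only-b] compare-sorted
    ["1001;PCINIT1" "1002;PCINIT2" "1004;PCINIT4"]
    ["1002;PCINIT2" "1003;PCINIT3" "1004;PCINIT4"]

probe in-both   ; ["1002;PCINIT2" "1004;PCINIT4"]
probe only-a    ; ["1001;PCINIT1"]
probe only-b    ; ["1003;PCINIT3"]
```

This version still keeps both inputs in memory. The loop only ever looks at the current item of each block, so the same comparison could in principle be driven from two files read one line at a time, with the three results written out as they are found instead of collected.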