Mailing List Archive: Re: Large file compare

[REBOL] Re: Large file compare

From: valleyroad::gmx::de at: 8-Jun-2005 14:09


Hi Gabriele,

Good hints. I now use read/lines instead of read, write out the result
of the difference operation immediately, and then remove it from
memory. This drops the actual memory consumption to 140 MB during
intersect and 120 MB during difference.

But I still think it will become too big when operating on the whole
set.

As the file content is very simple, like "2348246864;PCINIT2", and can be
sorted, I am thinking of stepping through both files line by line and
comparing the line contents. The file that leads in the comparison will
alternate, depending on whether the difference is in the first or the
second column. This only works when both files are sorted identically.

Memory consumption would be minimal. What I don't know is which
commands to use, since they must remember the current positions in the
files.
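
To make the idea concrete, here is a minimal sketch of such a
line-by-line comparison. It is written in Python purely to illustrate
the logic, not in REBOL. The function name compare_sorted and the three
output file arguments are made up for this example, and it assumes both
files are sorted in the same plain string order that Python's "<"
operator uses. The open file objects remember the read positions, so
only one line per file is held in memory at a time.

```python
def compare_sorted(path_a, path_b, out_both, out_only_a, out_only_b):
    """Compare two identically sorted text files line by line.

    Hypothetical helper for illustration: writes lines found in both
    files, only in the first, and only in the second to three outputs.
    """
    with open(path_a) as fa, open(path_b) as fb, \
         open(out_both, "w") as both, \
         open(out_only_a, "w") as only_a, \
         open(out_only_b, "w") as only_b:
        # readline() returns "" at end of file; the file objects keep
        # track of the current position for us.
        line_a = fa.readline()
        line_b = fb.readline()
        while line_a and line_b:
            key_a = line_a.rstrip("\n")
            key_b = line_b.rstrip("\n")
            if key_a == key_b:
                # Same line in both files: advance both.
                both.write(key_a + "\n")
                line_a = fa.readline()
                line_b = fb.readline()
            elif key_a < key_b:
                # File A is behind: this line cannot occur in B anymore.
                only_a.write(key_a + "\n")
                line_a = fa.readline()
            else:
                # File B is behind: this line cannot occur in A anymore.
                only_b.write(key_b + "\n")
                line_b = fb.readline()
        # One file is exhausted; the rest of the other one is unique.
        while line_a:
            only_a.write(line_a.rstrip("\n") + "\n")
            line_a = fa.readline()
        while line_b:
            only_b.write(line_b.rstrip("\n") + "\n")
            line_b = fb.readline()
```

Each file is read exactly once from start to end, so the run time grows
linearly with the total number of lines. The result is only correct if
the sort order of the files matches the order used by the comparison.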

I will think about this further. Perhaps you have a good idea how this
could be implemented. What I also don't know is whether this will be
fast enough.

Thanks

Thorsten

On Wed, 8 Jun 2005 12:31:56 +0200, "Gabriele Santilli"
<[gabriele--colellachiara--com]> said:
> Hi Thorsten,
>
> On Wednesday, June 8, 2005, 11:53:08 AM, you wrote:
>
> TM> a: read %testfile1.txt
> TM> b: read %testfile2.txt
>
> Did you mean READ/LINES?
>
> TM> inboth: intersect a b
> TM> only_a: difference inboth a
> TM> only_b: difference inboth b
>
> TM> My question is, if there are better ways in rebol to achieve the same
> TM> with less memory consumption?
>
> Yes - don't load the whole files in memory. :-)
>
> Is the difference going to be big too? If so you may want to avoid
> keeping it in memory too.
>
> OTOH, if you have enough memory for the operation, doing it all in
> memory is going to be much faster.
>
> Regards,
>    Gabriele.
> --
> Gabriele Santilli <[g--santilli--tiscalinet--it]>  --  REBOL Programmer
> Amiga Group Italia sez. L'Aquila  ---   SOON: http://www.rebol.it/
>
--
  Melian Solutions
  Thorsten Moeller

  Mail: [tmoeller--fastmail--fm]
