Mailing List Archive: Re: Multi-process

[REBOL] Re: Multi-process

From: joel:neely:fedex at: 2-Nov-2002 13:44


Hi, Louis,

Thanks for the thought-provoking question!

Louis A. Turk wrote:
> > > In a certain script I must manually insert data then save
> > > the file.  The file is huge and takes over two minutes to
> > > save, while I just sit there looking at the screen...
...
> Here is what is happening:
>
...

> foreach line lines [
>          write/append %i-8.txt join line newline
> ]
...
> >How large is the file?
>
> 8 MB
>

Let me focus on just the part that takes a block (of strings,
I'll assume) and writes them into a disk file.  I kludged up
a little test, which began by creating a block of strings
parameterized on the value of N as follows ...

    stuff: make block! n
    repeat i n [
        insert tail stuff rejoin [
            "abcdefghijklmnopqrstuvwxyz"
            "-"
            i
            "-"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        ]
    ]

... then ensuring that my output files didn't exist ...

    if found? info? wa: %wa.data [delete wa]
    if found? info? wl: %wl.data [delete wl]
    if found? info? wj: %wj.data [delete wj]
    if found? info? wk: %wk.data [delete wk]

... then tried some different approaches to writing the block's
content to a file (a different file for each approach).  The
first approach resembles your post AFAICT by appending each item
in the block to the end of a file ...

    foreach line stuff [
        write/append wa join line newline
    ]

... the next one explicitly assembles a big string containing
all block items (with JOINed NEWLINEs) and then writes that
string as a single action ...

    jj: copy ""
    foreach line stuff [
        insert tail jj join line newline
    ]
    write wj jj

... the next one shows the Power of Thinking Ahead, by figuring
out how long the result string will be and then preallocating
that much space before inserting the lines (and newlines) into
the string ...

    kl: 0
    foreach line stuff [kl: kl + 1 + length? line]
    kk: make string! kl
    foreach line stuff [
        insert insert tail kk line newline
    ]
    write wk kk

... the last one lets REBOL do all of the dirty work ...

    write/lines wl stuff
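
Once all four approaches have run, a quick sanity check can
confirm that they left the same content on disk.  This is only
a sketch, and it assumes the four files above exist and fit in
memory; REFERENCE is just an illustrative name ...

    ; compare each hand-built file against the WRITE/LINES output
    reference: read wl
    foreach file reduce [wa wj wk] [
        print [file equal? reference read file]
    ]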

The exact timings are *very* dependent on your system (CPU speed,
memory, disk I/O throughput) and the value of N, but for the
sake of reference the comparisons below come from my dog-slow
benchmark box (200 MHz Pentium, with 112 MB RAM and w95).

There are a few things we know, but their combined effect may
still come as a surprise (it did for me! ;-)

*  Disk I/O is slow compared to computation.
*  Opening and closing files is very slow compared to computation.
*  JOIN is expensive, as it must allocate a new STRING! for each
   evaluation.
*  Preallocating series space can significantly improve speed.

A little testing shows the following:

*  WRITE/APPEND was slowest (barely) for N=50000, requiring about
   280 times as long as WRITE/LINES does.

*  WRITE/APPEND was slow, but apparently linear, so that its time
   increases proportionally to N.

*  FOREACH ... [INSERT TAIL ... JOIN ...] WRITE ... was next slowest
   for N=50000, requiring about 255 times as long as WRITE/LINES ...

*  ... but it's *not* linear!  Without preallocation, its time
   appeared to grow quadratically with N, so that a little beyond
   N=50000, this becomes the slowest approach.

*  The Thinking Ahead approach would seem to have a speed handicap
   because of the double pass across the block, but ... Surprise!
   It needed only about 11 times as long as WRITE/LINES did.  The
   pass to calculate total length didn't require memory management,
   and the big buffer only got allocated once.  After that, the
   second loop was INSERTing into space guaranteed to be available,
   and so could move more quickly.  (Did you notice that it did a
   double-INSERT instead of INSERT...JOIN?  That change alone cut
   the run time roughly in half.)
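
Comparisons like the ones above can be made with a small timing
helper.  What follows is only a sketch, not the harness actually
used for these figures: TIME-IT is a hypothetical name, it
assumes a REBOL build where NOW accepts the /PRECISE refinement,
and it will misreport a run that crosses midnight.

    time-it: func [
        "Evaluate CODE and return the elapsed time (hypothetical)"
        code [block!]
        /local start
    ][
        start: now/time/precise
        do code
        now/time/precise - start   ; a TIME! value
    ]

    ; stand-alone example: cost of 10000 JOINs
    print ["joins:" time-it [loop 10000 [join "abc" newline]]]

With STUFF and WL defined as earlier, the same helper could wrap
any of the approaches, e.g. time-it [write/lines wl stuff] .
Dividing two such results gives a relative figure of the kind
quoted above.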

Morals for me are:

*  Do as much as possible in memory to avoid I/O.

*  Preallocate series space whenever possible, even if it takes
   a little work to figure out how much to allocate!

Thanks again for stimulating some pondering!

-jn-

--
; Joel Neely                             joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]