
[REBOL] Re: Multi-process

From: joel:neely:fedex at: 2-Nov-2002 13:44

Hi, Louis,

Thanks for the thought-provoking question!

Louis A. Turk wrote:
> > > In a certain script I must manually insert data then save
> > > the file. The file is huge and takes over two minutes to
> > > save, while I just sit there looking at the screen...
...
> Here is what is happening:
>
...
> foreach line lines [
>     write/append %i-8.txt join line newline
> ]
...
> >How large is the file?
>
> 8 MB
>
Let me focus on just the part that takes a block (of strings, I'll
assume) and writes them into a disk file.

I kludged up a little test, which began by creating a block of strings
parameterized on the value of N as follows ...

    stuff: make block! n
    repeat i n [
        insert tail stuff rejoin [
            "abcdefghijklmnopqrstuvwxyz" "-" i "-"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        ]
    ]

... then ensuring that my output files didn't exist ...

    if found? info? wa: %wa.data [delete wa]
    if found? info? wl: %wl.data [delete wl]
    if found? info? wj: %wj.data [delete wj]
    if found? info? wk: %wk.data [delete wk]

... then tried some different approaches to writing the block's content
to a file (a different file for each approach). The first approach
resembles your post AFAICT by appending each item in the block to the
end of a file ...

    foreach line stuff [
        write/append wa join line newline
    ]

... the next one explicitly assembles a big string containing all block
items (with JOINed NEWLINEs) and then writes that string as a single
action ...

    jj: copy ""
    foreach line stuff [
        insert tail jj join line newline
    ]
    write wj jj

... the next one shows the Power of Thinking Ahead, by figuring out how
long the result string will be and then preallocating that much space
before inserting the lines (and newlines) into the string ...

    kl: 0
    foreach line stuff [kl: kl + 1 + length? line]
    kk: make string! kl
    foreach line stuff [
        insert insert tail kk line newline
    ]
    write wk kk

... the last one lets REBOL do all of the dirty work ...

    write/lines wl stuff

The exact timings are *very* dependent on your system (CPU speed,
memory, disk I/O throughput) and the value of N, but for the sake of
reference, my benchmark box is dog-slow (200 MHz Pentium with 112 MB
RAM and w95).

There are a few things we know, but their combined effect may still
come as a surprise (it did for me! ;-)

* Disk I/O is slow compared to computation.

* Opening and closing files is very slow compared to computation.

* JOIN is expensive, as it must allocate a new STRING! for each
  evaluation.

* Preallocating series space can significantly improve speed.

A little testing shows the following:

* WRITE/APPEND was slowest (barely) for N=50000, requiring about 280
  times as long as WRITE/LINES does.

* WRITE/APPEND was slow, but apparently linear, so that its time
  increases proportionally to N.

* FOREACH ... [INSERT TAIL ... JOIN ...] WRITE ... was next slowest
  for N=50000, requiring about 255 times as long as WRITE/LINES ...

* ... but it's *not* linear! Without preallocation, its time appeared
  to grow quadratically with N, so that a little beyond N=50000, this
  becomes the slowest approach.

* The Thinking Ahead approach would seem to have a speed handicap
  because of the double pass across the block, but ... Surprise! It
  needed only about 11 times as long as WRITE/LINES did. The pass to
  calculate total length didn't require memory management, and the big
  buffer only got allocated once. After that, the second loop was
  INSERTing into space guaranteed to be available, and so could move
  more quickly. (Did you notice that it did a double-INSERT instead of
  INSERT...JOIN? That change alone cut the run time roughly in half.)

Morals for me are:

* Do as much as possible in memory to avoid I/O.

* Preallocate series space whenever possible, even if it takes a
  little work to figure out how much to allocate!

Thanks again for stimulating some pondering!

-jn-

--
; Joel Neely joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ]
{ | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]
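
For anyone who wants to rerun the in-memory part of the comparison, here is a minimal sketch of a timing harness. It is not the script used for the figures above. The helper name TIME-IT, the header title, and the value of N are made up for illustration, and it assumes a REBOL/Core build where NOW accepts the /PRECISE refinement. It times only the two string-building loops (no disk I/O) and then checks that both produce the same text.

```rebol
REBOL [Title: "String-building timing sketch"]

n: 2000   ; assumed test size; raise it to watch the growth patterns

; same test data as in the post
stuff: make block! n
repeat i n [
    insert tail stuff rejoin [
        "abcdefghijklmnopqrstuvwxyz" "-" i "-"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ]
]

; hypothetical helper: evaluate BODY and print the elapsed time
; (uses time of day, so a run that crosses midnight reports nonsense)
time-it: func [label [string!] body [block!] /local t0] [
    t0: now/time/precise
    do body
    print [label now/time/precise - t0]
]

time-it "insert/join, no preallocation:" [
    jj: copy ""
    foreach line stuff [insert tail jj join line newline]
]

time-it "double insert, preallocated:" [
    kl: 0
    foreach line stuff [kl: kl + 1 + length? line]
    kk: make string! kl
    foreach line stuff [insert insert tail kk line newline]
]

; both buffers should hold identical text of the predicted length
print ["same content:" equal? jj kk]
print ["length as predicted:" kl = length? kk]
```

Running it at several values of N (doubling each time, say) is one way to see whether the unpreallocated loop really grows faster than linearly on a given machine. Writing JJ or KK out with a single WRITE can be timed the same way.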