
[REBOL] Re: Line reduction

From: joel:neely:fedex at: 6-Jul-2001 13:17

Hi, Aaron,

Having had time to sleep on it (and clear my head of some other distractions)...

[aroberts--swri--edu] wrote:
> I have a very large dataset comprised of numbers. The values
> come in a set of three, one set to a line. The ordering of
> the sets is arbitrary. I need a way of reducing the data down,
> so I can get a 'snap shot' of the full data set...
>
That being the case, another solution (that requires less processing internally) would be:

    source-file: to-file ask "Source file name: "
    output-file: to-file ask "Output file name: "
    sample-pct: 0.01 * min 100 max 0 to-decimal ask "% of data to sample: "
    line-count: length? all-the-data: read/lines to-file source-file
    write/lines to-file output-file at all-the-data
        to-integer line-count - (line-count - 1 * sample-pct) + 0.5

Expressing the sampling rate as the percentage of the data you want to keep seems to be fairly user-friendly, and lets you get the exact level you want without multiple passes (e.g. 25% instead of half of half).

The percentage entered is limited to the range 0.0 through 100.0 to protect against bogus entries, keying errors, etc. SAMPLE-PCT holds that value scaled by 0.01, so it ends up as a fraction from 0.0 through 1.0.

The input file is read into a block of lines, whose length is the line count. The last expression simply calculates where the *last* SAMPLE-PCT of the lines begins, and writes from there to the end of the block. Therefore, no copying or removing is required.

This version still reads the entire file into memory as the way to find out the line count. If all of your lines were close enough to the same length, you could modify the arithmetic to start with the size of the input file, calculate the percentage of that total size, then read and write only that much (ignoring the partial line that might appear at the end).

-jn-
___________________________________________________________________
The purpose of computing is insight, not numbers! - R. W. Hamming
joel'dot'neely'at'fedex'dot'com
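
As a worked check of the index arithmetic in the message above, here is a minimal sketch that applies the same expression to a small in-memory block instead of a file. The ten one-letter "lines", the fixed 25% sampling rate, and the word START are assumptions made up for illustration; they are not part of the original script. REBOL evaluates infix operators strictly left to right, so `line-count - 1 * sample-pct` means `(line-count - 1) * sample-pct`.

```rebol
REBOL [Title: "Sample-index arithmetic check (illustrative sketch)"]

; Hypothetical stand-in for the block that read/lines would return.
all-the-data: ["a" "b" "c" "d" "e" "f" "g" "h" "i" "j"]
line-count: length? all-the-data        ; 10

; Assumed sampling rate: 25%, already scaled to a fraction.
sample-pct: 0.25

; Same expression as in the message, evaluated left to right:
;   (10 - 1) * 0.25   = 2.25
;   10 - 2.25 + 0.5   = 8.25
;   to-integer 8.25   = 8
start: to-integer line-count - (line-count - 1 * sample-pct) + 0.5

print start                             ; 8
probe at all-the-data start             ; ["h" "i" "j"]
```

With only ten lines, the closest the result can come to 25% is three lines. Note that AT returns a position in the same block rather than a copy, which is why no copying or removing is needed before writing.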