Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

[REBOL] Re: Sort by first part of line

From: joel:neely:fedex at: 7-Sep-2002 22:06

Hi, Romano, et alia, Romano Paolo Tenca wrote:
> > 1. Parse is slow -- take it out of the critical path (as Gregg > > and Romano do) or eliminate it (as Scott and Joel do). > > 2. It's not the number of parse's (Greg has one per line; Romano > > has one for the whole dataset) so much as the volume of data to > > be parsed > > I do not agree. The greatest problems are in memory allocation. >
...
> I think parse is one of the more fast constructs in Rebol. >
Certainly memory management (especially when constructing blocks in interpreted expressions as I did) becomes a very significant component of the total run time as the size/number of blocks continues to increase. As for the rest, I must disagree with *both* you and Sunanda. (Some trick, eh? ;-) I don't attach meaning to absolute phrases such as "PARSE is slow" or "PARSE is fast", but instead think in terms such as "PARSE is slower/faster than _____ for doing ____". For example, if I replace the phrase to-integer copy/part next item 3 in my original grouping algorithm with to-integer first parse item none two things happen: 1) The issues of leading whitespace and exact number of digits in the numeric-looking prefix now go away. 2) The processing slows down by about 20% (takes about 20% longer to run). So, *in*this*case*, PARSE is slower than explicit copying for doing the job of pulling out a fixed-length/position prefix. Generality almost always has a price, and performance optimization almost always increases brittleness. I'd prefer to rephrase Sunanda's points as: 1) Move as much processing as possible out of inner loops. 2) Process as little data as possible to get the work done. regardless of how the processing is invoked. In the second point, doing a whitespace-driven parse of the entire line (or, worse, of the entire file) is overkill, since we don't need the rest of the line parsed/broken down for the task as specified. Reminding myself to take a clue from the first point, we can use the implicit looping of PARSE, and the fact that the prefixes are only three digits long to come up with something like this: 8<-------- t0: now/time/precise buffer: copy/deep array/initial 999 [[]] digit: charset "0123456789" parse/all read f [ some [ some " " copy nr some digit copy data [thru "^/" | to end] ( append pick buffer to-integer nr data ) ] ] t1: now/time/precise print to-decimal t1 - t0 8<-------- which, by my tests, actually runs a bit (~7%) faster than my first version (which extended the buffer within the inner loop whenever needed). Feeding that same idea back into the first version produces something like this: 8<-------- t0: now/time/precise buffer: copy/deep array/initial 999 [[]] foreach item read/lines f [ nr: to-integer copy/part next item 3 append buffer/:nr item ] t1: now/time/precise print to-decimal t1 - t0 8<-------- which runs about 9% faster than the original. -jn- -- ; Joel Neely joeldotneelyatfedexdotcom REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] { | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]