[REBOL] Re: Sort by first part of line
From: joel:neely:fedex at: 6-Sep-2002 18:29
Hi, Louis, and everybody,
I've been covered up, so I may have missed something, but here's my
suggestion for a different approach (forgive me if somebody has
mentioned this already or if I've misunderstood the problem!)
Louis A. Turk wrote:
> Anyway, here is the data to be sorted if perchance it might be
> useful for timing purposes (691KB):
>
> http://www.pusatberita.com/test.txt
>
After glancing at your data, I decided to try a version that just
eliminates the sorting entirely. Let me restate the problem in
that fashion. We have:
- a file of text lines, each of which contains:
    - a single space
    - a three digit number (with zero padding on the left)
    - a single space
    - some words
and we desire
- the lines rearranged by the leading number, but otherwise in
  the same order as the original data.
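As a small illustration of that line layout, the leading number can be pulled out by skipping the first space and converting the next three characters. The sample line and the names `sample-line` and `key` are made up for this sketch; they are not taken from the data file.

```rebol
REBOL []

;; made-up line in the " NNN words" layout described above
sample-line: " 042 some words here"

;; NEXT skips the leading space, COPY/PART takes the three digits
key: to-integer copy/part next sample-line 3

print key
```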
Since there are relatively few three-digit integers, I thought I'd
try setting up a collection of blocks, so that block 1 will hold
all lines beginning with 001, block 2 will hold all lines beginning
with 002, etc. Here's the code, including timers...
8<--------
REBOL []
buffer: []
t0: now/time/precise
foreach item read/lines %pusatberita.text [
    ;; skip the leading space, then take the three digits
    nr: to-integer copy/part next item 3
    ;; grow buffer until it has a block for this number
    while [nr > length? buffer] [
        insert/only tail buffer copy []
    ]
    append buffer/:nr item
]
t1: now/time/precise
print to-decimal t1 - t0
8<--------
When I run the above on my desk at work, using a downloaded copy
of your data, it takes about one second. To iterate through the
data lines in the desired order, we can do something like
    foreach group buffer [
        foreach line group [
            print line  ;; or whatever you wish to do with it
        ]
    ]
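A minimal in-memory sketch of the same bucketing idea, using a handful of made-up lines instead of the file, may make it easier to see that lines sharing a number keep their original relative order. The names `sample`, `buckets` and `result` are assumptions for this illustration only, and the flattening step at the end is an optional extra for anyone who wants a single block of lines back.

```rebol
REBOL []

;; made-up sample lines in the " NNN words" layout
sample: [
    " 003 gamma one"
    " 001 alpha one"
    " 003 gamma two"
    " 001 alpha two"
]

buckets: copy []
foreach item sample [
    nr: to-integer copy/part next item 3
    ;; grow the outer block until slot nr exists
    while [nr > length? buckets] [
        insert/only tail buckets copy []
    ]
    ;; PICK returns the nr-th inner block, which APPEND modifies in place
    append pick buckets nr item
]

;; flatten the buckets back into one block of lines
result: copy []
foreach group buckets [append result group]

foreach line result [print line]
```

With this sample the two 001 lines come out first, then the two 003 lines, each pair in the order it appeared in `sample`. The empty block for 002 contributes nothing.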
Someone who's already been benchmarking might try comparing this
with what's already been proposed, using the data from your web
site.
-jn-