[REBOL] Re: Sort by first part of line
From: carl:cybercraft at: 6-Sep-2002 20:27
On 06-Sep-02, [SunandaDH--aol--com] wrote:
> Louis:
>> Your sort worked perfectly. Thanks also for the explanation.
>> You might be interested to know that
>> on my 450 Pentium 2 running w2k total time to sort
>> 29,688 lines using your code was:
>>
>> 3:50:10
> <snip>
>> One question I'll ask now however: What exactly does hash do, and
>> could hash be used to speed up sort?
> I'm glad to hear it worked, even if it took three hours.
> In my experience, you can speed things up with more memory.
Or, perhaps, by using less?
I've come late to this thread, but I take it Louis has large text
files which he wants to sort by the first three characters in the
line, which are of the form "001", "002", "001", etc.?
Why not, instead of loading the whole files in, just build an index to
the lines in the file and sort that? For instance...
write %test.txt {001 a b c
002 cc dd ee ff
001 g h i
003 jj kk ll mm
002 n m o p}
Opening it with open/lines then allows us to treat the file on disk
as a series of lines...
>> file: open/lines %test.txt
>> print file/1
001 a b c
>> print file/3
001 g h i
>> print file/4
003 jj kk ll mm
>> close file
So, to create an unsorted index based on the first three characters
of each line...
file-index: []
file: open/lines %test.txt
forall file [
    append file-index copy/part file/1 3
    append file-index index? file
]
close file
After running the above, file-index contains...
>> file-index
== ["001" 1 "002" 2 "001" 3 "003" 4 "002" 5]
That we can now sort...
>> sort/skip file-index 2
== ["001" 1 "001" 3 "002" 2 "002" 5 "003" 4]
and use to print out the file with its lines sorted...
file: open/lines %test.txt
foreach [code line] file-index [
    print file/:line
]
close file
Which, when run, prints...
001 a b c
001 g h i
002 cc dd ee ff
002 n m o p
003 jj kk ll mm
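If the sorted lines are wanted in a new file rather than on screen,
the print loop could be changed along these lines. This is only a
rough sketch: the output name %sorted.txt is made up for the
example, and the index is typed in by hand so the script runs on
its own.

```rebol
REBOL []

; Recreate the sample file and the sorted index from the example.
write %test.txt {001 a b c
002 cc dd ee ff
001 g h i
003 jj kk ll mm
002 n m o p}
file-index: ["001" 1 "001" 3 "002" 2 "002" 5 "003" 4]

; %sorted.txt is a made-up output name; start it off empty.
write %sorted.txt ""

file: open/lines %test.txt
foreach [code line] file-index [
    ; write/append reopens the output file on every pass:
    ; simple, but probably not the fastest way for a big file.
    write/append %sorted.txt join file/:line newline
]
close file

print read %sorted.txt
```

Only one line of the source file is held in memory at a time, so
the memory saving from the index approach is kept.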
How long REBOL would take to sort 29,688 of those pairs of values I
don't know (maybe the strings would be better converted to integers
before sorting?), but it should cut down a lot on the memory use.
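One way to find out would be to time it on made-up data. The
following is a sketch only: the keys are random numbers from 1 to
999 rather than Louis's real codes, the word names are invented for
the example, and the timings will vary from machine to machine. It
builds one index with three-character string keys and one with the
same keys converted by to-integer, then times sort/skip on each.

```rebol
REBOL []

random/seed 1                  ; fixed seed so runs are repeatable
string-index: copy []
integer-index: copy []

repeat n 29688 [
    key: random 999            ; made-up key in the range 1 to 999
    ; zero-pad to three characters, e.g. 7 becomes "007"
    code: copy skip tail join "00" key -3
    append string-index code
    append string-index n
    append integer-index to-integer code
    append integer-index n
]

; Time the sort with string keys.
start: now/time/precise
sort/skip string-index 2
print ["string keys: " now/time/precise - start]

; Time the sort with integer keys.
start: now/time/precise
sort/skip integer-index 2
print ["integer keys:" now/time/precise - start]
```

The subtraction will give a wrong answer if the clock passes
midnight during a run, which is fine for a quick test like this.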
This all assumes I've understood the problem right, of course. (:
--
Carl Read