Mailing List Archive: 49091 messages

[REBOL] Re: Sort by first part of line

From: carl:cybercraft at: 6-Sep-2002 20:27

On 06-Sep-02, [SunandaDH--aol--com] wrote:
> Louis:
>> Your sort worked perfectly. Thanks also for the explanation.
>> You might be interested to know that
>> on my 450 Pentium 2 running w2k total time to sort
>> 29,688 lines using your code was:
>>
>> 3:50:10
> <snip>
>> One question I'll ask now however: What exactly does hash do, and
>> could hash be used to speed up sort?
> I'm glad to hear it worked, even if it took three hours.
> In my experience, you can speed things up with more memory.

Or, perhaps, by using less? I've come late into this thread, but I take it Louis has large text files which he wants to sort by the first three characters in the line, which are of the form "001", "002", "001" etc? Why not, instead of loading the whole files in, just build an index to the lines in the file and sort that? For instance...

write %test.txt {001 a b c
002 cc dd ee ff
001 g h i
003 jj kk ll mm
002 n m o p}

This then allows us to treat the file on disk as a series...

>> file: open/lines %test.txt
>> print file/1
001 a b c
>> print file/3
001 g h i
>> print file/4
003 jj kk ll mm
>> close file

So, to create an unsorted index based on the first three letters of each line...

file-index: []
file: open/lines %test.txt
forall file [
    append file-index copy/part file/1 3
    append file-index index? file
]
close file

After running the above, file-index contains...

>> file-index
== ["001" 1 "002" 2 "001" 3 "003" 4 "002" 5]

That we can now sort...

>> sort/skip file-index 2
== ["001" 1 "001" 3 "002" 2 "002" 5 "003" 4]

and use to print out the file with its lines sorted...

file: open/lines %test.txt
foreach [code line] file-index [
    print file/:line
]
close file

Which, when run, returns...

001 a b c
001 g h i
002 cc dd ee ff
002 n m o p
003 jj kk ll mm

How long REBOL would take to sort 29,688 of those pairs of values I don't know (maybe the strings would be better converted to integers before sorting?), but it should cut down a lot on the memory use.

This all assumes I've understood the problem right, of course. (:

--
Carl Read
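
The closing suggestion about converting the key strings to integers before sorting could be tried along these lines. This is only a rough sketch, not something that has been timed on Louis's data: it assumes %test.txt exists as written earlier in this message, that every line really does start with three digits (to-integer will raise an error otherwise), and the `start` word is introduced here just for the timing. Note that now/time only resolves to whole seconds, so a small file will report 0:00.

```rebol
; Build the index with integer keys instead of three-character strings.
file-index: []
file: open/lines %test.txt
forall file [
    ; "001" becomes the integer 1, so sort compares numbers, not strings
    append file-index to-integer copy/part file/1 3
    append file-index index? file
]
close file

; Rough timing of the sort itself (whole-second resolution only).
start: now/time
sort/skip file-index 2
print ["Sort took" now/time - start]
```

The sorted file-index can then be used with the same foreach [code line] loop as before, since the line numbers in the second slot of each pair are unchanged. Whether integer keys are actually faster than short strings here would need measuring on the real 29,688-line file.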