Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

[REBOL] Re: Sort by first part of line

From: gscottjones:mchsi at: 6-Sep-2002 10:22

Hi, everyone, My two cents worth: I recreated a similar data set as Louis (but mine being total nonsence, of course :-) with the following algorithm: loop 29688 [ write/append/lines %//windows/desktop/data1.txt rejoin [ " " random/only {1234567890} random/only {1234567890} random/only {1234567890} " " random {eawk acef 32233} ] ] It's a quick hack, so stop laughing! A sample of the data looks like: ... 977 3 2e33aewacf2k 638 e ak33we2c3a 2f 355 ak cefa233we3 2 216 e ec3w3aa2f2k3 422 3fc2eeaa33w k2 462 wa3 ae3e2fk32c 238 wea 3ack2 23f3e 616 2aa3 f3k3w2e ec 658 w 32 ckfa33a2ee 537 3e 3fw2e acak23 592 33 c3ef22we aka 858 e3w3 cea2k2fa 3 453 efca kw223ae33 ... and the resulting file size was 637kb. As a rough comparison, to see if I was on track, I adapted Sunanda's code (assuming I got the final version correctly): ;;;;;;;;;;;;;;;;;;;;;;;; t: now/time sort/compare sorted-data: copy data func [a b /local a-key b-key ] [ a-key: join second parse/all a " " ["-" index? find data a] b-key: join second parse/all b " " ["-" index? find data b] ;print [a-key " " b-key] return a-key < b-key ] print now/time - t ;;;;;;;;;;;;;;;;;;;;;;;; I didn't do dedicated run (I needed to do a few other things), but I interrupted the run after 4 hours on my 500mhz Celeron 128MB Win98. Next, I drew a few ideas from contributions, but I was determined to let in run in memory. Here was the resulting code: ;;;;;;;;;;;;;;;;;;;;;;;; rebol [] full-time: now/time data: read/lines %//windows/desktop/data1.txt data-blk: copy [] foreach datum data [ append data-blk second parse/all datum " " append data-blk datum ] sort-time: now/time sort/skip data-blk 2 print now/time - sort-time foreach [key value] data-blk [ write/lines/append %//windows/desktop/sorted-data.txt value ] print now/time - full-time print "done" ask "" ;;;;;;;;;;;;;;;;;;;;;;;; The idea was to do a parse through the code only once, and construct a new key and value pairing data block. Then I did a a simple sort/skip by 2. Time for the sort (alone) was 1 to 2 seconds on my machine, and total runtime about 33 seconds. Sample of the resulting data: ... 000 ae332ae ckw 3f2 000 ea a32c323w fek 000 3ac33 wee2 2afk 000 3 22kee33w acaf 000 2a2e3cak f e33w 000 ak 3e2fe3wa2c3 000 ae2 2efw3ca3 3k 000 ee2ca3w 3ka3f2 000 2 afewe333cak2 000 ce2ew3a3a3 k2f 000 kcaa3f23 w e32e 000 3e2kcwf33 2eaa 000 e3e 2waf cak323 000 332e ak2eca3fw 000 3fek2wa3c a23 e 000 32ck e ea23faw3 000 akf33 eace3w2 2 000 332ew2 ka3eca f ... I am assuming that this is considered a stable sort based on the first alphanumeric "word" only. Comments welcome. Respectfully, --Scott Jones