Mailing List Archive: 49091 messages

[REBOL] Re: Working with large files

From: brock:kalef:innovapost at: 12-Aug-2008 13:51

Thanks to everyone for their feedback/suggestions. I seem to have a solution that will backtrack to the starting point of any incomplete record. This should work on any data that is newline-terminated, and you can set the amount of data to grab in each read by changing the 'size word; the number is the number of bytes to copy from the data file.

rebol []

port: open/seek %"Sample data/simplified.log"
size: 130   ; bytes to read per chunk
cnt: 1
while [not tail? port] [
    data: copy/part port size
    working-data: copy data
    either (last working-data) = #"^/" [
        ; chunk ended exactly on a newline: every record is complete
        use-last-record?: true
        start-at: (index? data) + :size
    ][
        use-last-record?: false
        either not error? try [start-at: (index? find/reverse tail data "^/")][
            ; chunk ended mid-record: restart the next read just after
            ; the last complete record
            start-at: (index? find/reverse tail data "^/")
        ][
            ; no newline anywhere in this chunk
            start-at: (index? data) + :size
        ]
    ]
    working-data: parse/all working-data "^/"
    record-cnt: length? working-data
    print ["Record Count " :cnt ": " record-cnt]
    print ["First Record:^/" first working-data]
    print ["Use last record?: " use-last-record?]
    print ["Last Record:^/" last working-data newline newline]
    port: skip port start-at   ; advance to the start of the next unread record
    cnt: cnt + 1
]
close port
halt

If anyone wants to try this for themselves, here is a sample data file that can be copied, saved to disk, and used after changing the file path in the script above. I made the data so you can quickly tell which record you are in. If you save the file, make sure there is an empty line at the end of the data file.
1 record1 record1recordonerecord1 end
2 recordtwo record2 record 2 record 2 end
3 rec3 recordthree record3 record 3 record3 end
4 record 4 record4 recordfour record14 end
5 recordfive record5 record 5 record 5 end
6 rec6 recordsix record6 record 6 record6 end
7 record 7 record7 recordseven record7 end
8 record 8 record8 recordeight record8 end
9 recordnine record9 record 9 record 9 end
10 rec10 recordten record10 record 10 record10 end
11 record 11 record11 recordeleven record11 end
12 recordtwelve record12 record 12 record 12 end
13 rec13 recordthirteen record13 record 13 record13 end
14 record 14 record14 recordfourteen record14 end
15 recordfifteen record15 record 15 record 15 end
16 rec16 recordsixteen record16 record 16 record16 end

I just finished running the above script on a 900+ MB file and it processed through to the end no problem.

Brock
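[Editor's note] For readers outside REBOL, here is a minimal Python sketch of the same idea: read the file in fixed-size chunks and never split a record, by cutting each chunk at its last newline. Instead of seeking the port backwards as the REBOL script does, this version carries the partial tail record forward into the next chunk, which has the same effect. All names here (read_records, chunk_size) are illustrative, not from the original post.

```python
import io

def read_records(stream, chunk_size=130):
    """Yield complete newline-terminated records from a binary stream,
    reading fixed-size chunks and holding back any partial record at the
    end of a chunk until the next read completes it."""
    carry = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            if carry:
                yield carry.decode()   # final record with no trailing newline
            return
        data = carry + chunk
        cut = data.rfind(b"\n")        # last complete record ends here
        if cut == -1:
            carry = data               # no full record yet; keep reading
            continue
        for line in data[:cut].split(b"\n"):
            yield line.decode()
        carry = data[cut + 1:]         # partial tail, completed next pass

# Usage: a chunk_size of 10 forces every chunk to end mid-record,
# yet each yielded record is still whole.
sample = b"1 record1 end\n2 recordtwo end\n3 rec3 end\n"
records = list(read_records(io.BytesIO(sample), chunk_size=10))
```

A trailing empty line in the data file (as the post recommends) is harmless here: the final newline leaves the carry buffer empty, so no spurious empty record is emitted.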