
[REBOL] Working with large files

From: brock::kalef::innovapost::com at: 11-Aug-2008 15:11

I'm looking to read 800+ MB web log files and process them before running them through an analysis tool. I'm running into "Out of Memory" errors and the odd REBOL crash in attempting to do this.

I started out simply reading the data directly into a word and looping through it. This worked great for a 45 MB sample data set, but then failed on a 430+ MB file:

    data: read/lines %file-name.log

I then changed the direct read to use a port:

    data-port: open/lines %file-name.log

This worked for the 430+ MB file, but the errors returned for the 800+ MB files.

It's now obvious that I will need to read in portions of the file at a time. However, I am unsure how to do this while also ensuring I get all the data. As you can see from my example code above, I'm interested in reading a line at a time, for simplicity in processing the records, since they are not fixed width (they vary in length). My fear is that I will not be able to properly handle the records that are truncated at the boundary of each data block I retrieve from the file, or at least not be able to do this easily.

Are there any suggestions? My guess is that I will need to:

- pull in a fixed-length block of data
- read through the data until I reach the first occurrence of a newline
- track the index of the newline's location
- continue reading until I reach the end of the data block
- once there, calculate where the last complete record ended
- read the next data block from that point
- continue until reaching the end of the file

Any other suggestions?

Regards,
Brock Kalef
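The chunked approach described above can be sketched with a generic carry-over buffer: read a fixed-size block, split off the complete lines, and prepend the trailing partial line to the next block, so no record is ever lost at a boundary. A minimal sketch follows, shown in Python rather than REBOL purely to illustrate the buffering technique; the chunk size and the name `process_large_file` are illustrative choices, not part of the original post:

    import io

    def process_large_file(path, chunk_size=1 << 20):
        """Yield complete lines from a file read in fixed-size chunks."""
        leftover = b""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                buffer = leftover + chunk
                # Split into lines; the last element is the partial
                # trailing line (or empty if the chunk ended on a newline).
                lines = buffer.split(b"\n")
                leftover = lines.pop()
                for line in lines:
                    yield line.decode("utf-8", errors="replace")
        if leftover:
            # Final line had no trailing newline.
            yield leftover.decode("utf-8", errors="replace")

Because the trailing partial line is simply carried into the next read, there is no need to track newline indexes or seek back to where the last complete record ended; memory use stays bounded by one chunk plus one line.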