[REBOL] Read/lines on a string!

From: edoconnor:g:mail at: 20-Sep-2007 11:57

Anyone know if there's a way to perform a read/lines on a string?

Yesterday I needed to process a 100 KB CSV file and perform some actions based on the data. In a simple world, I would be able to read in the file like this:
>> data: read/lines %mycsv.csv
and then iterate over the data with a foreach or forall.

Unfortunately, some of the data fields in the CSV are empty, and some contain delimiters as well as carriage returns (all of which are valid in a CSV). In this case using 'read/lines leads to a mess, because the data looks like this:
>> data: read %mycsv.csv
== {"line1 item1","line1 item2","line 1 item3" line2, alleged item1 ,"","line2 item3" }

This does not lend itself to a quick parse rule, so I punt and decide to read the string and perform some data scrubbing:
>> data: read %mycsv.csv
...
>> replace/all data {,"",} {,"None",}
>> replace/all data {""} {'}
This cleans up the quote delimiters. To clean up carriage returns, I join any line that does not end in a double quote to the line that follows it, swapping the ^/ for a pipe within the field.
>> parse/all data [any [a: "^/" (b: copy/part a -1 if b <> {"} [remove a insert a "|"]) | skip] to end]

After my de-cluttering, I have a much cleaner string which is easy to work with:
>> probe data
== {"line1 item1","line1 item2","line 1 item3" line2, 'alleged'||item1 ,"None","line2 item3" }

Or is it? From here, it seems obvious to use a simple 'parse split operation on the "^/". Not so fast. A simple parse split rule fails (e.g., parse/all data "^/"), and parse tries to seduce me into writing a set of grammar rules. I decline. I'm a busy man, parse.

I follow the path of least resistance, first saving the data and then re-opening the file using 'read/lines:
>> write tmp: %data-temp.txt data
...
>> data: read/lines tmp
I'm not proud of it, but this gets me back on track. I have a block of strings, all neat-and-tidy for processing.

For my quick-and-dirty script, I'm not overly concerned about writing and re-opening a temp file. But it would be nice if I could use read/lines or a similar function directly on the string in memory, without writing a complex parse rule.

Is there a function which can perform this (a read/lines) on a string? Was I foolish to resist parse? Does anyone have better ideas for working with these types of data files?

Thanks
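
For reference, the in-memory split asked about above might be sketched with a short rule-based parse, skipping the temp file. This is only a sketch under assumptions: it targets REBOL 2 parse behavior, it assumes the embedded newlines have already been swapped for pipes by the scrubbing step, and split-lines is a hypothetical helper name, not a built-in.

```rebol
REBOL []

; split-lines is a hypothetical helper, not part of REBOL itself.
; It returns a block with one string per line of the input.
split-lines: func [
    "Split a string into a block of lines (sketch)"
    text [string!]
    /local lines line
][
    lines: copy []
    parse/all text [
        any [
            ; grab everything up to the next newline, then step over it
            copy line to "^/" skip
            ; an empty line may copy as none, so substitute an empty string
            (append lines any [line copy ""])
        ]
        ; whatever follows the last newline, if anything
        copy line to end
        (if all [line not empty? line] [append lines line])
    ]
    lines
]

; usage: same shape of result as read/lines, without touching the disk
data: split-lines data
foreach line data [print line]
```

Using a rule block rather than the simple split form (parse/all data "^/") is a deliberate choice here: as far as I know, the simple split mode gives double quotes special treatment, which would explain why it misbehaves on CSV-style data.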