[REBOL] Read/lines on a string!
From: edoconnor:g:mail at: 20-Sep-2007 11:57
Anyone know if there's a way to perform a read/lines on a string?
Yesterday I needed to process a 100 Kb CSV file and perform some
actions based on the data. In a simple world, I would be able to read
in the file like this:
data: read/lines %mycsv.csv
and then iterate over the data with a foreach or forall.
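To make the simple case concrete, here is a rough sketch of that loop, assuming every record sits on a single line. The literal block stands in for the result of read/lines, the sample values are made up, and the word `rows` is just an illustrative name. It leans on REBOL 2's string form of parse to split each line on commas.

```rebol
REBOL []

; Stand-in for: data: read/lines %mycsv.csv  (sample values are made up)
data: [
    {"line1 item1","line1 item2","line1 item3"}
    {"line2 item1","line2 item2","line2 item3"}
]

rows: copy []
foreach line data [
    ; split one record on commas
    fields: parse/all line ","
    ; keep each record as its own block
    append/only rows fields
]
probe rows
```

This only holds up while no field contains a line break.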
Unfortunately, some of the data fields in the CSV are empty, and some
contain delimiters as well as carriage-returns (all of which are valid
in a CSV). In this case using 'read/lines leads to a mess, because the
data looks like this:
>> data: read %mycsv.csv
== {"line1 item1","line1 item2","line 1 item3"
line2,
alleged
item1
,"","line2 item3"
}
This does not lend itself to a quick parse rule, so I punt and decide
to read the string and perform some data scrubbing:
>> data: read %mycsv.csv
...
>> replace/all data {,"",} {,"None",}
>> replace/all data {""} {'}
This cleans up the quote delimiters. To clean up carriage returns, I
concatenate any lines which do not end in a double-quote, swapping the
^/ for a pipe within the field.
>> parse/all data [
    any [
        a: "^/" (b: copy/part a -1 if b <> {"} [remove a insert a "|"])
        | skip
    ]
    to end
]
After my de-cluttering, I have a much cleaner string which is easy to work with:
>> probe data
== {"line1 item1","line1 item2","line 1 item3"
line2, 'alleged'||item1
,"None","line2 item3"
}
Or is it? From here, it seems obvious to use a simple 'parse split
operation on the "^/". Not so fast. A simple parse split rule fails
(e.g., parse/all data "^/"), and parse tries to seduce me into writing
a set of grammar rules. I decline. I'm a busy man, parse.
I follow the path of least resistance, first saving the data and then
re-opening the file using 'read/lines:
>> write tmp: %data-temp.txt data
...
>> data: read/lines tmp
I'm not proud of it, but this gets me back on track. I have a block of
strings, all neat-and-tidy for processing.
For my quick-and-dirty script, I'm not overly concerned about writing
and re-opening a temp file. But it would be nice if I could use
read/lines or a similar function directly on the string in memory,
without writing a complex parse rule.
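For comparison, here is a minimal sketch of what a newline-only split might look like as a parse rule. It assumes REBOL 2 semantics, where `copy` sets the word to none on an empty match. The name `split-lines` is hypothetical, not a built-in, and the sample string is made up.

```rebol
REBOL [Title: "split-lines sketch"]

; split-lines is a hypothetical helper, not a built-in function.
split-lines: func [
    "Split a string on newlines only, leaving quotes and commas untouched."
    text [string!]
    /local lines line rest
][
    lines: copy []
    parse/all text [
        any [
            ; take everything up to the next newline, then step over it
            copy line to "^/" skip
            ; an empty line leaves line as none in REBOL 2, so substitute ""
            (append lines any [line copy ""])
        ]
        ; keep a trailing fragment that has no final newline
        copy rest to end
        (if all [rest not empty? rest] [append lines rest])
    ]
    lines
]

; made-up sample in the shape of the scrubbed data
data: {"a","b"^/c,'d'|e^/,"None","f"^/}
probe split-lines data
```

Because the rule only looks for "^/", it should leave the quotes in each line as they are, so the result ought to resemble what read/lines gives back from the temp file.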
Is there a function which can perform this (a read/lines) on a string?
Was I foolish to resist parse? Does anyone have better ideas for
working with these types of data files?
Thanks