Read/lines on a string!
[1/9] from: edoconnor::gmail at: 20-Sep-2007 11:57
Anyone know if there's a way to perform a read/lines on a string?
Yesterday I needed to process a 100 KB CSV file and perform some
actions based on the data. In a simple world, I would be able to read
in the file like this:
data: read/lines %mycsv.csv
and then iterate over the data with a foreach or forall.
Unfortunately, some of the data fields in the CSV are empty, and some
contain delimiters as well as carriage returns (all of which are valid
in a CSV). In this case using 'read/lines leads to a mess, because the
raw data looks like this:
>> data: read %mycsv.csv
== {"line1 item1","line1 item2","line 1 item3"
line2,
alleged
item1
,"","line2 item3"
}
This does not lend itself to a quick parse rule, so I punt and instead
read the file as a string and do some data scrubbing:
>> data: read %mycsv.csv
...
>> replace/all data {,"",} {,"None",}
>> replace/all data {""} {'}
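A self-contained sketch of the two REPLACE/ALL scrubbing steps above, run on a small made-up sample string rather than the real file (the sample is an assumption; REBOL 2 is assumed, as elsewhere in this thread):

```rebol
REBOL [Title: "CSV scrub sketch"]
;-- Made-up sample: one empty quoted field and one CSV-escaped quote pair
data: copy {"a","","c"^/"say ""hi""",x,y^/}
;-- Mark empty quoted fields so they are not lost later
replace/all data {,"",} {,"None",}
;-- Turn CSV-escaped double quotes ("") into single quotes
replace/all data {""} {'}
probe data
```

One caveat: REPLACE/ALL resumes searching after each replacement, so in a run of adjacent empty fields such as `,"","",` the first rule only marks the first one, and the second rule then turns the leftover `""` into `'`.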
This cleans up the quote delimiters. To clean up carriage returns, I
concatenate any lines that do not end in a double quote, swapping the
^/ for a pipe within the field.
>> parse/all data [
    any [a: "^/" (b: copy/part a -1 if b <> {"} [remove a insert a "|"]) | skip]
    to end
]
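The same rule, spread out with comments and run on a made-up two-record sample (the sample string is an assumption, REBOL 2 is assumed). The rule's heuristic is that a line ending in anything other than a double quote was broken inside a field:

```rebol
REBOL [Title: "Join broken CSV lines (sketch)"]
;-- Made-up sample: the second record has a newline inside a quoted field
data: copy {"l1a","l1b"^/"l2 part one^/part two","l2b"^/}
parse/all data [
    any [
        a: "^/" (
            ;-- A is the position of the newline just matched;
            ;-- B is the single character in front of it
            b: copy/part a -1
            ;-- A line that does not end in a double quote is taken to be
            ;-- broken inside a field, so its newline becomes a pipe
            if b <> {"} [remove a insert a "|"]
        )
        | skip
    ]
    to end
]
probe data
```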
After my de-cluttering, I have a much cleaner string which is easy to work with:
>> probe data
== {"line1 item1","line1 item2","line 1 item3"
line2, 'alleged'||item1
,"None","line2 item3"
}
Or is it? From here, it seems obvious to use a simple 'parse split
operation on "^/". Not so fast. A simple parse split rule fails
(e.g., parse/all data "^/"), and parse tries to seduce me into writing
a set of grammar rules. I decline. I'm a busy man, parse.
I follow the path of least resistance: first saving the data, then
re-opening the file using 'read/lines:
>> write tmp: %data-temp.txt data
...
>> data: read/lines tmp
I'm not proud of it, but this gets me back on track. I have a block of
strings, all neat and tidy for processing.
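The temp-file detour as a self-contained sketch, with a DELETE added so the file does not linger afterwards (the sample data is made up; the file name is the one used above; REBOL 2 is assumed):

```rebol
REBOL [Title: "Temp-file round trip (sketch)"]
data: {"line1 item1","line1 item2"^/"line2 item1","line2 item2"^/}
tmp: %data-temp.txt
write tmp data          ;-- save the cleaned-up string
lines: read/lines tmp   ;-- read it back as a block of strings, one per line
delete tmp              ;-- the temporary file is no longer needed
foreach line lines [print line]
```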
For my quick-and-dirty script, I'm not overly concerned about writing
and re-opening a temp file. But it would be nice if I could use
read/lines or a similar function directly on the string in memory,
without writing a complex parse rule.
Is there a function which can perform this (a read/lines) on a string?
Was I foolish to resist parse? Does anyone have better ideas for
working with these types of data files?
Thanks
[2/9] from: gregg:pointillistic at: 20-Sep-2007 10:12
Hi Ed,
EOC> Is there a function which can perform this (a read/lines) on a string?
EOC> Was I foolish to resist parse? Does anyone have better ideas for
EOC> working with these types of data files?
I wouldn't worry too much about the temp file approach, but if you run
into data that falls outside the bounds of what REBOL's "simple"
approach will handle, I wouldn't resist PARSE either. Sometimes it can
save you a lot of time for things like this, if you know what the
rules are of course. :)
-- Gregg
[3/9] from: anton:wilddsl:au at: 21-Sep-2007 2:18
Hi Ed,
It's not such a difficult parse, I think,
for your final step. Try this:
>> parse/all data [any [copy line to "^/" skip (?? line)] copy line to end (?? line)]
line: {"line1 item1","line1 item2","line 1 item3"}
line: {"line2, 'alleged'||item1","None","line2 item3"}
line: none
== true
Just put the lines in a block.
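Putting the lines in a block gives something close to a READ/LINES for strings. A minimal sketch, assuming REBOL 2; STRING-LINES is a hypothetical helper name, not a built-in, and unlike the rule above it drops the empty piece after a trailing newline (the `line: none` case):

```rebol
REBOL [Title: "READ/LINES-style split of a string (sketch)"]
string-lines: func [
    "Split a string on newlines into a block of strings (hypothetical helper)"
    text [string!]
    /local lines line
][
    lines: copy []
    parse/all text [
        ;-- Every newline-terminated line; an empty match copies NONE in REBOL 2
        any [copy line to "^/" skip (append lines any [line copy ""])]
        ;-- Whatever follows the last newline, if anything
        copy line to end (if all [line not empty? line] [append lines line])
    ]
    lines
]
probe string-lines {"line1 item1","line1 item2"^/"line2 item1","line2 item2"^/}
```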
By the way,
remove a insert a "|"
should be faster as:
change a "|"
Anton.
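A quick check of the CHANGE tip on a made-up string (REBOL 2 assumed). Both forms give the same result here because "|" and "^/" are each one character long; CHANGE overwrites as many characters as the new value contains. The speed gain is presumably because REMOVE and INSERT each shift the rest of the string, while a same-length CHANGE writes in place:

```rebol
REBOL [Title: "CHANGE vs REMOVE + INSERT (sketch)"]
s1: copy "ab^/cd"
s2: copy "ab^/cd"
;-- Two steps: delete the newline, then insert the pipe at the same spot
a: find s1 "^/"
remove a insert a "|"
;-- One step: overwrite the newline with the pipe in place
a: find s2 "^/"
change a "|"
probe s1
probe s2
```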
[4/9] from: ale870::gmail::com at: 20-Sep-2007 22:20
What is "?? line"?
On 9/20/07, Anton Rolls <anton-wilddsl.net.au> wrote:
--
//Alessandro
http://sguish.wordpress.com
http://laccio.wordpress.com
[5/9] from: edoconnor:g:mail at: 20-Sep-2007 16:22
Thanks for the parse illumination. I think because my lazy brain was
looking for a quick & simple read/lines, the multi-step parse rule
seemed daunting.
Besides all this fun stuff, does anyone else think a read/lines type
function on a string would be useful?
Best,
Ed
On 9/20/07, Anton Rolls wrote:
[6/9] from: anton:wilddsl:au at: 21-Sep-2007 17:42
Alessandro,
?? is like PROBE, except it also prints out the word
(as a set-word) before printing and returning the word's value.
So try this in the console:
?? ??
It's like a quick alternative to SOURCE.
Very handy in debugging.
Anton.
[7/9] from: ale870::gmail::com at: 21-Sep-2007 9:56
Wow! I love Rebol. It is like walking in a big city: around every
corner you find more shops, more streets, other people, etc... You can
walk in a big city for years and discover something new every day ;-)
Thank you!
On 9/21/07, Anton Rolls <anton-wilddsl.net.au> wrote:
[8/9] from: moliad::gmail::com at: 21-Sep-2007 14:59
hehe,
yes... and sometimes you end up in dead-ends, unfinished road work, and
circular one-ways
;-)
-MAx
On 9/21/07, Alessandro Manotti <ale870-gmail.com> wrote:
[9/9] from: ale870::gmail::com at: 24-Sep-2007 7:51
:-) :-) :-)
On 9/21/07, Maxim Olivier-Adlhoch <moliad-gmail.com> wrote: