Mailing List Archive: 49091 messages

Read/lines on a string!

 [1/9] from: edoconnor:g:mail at: 20-Sep-2007 11:57


Anyone know if there's a way to perform a read/lines on a string?

Yesterday I needed to process a 100 KB CSV file and perform some actions based on the data. In a simple world, I would be able to read in the file like this:
data: read/lines %mycsv.csv
and then iterate over the data with a foreach or forall.

Unfortunately, some of the data fields in the CSV are empty, and some contain delimiters as well as carriage returns (all of which are valid in a CSV). In this case using 'read/lines leads to a mess, because the data looks like this:
>> data: read %mycsv.csv
== {"line1 item1","line1 item2","line 1 item3" line2, alleged item1 ,"","line2 item3" }
This does not lend itself to a quick parse rule, so I punt and decide to read the string and perform some data scrubbing:
>> data: read %mycsv.csv
...
>> replace/all data {,"",} {,"None",}
>> replace/all data {""} {'}
This cleans up the quote delimiters. To clean up carriage returns, I concatenate any lines which do not end in a double-quote, swapping the ^/ for a pipe within the field.
>> parse/all data [any [a: "^/" (b: copy/part a -1 if b <> {"} [remove a insert a "|"]) | skip] to end]
After my de-cluttering, I have a much cleaner string which is easy to work with:
>> probe data
== {"line1 item1","line1 item2","line 1 item3" line2, 'alleged'||item1 ,"None","line2 item3" }
Or is it?

From here, it seems obvious to use a simple 'parse split operation on the "^/". Not so fast. A simple parse split rule fails (e.g., parse/all data "^/"), and parse tries to seduce me into writing a set of grammar rules. I decline. I'm a busy man, parse. I follow the path of least resistance, first saving the data and then re-opening the file using 'read/lines:
>> write tmp: %data-temp.txt data
...
>> data: read/lines tmp
I'm not proud of it, but this gets me back on track. I have a block of strings, all neat-and-tidy for processing.

For my quick-and-dirty script, I'm not overly concerned about writing and re-opening a temp file. But it would be nice if I could use read/lines or a similar function directly on the string in memory, without writing a complex parse rule.

Is there a function which can perform this (a read/lines) on a string? Was I foolish to resist parse? Does anyone have better ideas for working with these types of data files?

Thanks

 [2/9] from: gregg:pointillistic at: 20-Sep-2007 10:12


Hi Ed,

EOC> Is there a function which can perform this (a read/lines) on a string?
EOC> Was I foolish to resist parse? Does anyone have better ideas for
EOC> working with these types of data files?

I wouldn't worry too much about the temp file approach, but if you run into data that falls outside the bounds of what REBOL's "simple" approach will handle, I wouldn't resist PARSE either. Sometimes it can save you a lot of time for things like this, if you know what the rules are of course. :)

-- Gregg

 [3/9] from: anton:wilddsl:au at: 21-Sep-2007 2:18


Hi Ed, It's not such a difficult parse, I think, for your final step. Try this:
>> parse/all data [any [copy line to "^/" skip (?? line)] copy line to end (?? line)]
line: {"line1 item1","line1 item2","line 1 item3"}
line: {"line2, 'alleged'||item1","None","line2 item3"}
line: none
== true
Just put the lines in a block.

By the way,
remove a insert a "|"
should be faster as:
change a "|"

Anton.
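
[Editor's sketch] The "just put the lines in a block" suggestion above could look roughly like the following. This is a minimal sketch, assuming REBOL 2 parse semantics; the function name split-lines and the sample data are hypothetical and are not part of REBOL or of the original thread.

```rebol
REBOL [Title: "split-lines sketch"]

; Hypothetical helper (not a REBOL built-in): split a string into a block of lines.
split-lines: func [
    "Return a block of strings, one per line of TEXT."
    text [string!]
    /local lines line
][
    lines: copy []
    parse/all text [
        any [
            ; In REBOL 2, COPY sets LINE to none when the line is empty,
            ; so fall back to an empty string in that case.
            copy line to "^/" skip (append lines any [line copy ""])
        ]
        ; Whatever follows the last newline, if anything.
        copy line to end (if line [append lines line])
    ]
    lines
]

; Made-up sample data: two CSV records separated by newlines.
data: {"a1","a2"^/"b1","b2"^/}
foreach line split-lines data [print line]
```

With the sample string, the loop should print one record per line, which is the same shape of result that read/lines gives for a file, without the temp-file round trip.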

 [4/9] from: ale870::gmail::com at: 20-Sep-2007 22:20


What is "?? line"?

On 9/20/07, Anton Rolls <anton-wilddsl.net.au> wrote:
> Hi Ed,
> It's not such a difficult parse, I think,
<<quoted lines omitted: 47>>
> To unsubscribe from the list, just send an email to > lists at rebol.com with unsubscribe as the subject.
-- //Alessandro http://sguish.wordpress.com http://laccio.wordpress.com

 [5/9] from: edoconnor:gmai:l at: 20-Sep-2007 16:22


Thanks for the parse illumination. I think because my lazy brain was looking for a quick & simple read/lines, the multi-step parse rule seemed daunting.

Besides all this fun stuff, does anyone else think a read/lines type function on a string would be useful?

Best,
Ed

On 9/20/07, Anton Rolls wrote:

 [6/9] from: anton:wilddsl:au at: 21-Sep-2007 17:42


Alessandro,

?? is like PROBE, except it also prints out the word (as a set-word) before printing and returning the word's value.

So try this in the console:
?? ??
It's like a quick alternative to SOURCE. Very handy in debugging.

Anton.
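
[Editor's sketch] A small side-by-side of PROBE and ?? as described above, assuming REBOL 2; the word name and its value are made up for illustration.

```rebol
REBOL [Title: "probe vs ?? sketch"]

; Made-up value for illustration.
line: "line1 item1"

probe line   ; prints only the value
?? line      ; prints the word as a set-word, followed by the value
```

Because ?? shows the word's name next to its value, it can be dropped into a parse action, as in (?? line) earlier in the thread, to see what each COPY captured.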

 [7/9] from: ale870::gmail::com at: 21-Sep-2007 9:56


Wow! I love Rebol, since it seems like walking in a big city: behind every corner you will find more shops, more ways, other people, etc... You can walk in a big city for years by discovering something more every day ;-)

Thank you!

On 9/21/07, Anton Rolls <anton-wilddsl.net.au> wrote:
> Alessandro,
> ?? is like PROBE, except it also prints out the word
<<quoted lines omitted: 9>>
-- //Alessandro http://sguish.wordpress.com http://laccio.wordpress.com

 [8/9] from: moliad::gmail::com at: 21-Sep-2007 14:59


hehe,

yes... and sometimes you end up in dead-ends, unfinished road work, and circular one-ways ;-)

-MAx

On 9/21/07, Alessandro Manotti <ale870-gmail.com> wrote:

 [9/9] from: ale870::gmail::com at: 24-Sep-2007 7:51


:-) :-) :-)

On 9/21/07, Maxim Olivier-Adlhoch <moliad-gmail.com> wrote:
> hehe,
> yes... and sometimes you end up in dead-ends, unfinished road work, and
<<quoted lines omitted: 57>>
-- //Alessandro http://sguish.wordpress.com http://laccio.wordpress.com

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted