Mailing List Archive: 49091 messages

[REBOL] parse or Re:(5)

From: rchristiansen:pop:isdfa:sei-it at: 20-Sep-2000 18:11

> I assumed that
> this is NOT what you wanted, but rather you wanted to copy through
> either {.^/} or {."^/} WHICHEVER COMES NEXT. (Natural language
> text munching is a real pain, speaking from personal experience! ;-)

Yes, this is what I was looking for. As someone who had never parsed anything before using REBOL (there will be more like me!), I find the parsing rules in the REBOL docs confusing to read. My inclination is to want a simple statement that parses until one set of characters is reached OR a different set of characters is reached, whichever comes first.

> The strategies I've thought of (I don't have time to code, compare,
> and recommend right at the moment) are:
>
> 1) Write more complicated parse rules, that either
>    1a) parse to newline, append the copied chunk to a paragraph
>        string under construction, then look at the tail end of
>        the last chunk to see whether it can be extended or whether
>        a new paragraph should be started (based on whether it
>        looked like the end of a sentence).
>    1b) parse to period, grab and append the next character if it
>        is a quotation mark, append to paragraph under construction, and
>        start a new paragraph if the next character is newline.
> 2) Use simpler parsing (break on newlines), then make a postpass
>    across the block of "lines", gluing back together wherever the
>    boundary isn't the end of a sentence.

You missed another option, which I had been using previously. Here is the function:

    breakdown-content: func [
        "breakdown an e-mail content field into its parts"
        msg [object!] "e-mail message"
    ][
        article-info: msg/content
        end-of-paragraph: rejoin [{.} newline]
        replace/all article-info end-of-paragraph {.~}
        content-parts: copy []
        foreach part parse/all article-info {~} [
            append content-parts trim/lines part
        ]
    ]

In other words, replace all instances of a set of characters with a new character that can be recognized later.

The above example needs to be fixed because it only replaces instances of {.^/} with "~" and I've discovered the tilde is a bad choice, anyway. I need to also be able to replace any set of characters you might find at the end of a paragraph, including {."^/} and {!^/} and {?^/} and {:^/} and {...^/} and I'm sure there are more.

I was hoping there would be a quick way to use parse instead of replacing characters first and then parsing.

-Ryan
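
For readers looking for the parse-only approach asked about above, here is a minimal sketch, not taken from the thread. It assumes REBOL 2 style parse rules, and the function name split-paragraphs, the punct character set, and the para-end rule are all hypothetical names chosen for illustration. The idea is to describe a paragraph ending as "one punctuation character, an optional closing quotation mark, then a newline" and to walk the string one character at a time, cutting a paragraph whenever that rule matches. No placeholder character such as the tilde is needed.

```rebol
REBOL [Title: "Paragraph splitting with parse (sketch)"]

; Hypothetical helper, not from the original post.
split-paragraphs: func [
    "Split text into paragraphs at end punctuation followed by a newline"
    text [string!]
    /local parts punct para-end start stop rest
][
    parts: copy []
    ; Assumed set of paragraph-ending characters; extend as needed.
    punct: charset ".!?:"
    ; A paragraph ends with punctuation, an optional closing quote, then a newline.
    ; {...^/} is covered too, because its last period is followed by the newline.
    para-end: [punct opt {"} newline]
    parse/all text [
        start:
        any [
            ; On a match, copy everything from the last cut up to here.
            para-end stop: (append parts trim/lines copy/part start stop) start:
            | skip
        ]
    ]
    ; Keep any trailing text that lacks a paragraph terminator.
    rest: trim/lines copy start
    if not empty? rest [append parts rest]
    parts
]

sample: {First line^/continues here.^/She said "stop."^/Really?^/}
probe split-paragraphs sample
```

With the sample string, the block returned should hold three paragraphs, since the newline after "First line" is not preceded by punctuation and so does not end a paragraph. Adding another terminator is a matter of extending the punct set or the para-end rule rather than adding another replace pass.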