Mailing List Archive: 49091 messages

[REBOL] parse or Re:(8)

From: joel:neely:fedex at: 21-Sep-2000 15:20

Ooooh! Typo on my part. Apologies for any confusion it caused. Where I typed
> > The objective, as I read it, was to break on PARAGRAPHS (not lines)
> > where a paragraph is defined as the end of a sentence that coincides
> > with the end of a line...
I intended to say (correction in all caps):

The objective, as I read it, was to break INTO paragraphs (not lines) where THE END OF a paragraph is defined IN THE ORIGINAL MESSAGE as the end of a sentence that coincides with the end of the line...

I certainly agree that in "real text" the situation becomes much fuzzier and more contextual. For example, there's one school of thought in typography that insists that blank lines as paragraph separators are wrong; that one should use indentation only (without vertical whitespace) to indicate the start of a new paragraph. Of course, this requires that one keep up with the "normal" margins being used in the text, as well as using the multi-line context to distinguish indented-first-line-of-paragraph from indented-multiline-block-quote, etc...

[rryost--home--com] wrote:
> Word wrapping in word processors and even email editors makes the definition
> of PARAGRAPH given below by Joel questionable, IMHO. Thinking about it, I
> recognize the start of a new paragraph in literature by the presence of a
> blank or empty line. Thus the sequence of two "new line" control characters
> would signal the start of a new PARAGRAPH. I think this is beyond the scope
> of the simple-parse approach. The more complex rule based functional
> approach developed in this thread by others is required.
>
Not really. Consider
>> text: {this is some^/text that^/flows.^/^/more sentences appear.}
== {this is some text that flows. more sentences appear.}
>> replace/all text "^/^/" #"^(ff)"
== {this is some text that flows.˙more sentences appear.}
>> parse/all text to-string #"^(ff)"
== ["this is some^/text that^/flows." "more sentences appear."]

So, using replace to find the empty lines (as consecutive newlines) allows us to crack the text with a simple parse. (I'm ducking the issue that there may be runs of more than two consecutive newlines. Figuring out how to remove those with a minimal number of replace statements is left as an exercise for the reader... ;-)

-jn-
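
One possible answer to the exercise about runs of more than two newlines, offered only as a sketch and assuming REBOL/Core 2.x behaviour for replace/all and parse/all. It is not necessarily the minimal-replace solution the message has in mind: it simply repeats one replace until no run of three newlines is left, then applies the same replace-and-parse steps shown in the message. The sample text and the word `paras` are made up for illustration.

```rebol
REBOL []

; Hypothetical sample: the paragraph break here is a run of four newlines.
text: {this is some^/text that^/flows.^/^/^/^/more sentences appear.}

; Collapse every run of three or more newlines down to exactly two.
; Each pass shortens the long runs, so the loop ends once none remain.
while [find text "^/^/^/"] [
    replace/all text "^/^/^/" "^/^/"
]

; Same two steps as in the message: mark each paragraph break with
; the character ^(ff), then split on that marker.
replace/all text "^/^/" "^(ff)"
paras: parse/all text "^(ff)"

probe paras
```

As in the message, this relies on the character ^(ff) never occurring in the input text; if it might, some other marker that cannot appear in the text would have to be chosen.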