[REBOL] parse or Re:(8)
From: joel:neely:fedex at: 21-Sep-2000 15:20
Ooooh! Typo on my part. Apologies for any confusion it caused. Where
I typed
> >
> > The objective, as I read it, was to break on PARAGRAPHS (not lines)
> > where a paragraph is defined as the end of a sentence that coincides
> > with the end of a line...
> >
I intended to say (corrections in all caps):
The objective, as I read it, was to break INTO paragraphs (not lines)
where THE END OF a paragraph is defined IN THE ORIGINAL MESSAGE as
the end of a sentence that coincides with the end of the line...
I certainly agree that in "real text" the situation becomes much fuzzier
and more contextual.
For example, there's one school of thought in typography that insists
that blank lines as paragraph separators are wrong; that one should
use indentation only (without vertical whitespace) to indicate the
start of a new paragraph. Of course, this requires that one keep up
with the "normal" margins being used in the text, as well as using the
multi-line context to distinguish indented-first-line-of-paragraph
from indented-multiline-block-quote, etc...
[rryost--home--com] wrote:
> Word wrapping in word processors and even email editors makes the definition
> of PARAGRAPH given below by Joel questionable, IMHO. Thinking about it, I
> recognize the start of a new paragraph in literature by the presence of a
> blank or empty line. Thus the sequence of two "new line" control characters
> would signal the start of a new PARAGRAPH. I think this is beyond the scope
> of the simple-parse approach. The more complex rule based functional
> approach developed in this thread by others is required.
>
Not really. Consider
>> text: {this is some^/text that^/flows.^/^/more sentences appear.}
== {this is some
text that
flows.

more sentences appear.}
>> replace/all text "^/^/" #"^(ff)"
== {this is some
text that
flows.˙more sentences appear.}
>> parse/all text to-string #"^(ff)"
== ["this is some^/text that^/flows." "more sentences appear."]
So, using replace to find the empty lines (as consecutive newlines)
allows us to crack the text with a simple parse. (I'm ducking the
issue that there may be runs of more than two consecutive newlines.
Figuring out how to remove those with a minimal number of replace
statements is left as an exercise for the reader... ;-)
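One possible answer to that exercise, offered as a sketch rather than as the minimal solution: keep collapsing triple newlines into double newlines until none remain, then split as before. The sample text (with four consecutive newlines) and the word paragraphs are made up for illustration; a loop around a single replace may not be what "minimal number of replace statements" was meant to allow.

```rebol
; Sample text (assumed for illustration): a run of four newlines
text: {this is some^/text that^/flows.^/^/^/^/more sentences appear.}

; Collapse every run of three or more newlines down to exactly two.
; Each pass shortens any remaining long run, so the loop terminates.
while [find text "^/^/^/"] [
    replace/all text "^/^/^/" "^/^/"
]

; Same two steps as in the console session above
replace/all text "^/^/" #"^(ff)"
paragraphs: parse/all text to-string #"^(ff)"
; paragraphs is now
; ["this is some^/text that^/flows." "more sentences appear."]
```

Once the runs are collapsed, text is the same string as in the session above, so the split gives the same two paragraphs.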
-jn-