[REBOL] Re: Parse versus Regular Expressions
From: joel:neely:fedex at: 4-Apr-2003 17:51
Hi, Ladislav,
As usual, you've provided much food for thought.
Ladislav Mecir wrote:
> Yes, PARSE dialect looks procedural. Nevertheless, all the
> "language stuff" is a hybrid between procedural and declarative
> descriptions: e.g. Regular Expressions and FSM's describe the
> same - Regular Languages.
>
> Similarly Grammars (more declarative) and Turing Machines
> (procedural) describe the same languages.
>
...
> If you really want a symmetrical OR parse rule, it can be programmed:
>
I'm quite willing to be educated, so here's another example. I'll
state the problem, then wait for suggestions on how to use PARSE
before I show the solution (using Python's RE engine). Lest I be
accused of mere theorizing, I'll point out that this is a disguised
and simplified version of a program I've recently written for Real Work.
GIVEN:
A file of lines, each of which is 80 characters long and contains:
1) a six-digit leading sequence number,
2) a 66-character body area,
3) an eight-digit trailing sequence number.
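The fixed layout above maps directly onto a regular expression. A minimal sketch in Python, assuming full 80-character lines; the names LINE_RE and split_line are hypothetical and are not taken from the program described in this post:

```python
import re

# Assumed layout: 6 digits, 66 body characters, 8 digits (80 in total).
LINE_RE = re.compile(r"(\d{6})(.{66})(\d{8})")

def split_line(line):
    """Return (leading, body, trailing), or None if the line does not fit."""
    match = LINE_RE.fullmatch(line.rstrip("\r\n"))
    return match.groups() if match else None
```

A line that does not match the layout yields None, so a caller can decide separately how to report it.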
The body area contains sentences, which end in a period followed
by whitespace. A sentence may spread across the body areas of
one or more lines, but if a sentence ends on one line, the rest
of that body will be blank and the next sentence will begin
in a subsequent line.
If the body area begins with an asterisk, it is to be ignored.
Consecutive lines should have both leading and trailing sequence
numbers that are in order.
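The ordering rule could be checked along these lines. This is only a sketch under two assumptions that the problem statement leaves open: "in order" is read as non-decreasing, and a failing line does not change the values that later lines are compared against. The name out_of_order is hypothetical.

```python
def out_of_order(records):
    """Yield each (leading, trailing) pair that breaks the ordering.

    Assumptions: "in order" means non-decreasing, and a failing record
    does not update the values later records are compared against.
    """
    last_lead = last_trail = None
    for lead, trail in records:
        lead_n, trail_n = int(lead), int(trail)
        bad = (last_lead is not None and
               (lead_n < last_lead or trail_n < last_trail))
        if bad:
            yield lead, trail
        else:
            last_lead, last_trail = lead_n, trail_n
```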
OUTPUT:
If any line has an out-of-order leading or trailing sequence
number, echo that line to output as an error.
Output whole sentences (with redundant whitespace removed) as
individual lines of output.
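Sentence assembly can be sketched as follows, taking body areas that have already been cut out of their lines. It assumes that a body whose last non-blank character is a period ends the current sentence, which follows from the rule that the rest of the body is blank after a sentence ends. The name sentences is hypothetical, and this is not the solution promised above.

```python
def sentences(bodies):
    """Yield whole sentences from an iterable of body areas.

    A trailing fragment with no closing period is silently dropped.
    """
    words = []
    for body in bodies:
        if body.startswith("*"):
            continue  # body areas beginning with an asterisk are ignored
        words.extend(body.split())  # split() also drops redundant whitespace
        # Assumption: a body whose last non-blank character is a period
        # ends the current sentence.
        if body.rstrip().endswith("."):
            yield " ".join(words)
            words = []
```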
To illustrate (although my lines here aren't 80 bytes, to avoid
email line wrap and save typing):
00001 This is a sentence. 20030301
00002 So 20030302
00003 is 20030302
00004 this. 20030302
00005* this will disappear 20030303
00006* but this won't because of sequence order 20030101
00007 The last 20030304
00008 sentence. 20030304
should produce output like this:
This is a sentence.
So is this.
ERROR: 00006* but this won't because of sequenc...
The last sentence.
Thanks in advance to anyone who offers PARSE solutions!
-jn-
--
----------------------------------------------------------------------
Joel Neely joelDOTneelyATfedexDOTcom 901-263-4446
Counting lines of code is to software development as
counting bricks is to urban development.