Mailing List Archive: Re: Parse versus Regular Expressions

[REBOL] Re: Parse versus Regular Expressions

From: brett:codeconscious at: 5-Apr-2003 13:50


Hi Joel,

> I'm quite willing to be educated, so here's another example.  I'll
> state the problem, then wait for suggestions on how to use PARSE
> before I show the solution (using Python's RE engine).  Lest I be
> accused of theory, I'll point out that this is a disguised and
> simplified version of a program I've recently written for Real Work.

Well I had a go. I decided to just attack it using the first approach that
came to mind.

My preliminary thinking was that there are two themes here: control
(sequencing, errors) and character stream (accumulating sentences). So the
plan was to (1) break into lines, (2) break into fields, (3) filter, and
(4) process the character stream. That is basically what I've carried into
the PARSE rules. To accomplish this I've had to make PARSE jump around a bit
by changing the input pointer a few times and by rewriting the rules as
things proceed.
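
To show what I mean by those two tricks in isolation, here is a minimal
sketch. It is separate from the script attached further down, and every
name in it (mark, first3, rest, next-rule, ok?) is made up purely for
illustration. It assumes REBOL/Core 2.x PARSE behaviour: a set-word saves
the input position, a get-word restores it, and a word used as a rule is
looked up only when PARSE reaches it.

```rebol
REBOL [Title: "Sketch: moving the input pointer and swapping rules"]

; All names here (mark, first3, rest, next-rule, ok?) are hypothetical.

; 1. Moving the input pointer.
parse/all "abcdef" [
    mark:               ; set-word: remember the current input position
    3 skip              ; advance past "abc"
    :mark               ; get-word: jump back to the remembered position
    copy first3 3 skip  ; read the same three characters again
    copy rest to end    ; then take whatever is left
]
print [first3 rest]

; 2. Choosing the next rule while the parse is running.
next-rule: none
ok?: parse/all "*ignored" [
    mark:
    ; Pick a rule based on the first character of the input.
    (next-rule: either mark/1 = #"*" [[to end]] [[end skip]])
    next-rule           ; looked up only when PARSE reaches this point
]
print ok?
```

The first PARSE reads "abc" twice, once to skip it and once to copy it.
The second decides which rule to run only after it has looked at the
input; [end skip] is just a rule that can never match.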

If this was for real work, then:
    EITHER one-off? [
        use it after more testing
    ][ rewrite it with a more robust solution]

Regards,
Brett.

REBOL [
    Title: "Joel's PARSE problem."
    Author: "Brett Handley"
    Comments: {
1. First  "blunt parse attack" approach.
2. I don't like the way I am stepping through character data.
3. More of a concern, in this approach it is difficult to see
   any "value add" from PARSE over just writing a non-parse
   parser.
4. I posted it because despite being ugly, I've expended the
   effort, so may as well offer it as a target of critique! :)
5. Way too sensitive to a change in character positions!
6. Not tested thoroughly - only on the data in the code.
}
]

data: decompress #{
789CA5D25D0AC3200C00E0ABE46DD087E24FF7738FED02B6CD9820DA55A5D75F
EA2C8C958DD9459F8C7C2644C628385C6EDA036D051E6D40DB610D45211893B4
F8CC31910FCFAE0C597B2279F2254145FEED356FC940DD6FEBF7E9EDAB44C0A4
8D815E7B350CA8C62D9E4CDEA18236866C3ABB0BD062A7A24770571AD03DCE03
0237F6F8F191E4F13C8F23CD17C1281FCA9A5CD7D724EFF4FDEE2F5F68F11EAA
35717B80020000
}

; -------------------------------------------------
; PARSE RULES
; -------------------------------------------------

r_digit: charset {0123456789}

r_char: complement charset { ^-}

r_seq1: [
    copy v_lineseq1 6 r_digit
    (v_lineseq1: to integer! v_lineseq1)
]

r_seq2: [
    copy v_lineseq2 8 r_digit
    (v_lineseq2: to integer! v_lineseq2)
]

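; Re-reads the 66-character body of the current line, one character
; at a time, building up sentences as it goes.
r_body: [
    :v_linebody         ; move the input back to the start of the body
    66 [
        #"." (end-of-sentence) |
        copy v_char r_char (character v_char) |
        skip (whitespace)
    ]
    8 skip              ; step over the trailing 8-digit sequence field
]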
r_ignore: [none]
r_error: [
    (emit join "ERROR: " copy/part v_linestart v_lineend)
    none
]

; Splits each line into fields, checks sequencing.
r_line: [
    v_linestart:
    r_seq1 v_linebody: 66 skip r_seq2 v_lineend:
    (r_filterdynamic: dynamic-filter-rule?)
    r_filterdynamic
]

; -------------------------------------------------
; Functions called during the parse process
; -------------------------------------------------

dynamic-filter-rule?: does [
    either all [
        any [none? seq1 v_lineseq1 >= seq1]
        any [none? seq2 v_lineseq2 >= seq2]
    ] [
        seq1: v_lineseq1
        seq2: v_lineseq2
        either v_linebody/1 = #"*" [[none]] [r_body]
    ] [r_error]
]

character: func [ch] [insert tail out ch]

whitespace: does [
    if all [not empty? out #" " <> last out] [
        insert tail out #" "
    ]
]

end-of-sentence: does [
    character #"."
    emit out
    out: copy {}
]

emit: :print

; -------------------------------------------------
; HERE WE GO
; -------------------------------------------------

;
; Initialise
;

v_lineseq1: v_lineseq2:
v_linebody: v_linestart: v_lineend:
seq1: seq2: r_filterdynamic: none
out: copy {}

either parse/all data [any r_line] [
    if not empty? out [print "ERROR last sentence incomplete."]
] [print "ERROR The file does not conform to the expected format."]

HALT