[REBOL] Re: Need help from parsing professionals
From: joel:neely:fedex at: 27-Jul-2001 2:02
Hi, Thorsten,
I'm not a parsing professional (and I don't play one on
TV ;-) but maybe I can make a couple of suggestions...
Thorsten Moeller wrote:
> I have a big string. From this string I need to make a
> readable file with lots of lines. Every line starts
> with AA......
> The line ends when the next line with AA comes up.
>
Here's one way, using the sample data from your post:
>> foo
== {AA2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711AR 000
BUCHUNG
TS1000000001301,84...
>> recs: []
== []
>> parse foo [any [
       "AA" copy rec to "AA" (append recs join "AA" rec) |
       "AA" copy rec to end (append recs join "AA" rec)
   ]]
== true
>> length? recs
== 14
>> recs
== [{AA2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711AR 000
BUCHUNG
TS1000000001301,8...
after which all of your "records" are individual strings in the
block RECS.
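If you need this more than once, the same rule can be wrapped in a
small function. This is only a sketch, assuming REBOL/Core 2.x and
a string that begins with the marker; SPLIT-RECORDS is a made-up name:

```rebol
REBOL [Title: "split-records sketch"]

; Sketch only: the PARSE rule above, wrapped in a reusable function.
; Assumes REBOL/Core 2.x and that TEXT begins with MARKER.
split-records: func [
    "Return a block of strings, one per record, each starting with MARKER"
    text [string!]
    marker [string!]
    /local recs rec
][
    recs: copy []   ; fresh block per call (a literal [] in a FUNC body is shared between calls)
    ; /ALL keeps PARSE from skipping whitespace in the data
    parse/all text [any [
        marker copy rec to marker (append recs join marker rec) |   ; a record followed by another
        marker copy rec to end (append recs join marker rec)        ; the last record
    ]]
    recs
]
```

Then something like  write/lines %records.txt split-records foo "AA"
should write the records out one after another, each starting on a
new line (%records.txt is just an example file name).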
While I think that's the closest to your original request, I
can't help adding a couple of others, just for fun.
This version requires the least knowledge of PARSE, but doesn't
show off its power either:
>> bletch: parse/all foo "A"
== ["" "" {2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711}
{R 000 BUCHUNG
TS100000000...
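Discard the first element of the block (it's whatever precedes
the first "A"). After that, every empty string represents a
record boundary (the zero-length gap between the #"A"s in "AA").
Every other "A" splits the string as well, which is why "AR 000"
in your data comes back as {R 000 ...}.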
Why would anyone write it this way? I had a file some time
back where two-character sequences marked both record *and*
field boundaries. This approach would allow you to process
the individual fields within the records in order.
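To make that concrete, here's a sketch that regroups the split
block into one block of pieces per record. It assumes the data
starts with "AA" and that no field contains "AA" itself; REGROUP
is a made-up name:

```rebol
REBOL [Title: "regroup sketch"]

; Sketch only: groups the block produced by  parse/all foo "A"  into one
; block per record. Assumes the input began with "AA", so the second
; element of PARTS is the empty string that opens the first record.
regroup: func [
    "Group the pieces of a string split on A into one block per record"
    parts [block!]
    /local records current
][
    records: copy []
    current: none
    foreach part next parts [       ; NEXT skips whatever preceded the first "A"
        either empty? part [
            ; the empty gap between the two A's of "AA" starts a new record
            current: copy []
            append/only records current
        ][
            append current part     ; any other piece belongs to the current record
        ]
    ]
    records
]
```

With that, REGROUP BLETCH would give you a block of records, each
a block of the pieces that came between the remaining A's.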
Another option is:
>> parse foo ["AA" copy rec to "AA" copy rest to end]
== true
>> rec
== {2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711AR 000
BUCHUNG
TS1000000001301,84+0...
>> rest
== {AA2012001070107120006600000200001SAB
0000081080000000000000000000000001106724000010711010711AR 000
GEGENB.
AU2000000012419,49...
which (as long as PARSE returns TRUE) lets you slurp the first
record, minus its leading "AA", from the string, leaving the rest
in ... REST.
Why would anyone write it this way? You can put the above in
a loop if you only need to process the first few records (e.g.,
to find the first record that meets a given set of criteria).
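A sketch of that loop, assuming REBOL/Core 2.x and data that starts
with "AA" and has no empty records; FIRST-MATCH and its CRITERIA
argument are hypothetical names. It returns the first record (with
its "AA" put back) for which CRITERIA returns true, or NONE:

```rebol
REBOL [Title: "first-match sketch"]

; Sketch only: peels records off the front of TEXT one at a time with the
; "copy rec ... copy rest" rule and stops at the first record for which
; CRITERIA returns true. Assumes TEXT starts with "AA" and has no empty records.
first-match: func [
    "Return the first AA-record satisfying CRITERIA, or NONE"
    text [string!]
    criteria [any-function!]
    /local rec rest
][
    rest: text
    while [not empty? rest] [
        either parse/all rest ["AA" copy rec to "AA" copy rest to end] [
            rec: join "AA" rec      ; put the marker back on
        ][
            rec: copy rest          ; no further "AA": this is the last record
            rest: copy ""           ; ends the loop after this pass
        ]
        if criteria rec [return rec]
    ]
    none
]
```

For example,  first-match foo func [rec] [found? find rec "BUCHUNG"]
would stop at the first record containing BUCHUNG without splitting
up the rest of the string.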
HTH!
-jn-
--
------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com