Mailing List Archive: 49091 messages

[REBOL] Re: Need help from parsing professionals

From: joel:neely:fedex at: 27-Jul-2001 2:02

Hi, Thorsten,

I'm not a parsing professional (and I don't play one on TV ;-) but maybe I can make a couple of suggestions...

Thorsten Moeller wrote:
> I have a big string. from this string i need to make a
> readable file with lots of lines. So every line starts
> with AA......
> The Line ends when the next Line with AA comes up.

Here's one way, using the sample data from your post
>> foo
== {AA2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG TS1000000001301,84...
>> recs: []
== []
>> parse foo [any [
[    "AA" copy rec to "AA" (append recs join "AA" rec) |
[    "AA" copy rec to end (append recs join "AA" rec)
[    ]]
== true
>> length? recs
== 14
>> recs
== [{AA2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG TS1000000001301,8...

after which all of your "records" are individual strings in the block RECS.

While I think that's the closest to your original request, I can't help adding a couple of others, just for fun. This version requires the least knowledge of PARSE, but doesn't show off its power either:
>> bletch: parse/all foo "A"
== ["" "" {2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711} {R 000 BUCHUNG TS100000000...

Discard the first element of the block (it's whatever precedes the first "A"). After that, every empty string represents a record boundary (the 0-length gap between the #"A"s in "AA").

Why would anyone write it this way? I had a file some time back where two-character sequences marked both record *and* field boundaries. This approach would allow you to process the individual fields within the records in order.

Another option is
>> parse foo ["AA" copy rec to "AA" copy rest to end]
== true
>> rec
== {2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG TS1000000001301,84+0...
>> rest
== {AA2012001070107120006600000200001SAB 0000081080000000000000000000000001106724000010711010711AR 000 GEGENB. AU2000000012419,49...

which (as long as PARSE returns TRUE) lets you slurp the first record from the string, leaving the rest in ... REST.

Why would anyone write it this way? You can put the above in a loop if you only need to process the first few records (e.g. find the first record with a specified set of criteria).

HTH!

-jn-

-- ------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
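
P.S. For what it's worth, here is a minimal sketch of the "put it in a loop" idea, not a polished implementation. The names FIND-RECORD, DATA and WANTED are made up for illustration, and the "criteria" are reduced to a plain substring test. Two things differ from the one-shot rule: it uses PARSE/ALL so that spaces inside a record are treated as ordinary characters, and it adds a `to end` alternative so the last record (which has no "AA" after it) is still examined.

```rebol
; FIND-RECORD, DATA and WANTED are hypothetical names, not from the original post.
find-record: func [
    "Return the first AA record that contains WANTED, or none"
    data [string!] wanted [string!]
    /local rec rest
][
    ; Each pass peels one record off the front and keeps the remainder in REST.
    while [parse/all data ["AA" copy rec [to "AA" | to end] copy rest to end]] [
        ; REC may be none for an empty record, so test it before searching.
        if all [rec  find rec wanted] [return join "AA" rec]
        ; REST may be none once the input is used up; fall back to an empty string
        ; so the next PARSE simply fails and the loop stops.
        data: any [rest ""]
    ]
    none
]
```

Something like `find-record foo "GEGENB."` should then hand back the first record containing that text and stop scanning there, or NONE if no record matches.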