Mailing List Archive: 49091 messages

Need help from parsing professionals

 [1/6] from: moeller_thorsten:gmx at: 27-Jul-2001 12:36


Hi,

this is obviously not a big thing. I have a big string, and from this string I need to make a readable file with lots of lines. Every line starts with AA...... and ends where the next line starting with AA begins.

What I have tried so far is the following:

a: read %/c/temp/test.txt
parse a [any [to "AA201" mark: to "AA" (write/append %/c/temp/new.txt join mark "^/")]]

This seems to run in an endless loop, and a look at new.txt shows that the parser didn't recognise the "AA201": it always appends the whole big string in one piece rather than the individual parts as lines.

Here are some sample lines for testing:

AA2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG TS1000000001301,84+00000000179,56-00000000000,00 001 016,0000000000000,00 DE118904627 00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000 EDV RE. KUNDE 000000000000,00 00000000000000000000000000000000000000000000000000000001400000,0000000,00000 000000000000000000000000001301,84+00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107120006600000200001SAB 0000081080000000000000000000000001106724000010711010711AR 000 GEGENB. AU2000000012419,49-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000012419,49-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107060006520013100056SAB 0000082010000000000000000000000002113237000010705010705AR 000 GEGENB. 
AU2000000000305,11-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000000305,11-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107060006520013000056SAB 0000081010000000000000000000000002113237000010705010705AR 000 GEGENB. AU2000000000012,20-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000000012,20-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107060006520012900056DIV 1011580000000000000000000000000002113237000010705010705AR 000 BUCHUNG TS1000000000368,08+00000000050,77-00000000000,00 001 016,0000000000000,00 DE118904627 00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000 EDV RE. KUNDE 000000000000,00 00000000000000000000000000000000000000000000000000000001400000,0000000,00000 000000000000000000000000000368,08+00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107060006520012800055SAB 0000082010000000000000000000000002113236000010705010705AR 000 GEGENB. AU2000000001067,89-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000001067,89-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107070006540034500151SAB 0000082010000000000000000000000002113489000010706010706AR 000 GEGENB. 
AU2000000000043,57-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000000043,57-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107030006460001200006SAB 0000082030000000000000000000000002112927000010702010702AR 000 GEGENB. AU2000000006166,19-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000006166,19-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107130006620002000010SAB 0000081010000000000000000000000001106737000010712010712AR 000 GEGENB. AU2000000029139,56-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000029139,56-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107050006500005800028SAB 0000081010000000000000000000000002113149000010704010704AR 000 GEGENB. AU2000000000012,20-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000000012,20-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107070006540013500064SAB 0000082010000000000000000000000002113399000010706010706AR 000 GEGENB. 
AU2000000001645,97-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000001645,97-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107070006540013400064DIV 1032006600000000000000000000000002113399000010706010706AR 000 BUCHUNG TS1000000001909,33+00000000263,36-00000000000,00 001 016,0000000000000,00 DE118904627 00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000 EDV RE. KUNDE 000000000000,00 00000000000000000000000000000000000000000000000000000000100000,0000000,00000 000000000000000000000000001909,33+00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107070006540013300063SAB 0000082010000000000000000000000002113398000010706010706AR 000 GEGENB. AU2000000000250,88-00000000000,00 00000000000,00 000 000,0000000000000,00 00000000000,00 00000000000,00 00000000000,00 000000000000 000000000 0 - 00000000000000000000000000000000000000000000000000000000000000,0000000,00000 000000000000000000000000000250,88-00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 AA2012001070107060006520034400147DIV 1042320800000000000000000000000002113329000010705010705AR 000 BUCHUNG TS1000000000028,35+00000000003,91-00000000000,00 001 016,0000000000000,00 DE118904627 00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000 EDV RE. KUNDE 000000000000,00 00000000000000000000000000000000000000000000000000000001000000,0000000,00000 000000000000000000000000000000,00+00000000000,00 00000000000,00 00000000000000000000 000000000000 000000 000000 0000000 0 Any ideas??? Thorsten

 [2/6] from: sqlab:gmx at: 27-Jul-2001 12:48


Hi Thorsten
> Hi, > it is obviously not a big thing.
<<quoted lines omitted: 8>>
> that the parser didn't recognise the "AA201", because he always adds the > whole big string in one part, not the parts of the string as lines.
At first you should advance the focus, as in

to "AA201" mark: skip 2 to "AA"

mark sets only a pointer into the string, so your next write/append always appends from this point until the end of the string.

Of course you could use a copy to, or a copy/part with a second pointer.

But why do you not just use

replace/all next a "AA201" "^/AA201"

?

AR
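The point about mark being only a pointer can be seen in a small sketch. It assumes REBOL 2 (parse/all, so that whitespace is not skipped) and uses a short made-up string in place of the real file; the words rec, lines and b are hypothetical names that do not appear in the thread.

```rebol
REBOL []

; Hypothetical stand-in for the contents of test.txt
a: "AA201first AA201second AA201third "

; A set-word in a parse rule stores a position in the string, not a copy,
; so MARK still refers to everything from that point to the end.
parse/all a [to "AA201" mark: to end]
print mark

; COPY extracts only the matched part. Matching the literal "AA201" first
; moves the input forward, so the ANY loop cannot stall at one position.
lines: copy []
parse/all a [
    any [
        copy rec ["AA201" [to "AA201" | to end]]
        (append lines rec)
    ]
]
foreach line lines [print line]

; The replace/all suggestion: NEXT skips the first character, so the
; leading "AA201" does not get a newline in front of it.
b: copy a
replace/all next b "AA201" "^/AA201"
print b
```

Whether the lines are then written with write/append or collected first and written in one go is a separate choice; the sketch only prints them.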

 [3/6] from: chris::starforge::demon::co::uk at: 27-Jul-2001 12:58


Thorsten Moeller wrote:
> Any ideas???
Just out of interest ('cos I'm a nosey person), what was that? It looked almost like protein sequence information or something...

Chris the Wildly Inaccurate
--
.------{ http://www.starforge.co.uk }-----. .--------------------------. =[ Explorer2260, Designer and Coder \=\ P: TexMaker, ROACH, site \ =[___You_will_obey_your_corporate_masters___]==[ Stack: EETmTmTRRSS------ ]

 [4/6] from: joel:neely:fedex at: 27-Jul-2001 2:02


Hi, Thorsten,

I'm not a parsing professional (and I don't play one on TV ;-) but maybe I can make a couple of suggestions...

Thorsten Moeller wrote:
> I have a big string. from this string i need to make a > readable file with lots of lines. So every line starts > with AA...... > The Line ends when the next Line with AA comes up. >
Here's one way, using the sample data from your post
>> foo
== {AA2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG TS1000000001301,84...
>> recs: []
== []
>> parse foo [any [
[        "AA" copy rec to "AA" (append recs join "AA" rec)
[    |   "AA" copy rec to end (append recs join "AA" rec)
[    ]]
== true
>> length? recs
== 14
>> recs
== [{AA2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG TS1000000001301,8...

after which all of your "records" are individual strings in the block RECS.

While I think that's the closest to your original request, I can't help adding a couple of others, just for fun. This version requires the least knowledge of PARSE, but doesn't show off its power either:
>> bletch: parse/all foo "A"
== ["" "" {2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711} {R 000 BUCHUNG TS100000000...

Discard the first element of the block (it's whatever precedes the first "A"). After that, every empty string represents a record boundary (the 0-length gap between the #"A"s in "AA").

Why would anyone write it this way? I had a file some time back where two-character sequences marked both record *and* field boundaries. This approach would allow you to process the individual fields within the records in order.

Another option is
>> parse foo ["AA" copy rec to "AA" copy rest to end]
== true
>> rec
== {2012001070107120006600000300002DIV 1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG TS1000000001301,84+0...
>> rest
== {AA2012001070107120006600000200001SAB 0000081080000000000000000000000001106724000010711010711AR 000 GEGENB. AU2000000012419,49...

which (as long as PARSE returns TRUE) lets you slurp the first record from the string, leaving the rest in ... REST.

Why would anyone write it this way? You can put the above in a loop if you only need to process the first few records (e.g. find the first record with a specified set of criteria).

HTH!

-jn-
--
------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
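The loop idea in the last option might look roughly like the following. This is only a sketch under stated assumptions: REBOL 2, a short invented string instead of the real data, and a made-up criterion (the record contains "DIV"). The word found and the [to "AA" | to end] alternative, which lets the final record match as well, are additions for illustration and not part of the message above.

```rebol
REBOL []

; Hypothetical sample; the real input would be the string FOO from the post
foo: "AA201 SAB one AA201 DIV two AA201 SAB three "

found: none
rest: foo

; Peel one record at a time off the front until the criterion matches
; or nothing is left (REST is none or empty).
while [all [rest  not empty? rest  none? found]] [
    either parse/all rest ["AA" copy rec [to "AA" | to end] copy rest to end] [
        ; REC holds the record without its leading "AA"
        if all [rec  find rec "DIV"] [found: join "AA" rec]
    ][
        ; The input does not start with "AA": stop rather than loop forever
        rest: none
    ]
]

print either found [found] ["no matching record"]
```

Because the loop stops at the first match, the remainder of a large string is never split into records.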

 [5/6] from: sqlab:gmx at: 27-Jul-2001 14:30


> Hi Thorsten > > Hi,
<<quoted lines omitted: 20>>
> At first you should advance the focus as in > to "AA201" mark: skip 2 to "AA"
As I misplaced the parameter to skip, here is a corrected version:

b: a: read %/c/temp/test.txt
parse next a [
    some [to "AA201" a: (write/append/lines %/c/temp/new.txt copy/part b a) b: 4 skip]
]

AR
> mark sets only a pointer in the string, so your next write/append appends > always from this point until the end of the string. > > Of course you could use a copy to or a copy/part with a second pointer. > But why do you not just replace/all next a "AA201" "^/AA201" ? >
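The two-pointer idea with copy/part can be tried without touching any files. The sketch below assumes REBOL 2 and a made-up input string; rec-start, rec-end and lines are hypothetical names. It also appends whatever follows the last "AA201" boundary after the parse has finished, on the assumption that the final record is wanted as well.

```rebol
REBOL []

; Hypothetical stand-in for the contents of test.txt
data: "AA201first AA201second AA201third "

lines: copy []
rec-start: data

; Start at NEXT DATA so the first "AA201" is not treated as a boundary.
; Each later "AA201" closes the record that began at REC-START.
parse/all next data [
    any [
        to "AA201" rec-end:
        (append lines copy/part rec-start rec-end)
        rec-start:
        4 skip
    ]
]

; No "AA201" follows the last record, so the loop never copies it
if not tail? rec-start [append lines copy rec-start]

foreach line lines [print line]
```

Collecting into a block first also makes it possible to write the file in a single call afterwards instead of once per record.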

 [6/6] from: moeller_thorsten:gmx at: 30-Jul-2001 9:20


Hi to all the helpers out there,

thanks for all the suggestions. They helped me out and encouraged me to go further with REBOL.

Thanks
Thorsten

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted