Need help from parsing professionals
[1/6] from: moeller_thorsten:gmx at: 27-Jul-2001 12:36
Hi,
This is obviously not a big thing.
I have a big string, and from it I need to make a readable file with
lots of lines. Every line starts with AA......
A line ends where the next line starting with AA begins.
Here is what I tried:
a: read %/c/temp/test.txt
parse a [any [
    to "AA201" mark: to "AA"
    (write/append %/c/temp/new.txt join mark "^/")
]]
This seems to run in an endless loop. Judging from new.txt, the parser
doesn't recognise "AA201": it always appends the whole big string in one
piece instead of splitting it into lines.
Here are some sample lines for testing:
AA2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711AR 000 BUCHUNG
TS1000000001301,84+00000000179,56-00000000000,00 001
016,0000000000000,00 DE118904627
00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000
EDV RE. KUNDE 000000000000,00
00000000000000000000000000000000000000000000000000000001400000,0000000,00000
000000000000000000000000001301,84+00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107120006600000200001SAB
0000081080000000000000000000000001106724000010711010711AR 000 GEGENB.
AU2000000012419,49-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000012419,49-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107060006520013100056SAB
0000082010000000000000000000000002113237000010705010705AR 000 GEGENB.
AU2000000000305,11-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000000305,11-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107060006520013000056SAB
0000081010000000000000000000000002113237000010705010705AR 000 GEGENB.
AU2000000000012,20-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000000012,20-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107060006520012900056DIV
1011580000000000000000000000000002113237000010705010705AR 000 BUCHUNG
TS1000000000368,08+00000000050,77-00000000000,00 001
016,0000000000000,00 DE118904627
00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000
EDV RE. KUNDE 000000000000,00
00000000000000000000000000000000000000000000000000000001400000,0000000,00000
000000000000000000000000000368,08+00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107060006520012800055SAB
0000082010000000000000000000000002113236000010705010705AR 000 GEGENB.
AU2000000001067,89-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000001067,89-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107070006540034500151SAB
0000082010000000000000000000000002113489000010706010706AR 000 GEGENB.
AU2000000000043,57-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000000043,57-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107030006460001200006SAB
0000082030000000000000000000000002112927000010702010702AR 000 GEGENB.
AU2000000006166,19-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000006166,19-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107130006620002000010SAB
0000081010000000000000000000000001106737000010712010712AR 000 GEGENB.
AU2000000029139,56-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000029139,56-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107050006500005800028SAB
0000081010000000000000000000000002113149000010704010704AR 000 GEGENB.
AU2000000000012,20-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000000012,20-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107070006540013500064SAB
0000082010000000000000000000000002113399000010706010706AR 000 GEGENB.
AU2000000001645,97-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000001645,97-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107070006540013400064DIV
1032006600000000000000000000000002113399000010706010706AR 000 BUCHUNG
TS1000000001909,33+00000000263,36-00000000000,00 001
016,0000000000000,00 DE118904627
00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000
EDV RE. KUNDE 000000000000,00
00000000000000000000000000000000000000000000000000000000100000,0000000,00000
000000000000000000000000001909,33+00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107070006540013300063SAB
0000082010000000000000000000000002113398000010706010706AR 000 GEGENB.
AU2000000000250,88-00000000000,00 00000000000,00 000
000,0000000000000,00 00000000000,00
00000000000,00 00000000000,00 000000000000 000000000
0 -
00000000000000000000000000000000000000000000000000000000000000,0000000,00000
000000000000000000000000000250,88-00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0 AA2012001070107060006520034400147DIV
1042320800000000000000000000000002113329000010705010705AR 000 BUCHUNG
TS1000000000028,35+00000000003,91-00000000000,00 001
016,0000000000000,00 DE118904627
00000000000,00+00000000000,00-00000000000,00 00000,000000 0000000000
EDV RE. KUNDE 000000000000,00
00000000000000000000000000000000000000000000000000000001000000,0000000,00000
000000000000000000000000000000,00+00000000000,00 00000000000,00
00000000000000000000 000000000000 000000
000000 0000000 0
Any ideas???
Thorsten
[2/6] from: sqlab:gmx at: 27-Jul-2001 12:48
Hi Thorsten
> Hi,
> it is obviously not a big thing.
<<quoted lines omitted: 8>>
> that the parser didn't recognise the "AA201", because he always adds the
> whole big string in one part, not the parts of the string as lines.
First, you should advance the position, as in
to "AA201" mark: skip 2 to "AA"
MARK only sets a pointer into the string, so each write/append writes
everything from that point to the end of the string.
Of course you could use COPY with TO, or copy/part with a second pointer.
But why not just use replace/all next a "AA201" "^/AA201"?
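A minimal sketch of the replace/all suggestion, assuming REBOL 2 (the version used in this thread). The short string is a made-up stand-in for the contents of test.txt, and the final write is left as a comment so the snippet runs without that directory:

```rebol
; Made-up stand-in data; in practice:  a: read %/c/temp/test.txt
a: copy "AA201first recordAA201second recordAA201third record"
; NEXT A skips the very first "AA201", so no empty line appears at the top.
; REPLACE changes the string in place, so A (the head) sees every change.
replace/all next a "AA201" "^/AA201"
print a
; write %/c/temp/new.txt a
```

The same caveat applies as for any marker-based split: if "AA201" can occur inside a record, it will be broken there too.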
AR
--
GMXler aufgepasst - jetzt viele 1&1 New WebHosting Pakete ohne
Einrichtungsgebuehr + 1 Monat Grundgebuehrbefreiung!
http://puretec.de/index.html?ac=OM.PU.PU003K00717T0492a
[3/6] from: chris::starforge::demon::co::uk at: 27-Jul-2001 12:58
Thorsten Moeller wrote:
> Any ideas???
Just out of interest ('cos I'm a nosey person), what was that? It looked almost
like protein sequence information or something...
Chris the Wildly Inaccurate
--
.------{ http://www.starforge.co.uk }-----. .--------------------------.
=[ Explorer2260, Designer and Coder \=\ P: TexMaker, ROACH, site \
=[___You_will_obey_your_corporate_masters___]==[ Stack: EETmTmTRRSS------ ]
[4/6] from: joel:neely:fedex at: 27-Jul-2001 2:02
Hi, Thorsten,
I'm not a parsing professional (and I don't play one on
TV ;-) but maybe I can make a couple of suggestions...
Thorsten Moeller wrote:
> I have a big string. from this string i need to make a
> readable file with lots of lines. So every line starts
> with AA......
> The Line ends when the next Line with AA comes up.
>
Here's one way, using the sample data from your post:
>> foo
== {AA2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711AR 000
BUCHUNG
TS1000000001301,84...
>> recs: []
== []
>> parse foo [any [
[ "AA" copy rec to "AA" (append recs join "AA" rec) |
[ "AA" copy rec to end (append recs join "AA" rec)
[ ]]
== true
>> length? recs
== 14
>> recs
== [{AA2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711AR 000
BUCHUNG
TS1000000001301,8...
after which all of your "records" are individual strings in the
block RECS.
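To get from RECS to the output file that was actually asked for, a sketch (REBOL 2 assumed) could fold the two alternatives into one rule by giving COPY a sub-rule, then write one record per line. The sample string is made up, and %new.txt stands in for the %/c/temp/new.txt used in the thread:

```rebol
; Made-up stand-in data; in practice:  foo: read %/c/temp/test.txt
foo: copy "AA201first record AA201second record AA201third record"
recs: copy []
parse/all foo [
    any [
        ; a record starts with "AA" and runs up to the next "AA",
        ; or to the end of the input for the last record
        "AA" copy rec [to "AA" | to end] (append recs join "AA" rec)
    ]
]
; WRITE/LINES writes each string in the block followed by a line terminator
write/lines %new.txt recs
```

Using PARSE/ALL keeps whitespace exactly as it appears in the input, so each record keeps any trailing space that preceded the next "AA".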
While I think that's the closest to your original request, I
can't help adding a couple of others, just for fun.
This version requires the least knowledge of PARSE, but doesn't
show off its power either:
>> bletch: parse/all foo "A"
== ["" "" {2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711}
{R 000 BUCHUNG
TS100000000...
Discard the first element of the block (it's whatever precedes
the first "A"). After that, every empty string represents a
record boundary (the 0-length gap between the #"A"s in "AA").
Why would anyone write it this way? I had a file some time
back where two-character sequences marked both record *and*
field boundaries. This approach would allow you to process
the individual fields within the records in order.
Another option is
>> parse foo ["AA" copy rec to "AA" copy rest to end]
== true
>> rec
== {2012001070107120006600000300002DIV
1015940000000000000000000000000001106725000010711010711AR 000
BUCHUNG
TS1000000001301,84+0...
>> rest
== {AA2012001070107120006600000200001SAB
0000081080000000000000000000000001106724000010711010711AR 000
GEGENB.
AU2000000012419,49...
which (as long as PARSE returns TRUE) lets you slurp the first
record from the string (without its leading "AA"), leaving the rest in ... REST.
Why would anyone write it this way? You can put the above in
a loop if you only need to process the first few records (e.g.
find the first record with a specified set of criteria).
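A sketch of that loop, assuming REBOL 2. The sample string and the "record contains DIV" test are hypothetical stand-ins for real data and real criteria:

```rebol
; Made-up stand-in data; in practice:  rest: read %/c/temp/test.txt
rest: copy "AA201one SAB AA201two DIV AA201three SAB"
found: none
; Peel off one record per pass and stop as soon as one matches.
while [all [
    none? found
    parse/all rest ["AA" copy rec to "AA" copy rest to end]
]][
    ; REC is the current record minus its leading "AA";
    ; REST now starts at the next "AA"
    if find rec "DIV" [found: join "AA" rec]
]
; PARSE fails on the final record (no "AA" follows it), so the loop never
; tests it; REST is left unchanged and still holds that record
if all [none? found find rest "DIV"] [found: rest]
print either found [found] ["no matching record"]
```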
HTH!
-jn-
--
------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
[5/6] from: sqlab:gmx at: 27-Jul-2001 14:30
> Hi Thorsten
> > Hi,
<<quoted lines omitted: 20>>
> At first you should advance the focus as in
> to "AA201" mark: skip 2 to "AA"
Since I put the count for skip in the wrong place (in PARSE the count comes first, as in 2 skip), here is a corrected version:
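b: a: read %/c/temp/test.txt
parse next a [some [
    to "AA201" a: (write/append/lines %/c/temp/new.txt copy/part b a)
    b: 4 skip
]]
; SOME stops when no further "AA201" is found, so the last record
; (from B to the end) has not been written yet
write/append/lines %/c/temp/new.txt copy b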
AR
> mark sets only a pointer in the string, so your next write/append appends
> always from this point until the end of the string.
>
> Of course you could use a copy to or a copy/part with a second pointer.
> But why do you not just replace/all next a "AA201" "^/AA201" ?
>
[6/6] from: moeller_thorsten:gmx at: 30-Jul-2001 9:20
Hi to all the helpers out there,
thanks for all the suggestions. They helped me out and encouraged me to go
further with REBOL.
Thanks
Thorsten
Notes
- Quoted lines have been omitted from some messages.