More Parsing questions
[1/6] from: greggirwin::starband::net at: 7-Sep-2001 9:58
Thanks for jumping in Ladislav. I'll be adding that to my notebook!
--Gregg
[2/6] from: lmecir:mbox:vol:cz at: 7-Sep-2001 10:24
You can try this:
to-rule: function [
    {generate a to A parse rule}
    a [block!]
] [cont finish] [
    reduce [
        'any reduce [
            reduce [
                a to paren! [cont: [to end skip] finish: []] '|
                to paren! [cont: [] finish: [to end skip]] 'skip
            ]
            'cont
        ]
        'finish
    ]
]
comment {
Example #1:
to-space-or-br: to-rule [" " | "<br>"]
result: ""
parse/all "aa" [to-space-or-br copy result to end]
probe result
parse/all "a a<br>" [to-space-or-br copy result to end]
probe result
parse/all "ab<br> " [to-space-or-br copy result to end]
probe result
Example #2:
digit: charset [#"0" - #"9"]
four-digit: [4 digit]
to-four-digit: to-rule four-digit
tfd: [to-four-digit copy fd four-digit to end]
parse/all "abcd 1234" tfd
probe fd ; == "1234"
}
Regards
Ladislav
P.S. Other functions that may be of some use are:
a-b: function [
    {Generate an A-B parse rule}
    a [block!]
    b [block!]
] [finish] [
    reduce [
        reduce [
            b to paren! [finish: [to end skip]] '|
            to paren! reduce [first [finish:] a]
        ]
        'finish
    ]
]
comment {
Example:
a: [any "a" "b"]
b: ["aa"]
parse "ab" a-b a b
parse "aab" a-b a b
}
not-rule: function [
    {Generate a not A parse rule}
    a [block!]
] [finish] [
    reduce [
        reduce [
            a to paren! [finish: [to end skip]] '|
            to paren! [finish: []]
        ]
        'finish
    ]
]
comment {
Example:
a: [any "a" "b"]
parse "ab" not-rule a
parse "b" not-rule a
parse "" not-rule a
}
[3/6] from: pablohar:ho:tmail at: 6-Sep-2001 11:06
Hi Stefan..
I'm not a parse guru
but try this...
digit: charset [#"0" - #"9"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
alphanum: union alpha digit
Rule: [any alpha 4 digit any alpha]
It works on my examples, matching only when there are exactly 4 digits (no more, no fewer).
[4/6] from: greggirwin:starband at: 6-Sep-2001 11:54
Hi Stefan,
In writing my little RegEx function (posted a few days ago) I found that you
can use a bitset as a target for 'to or 'thru. I played around with a few
things here and didn't find a quick solution but I'll try to find some time
later to investigate. If we can't find a parse solution, it shouldn't be too
tough just to tokenize and parse the line yourself in this case.
--Gregg
[5/6] from: syke:amigaextreme at: 6-Sep-2001 7:04
Hi list,
as usual, I'm having parsing troubles. Here's what I'm trying to do:
I have a regular text file containing names, with phone numbers, like this:
John Doe 5694
Jane Doe 0445-38352
etc.. (I read it in using read/lines)
what I want to do is to find all the lines that only have four digits in
them, the other ones should be ignored.
So I tried this:
>> data: "Ebba Grön 2345"
== "Ebba Grön 2345"
>> parse data [to "2345" to end]
== true
And after that, I tried this:
>> digit: charset "0123456789"
== make bitset! #{
000000000000FF03000000000000000000000000000000000000000000000000
}
>> parse data [to 4 digit to end]
== false
Why does this become false?
What I want to do is, if parse returns true, it should copy the entire line,
add a newline to that line, and then save the file. I realise that this
parse rule will find lines containing for example 6 digits in a row, but I
guess it's a start. This is what I'd like to achieve:
foreach line data [
    if (parse line [to 4 digit to end]) [
        append new-data (rejoin [line newline])
    ]
]
Or something similar, the trick is to get the lines that contain only 4
digits, not the ones containing 5 or more digits (no matter what order they
are in, xxx-xxxxxxx, xxxxxxx, or anything else).
/Regards Stefan Falk
www.amigaextreme.com
[6/6] from: petr:krenzelok:trz:cz at: 6-Sep-2001 7:27
Stefan Falk wrote:
> Hi list,
> as usual, I'm having parsing troubles. Here's what I'm trying to do:
<<quoted lines omitted: 16>>
> >> parse data [to 4 digit to end]
> == false
I don't use parse too much, so I am no parse guru, but:
data: "Ebba Grön 2345"
digit: charset [#"0" - #"9"] ; but your solution is equivalent :-) ...
alpha: complement digit
parse/all data [some [alpha | digit]]
But it will return true even for e.g. "1234-123454656",
so you could turn it into a sequence:
parse/all data [some alpha some digit end]
---------------
->> data: "Ebba Grön 2345"
== "Ebba Grön 2345"
->> parse/all data [some alpha some digit end]
== true
---------------
->> data: "Ebba Grön 2345-1234556"
== "Ebba Grön 2345-1234556"
->> parse/all data [some alpha some digit end]
== false
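The rule above accepts any number of trailing digits. A minimal sketch of a stricter variant follows, assuming every wanted line is non-digit text followed by exactly four digits at the very end. The names non-digit and lines are made up for illustration, and lines stands in for the block returned by read/lines:

```rebol
digit: charset [#"0" - #"9"]
non-digit: complement digit

; Stand-in for the block that read/lines would return (sample data only)
lines: ["John Doe 5694" "Jane Doe 0445-38352" "Jim Doe 123456"]

new-data: copy ""
foreach line lines [
    ; some non-digit text, then exactly four digits, then the end of the line
    if parse/all line [some non-digit 4 digit end] [
        append new-data rejoin [line newline]
    ]
]
probe new-data
```

Only the first sample line should be kept. The other two stop matching at the character that follows the fourth digit, so parse never reaches the end of the line.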
IMO you can't use "to digit": the parser seems to me to be a matching engine, so you
can skip only to a known string, not to some bitset. But I may be wrong.
I don't know your exact data structure, but you could also use
parse/all data " ". If your data is space separated, you will end up with the
following block:
->> parse/all data " "
== ["Ebba" "Grön" "2345"]
then, instead of using parse rules, some easy condition will work for you:
if not found? find last parse/all data " " "-" [print "data OK"]
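The split-then-test idea can be tightened so that it checks for four digits rather than just the absence of "-". This is only a sketch, assuming the number is always the last space-separated token. The helper name four-digit-last? is hypothetical:

```rebol
digit: charset [#"0" - #"9"]

; Hypothetical helper: true when the last space-separated token
; consists of exactly four digits.
four-digit-last?: func [line [string!] /local tokens] [
    tokens: parse/all line " "
    to-logic all [
        not empty? tokens
        parse/all last tokens [4 digit]
    ]
]

print four-digit-last? "John Doe 5694"        ; true
print four-digit-last? "Jane Doe 0445-38352"  ; false
```

It only looks at the last token, so a line with digits elsewhere in the name would still pass.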
hope-this-helps,
-pekr-