r3wp [groups: 83 posts: 189283]
World: r3wp

[Parse] Discussion of PARSE dialect

Tomc
11-Oct-2005
[509]
split-text: func [txt [string!] n [integer!]
    /local frag fraglet bl frag-rule bs ws
][  ws: charset [#" " #"^-" #"^/"]
    bs: complement ws
    bl: copy []
    frag-rule: [
        any ws
        copy frag [
            [1 n skip 
                opt[copy fraglet some bs]
            ]
            | to end skip
        ]
        (all [fraglet join frag fraglet]
         insert tail bl frag 
         print frag
        )
    ]
    parse/all txt [some frag-rule]
    bl
]
Graham
11-Oct-2005
[510]
I have to say Tomc is a winner !
Tomc
11-Oct-2005
[511]
it was the copy frag 1 n skip
Graham
11-Oct-2005
[512]
Unspecified .. but Tom's function removes leading white space as 
well.  Ladislav's preserves whitespace.
Ladislav
11-Oct-2005
[513]
yes
Graham
11-Oct-2005
[514]
should stick both up on the library ..
Tomc
11-Oct-2005
[515x2]
can easily comment out the any ws line if that is not desired  ... 
or make it a refinement
good night thanks for the puzzle
Graham
11-Oct-2005
[517x2]
thanks for the input.
and thanks both for the solution.
Ladislav
11-Oct-2005
[519]
good night, Tom
Graham
11-Oct-2005
[520x2]
Next Devcon they should set aside a few mins for some programming 
contests :)
and virtual spectators being allowed to compete.
Ammon
11-Oct-2005
[522x2]
Not a half bad idea.
Word on the street is that you're offering to host this next one.
Ladislav
11-Oct-2005
[524]
hi Ammon, missed you in Italy
Ammon
11-Oct-2005
[525]
Hello.  I missed you all as well.
Graham
11-Oct-2005
[526]
well, if people want to come to Wellington .. I'll see what I can 
do.
Ladislav
16-Oct-2005
[527x2]
I checked RAMBO and the discussed issue seems to be there: #3579 
by Piotr.
Gabriele: the [copy frag [n skip | to end] (insert tail result any 
[frag ""])] looks too complicated for the PARSE setting words to NONE 
justification, I would at least prefer to add it to the ticket as 
an example.
MichaelB
23-Oct-2005
[529]
I just found out that I can't do the following:
s: "a b c"
s: "a c b"
parse s ["a" to ["b" | "c"] to end]


The two strings should only symbolize that b and c can alternate. 
But 'to and 'thru don't work with subrules. The documentation doesn't 
even say that they should, but wouldn't it be natural? Or am I missing 
some complication for the parser if it supported this (indefinite 
look-ahead being necessary in the general case, is this the problem?)? 
How are other people doing things like this: what if you want to parse 
something like "a bla bla bla c" or "a bla bla bla d" and you are 
interested in the "bla bla bla", which might be arbitrary text and 
thus can't be put into rules?
Volker
23-Oct-2005
[530x2]
Carl mentioned performance problems, although everyone asks for it.
i use 
 "a" any[ "b" | "c" | skip ] to end
Even slower and less elegantly, but works.
MichaelB
23-Oct-2005
[532]
OK, thanks. Didn't know this. But this solution will work for me 
as well. In a sense this is interesting, as skip isn't a real token, 
but a command - but it's treated as a token. :-)
Chris
23-Oct-2005
[533x2]
Volker, that will return true for "a" as well as "a b c".
I tend to use charsets for this, though again, there is probably 
a performance cost...
Volker
23-Oct-2005
[535x2]
true. never thought about that.
'some would do half the trick. except if nothing is found, it eats 
all and counts as true here.
Chris
23-Oct-2005
[537]
non-bc: complement charset "bc"
parse "a b c" ["a" any non-bc ["b" | "c"] to end]
Volker
23-Oct-2005
[538]
but if it is longer, like "be", every "bi" would fail too.
Chris
23-Oct-2005
[539]
In that case, you need to elaborate a little.
Volker
23-Oct-2005
[540x2]
s: "abibet"
parse s ["a" any non-bc ["be" | "ce"] to end]
More complex than i thought, with that missing thing.
Chris
23-Oct-2005
[542]
Complex indeed.
Volker
23-Oct-2005
[543]
Maybe i used this?
 parse s ["a" some[ "be" break | "ce" break | skip] p: to end]

if nothing is found, it skips to the end. returns true, but if you 
require something after it,
that fails (because already at end).
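
A minimal sketch, not posted by anyone in this thread, of one way to emulate the missing to [rule1 | rule2] in REBOL 2. It uses a negative-lookahead idiom: skip one character at a time for as long as none of the alternatives matches at the current position. The names stop, not-stop, to-stop and next-rule are made up for this illustration.

```rebol
REBOL []

stop: ["be" | "ce"]          ; the alternatives we want to stop in front of

; Succeeds without consuming input only when STOP does not match here.
; [end skip] can never succeed, so it serves as an always-failing rule;
; the empty block [] always succeeds.
not-stop: [[stop (next-rule: [end skip]) | (next-rule: [])] next-rule]

; Emulates: to ["be" | "ce"]
to-stop: [any [not-stop skip]]

s: "abibet"
print parse/all s ["a" to-stop mk: stop to end]   ; should print true
print mk                                          ; should print bet
print parse/all "a" ["a" to-stop stop to end]     ; should print false
```

Unlike the any [... | skip] variant, this fails when neither alternative occurs, because the stop rule after to-stop has nothing left to match.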
Izkata
23-Oct-2005
[544]
Michael>I just found out that I can't do the following:
s: "a b c"
s: "a c b"
parse s ["a" to ["b" | "c"] to end]<

parse s ["a" [to "b" | to "c"] to end]
Chris
23-Oct-2005
[545]
Izkata, that breaks if the "c" comes before the "b".
Izkata
23-Oct-2005
[546]
I agree, it should work the other way, too, though..
Chris
23-Oct-2005
[547]
Iz -- d'oh...
Izkata
23-Oct-2005
[548x2]
^.^
But isn't that what is wanted?  (to ["b" | "c"])
Chris
23-Oct-2005
[550x3]
V: perhaps better in this case to use 'while and 'find rather than 
'parse?
Izkata:
>> non-bc: complement bc: charset "bc"
== make bitset! 64#{////////////////8/////////////////////////8=}
>> s1: "a b c"
== "a b c"
>> s2: "a c b"
== "a c b"
>> parse s1 ["a" [to "b" | to "c"] mk: to end] mk
== "b c"
>> parse s2 ["a" [to "b" | to "c"] mk: to end] mk
== "b"
>> parse s1 ["a" any non-bc mk: ["b" | "c"] to end] mk
== "b c"
>> parse s2 ["a" any non-bc mk: ["b" | "c"] to end] mk
== "c b"
Note the difference when parsing 's2...
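
The 'find suggestion above could look roughly like the following sketch, which picks whichever delimiter occurs first. The helper name find-first and its argument names are assumptions for illustration only, not an existing REBOL function.

```rebol
REBOL []

; Hypothetical helper: position of whichever delimiter occurs first in str,
; or none when no delimiter occurs at all.
find-first: func [str [string!] delims [block!] /local best pos] [
    best: none
    foreach d delims [
        pos: find str d
        if all [
            pos
            any [none? best  (index? pos) < (index? best)]
        ][best: pos]
    ]
    best
]

probe find-first "a b c" ["b" "c"]   ; should show "b c"
probe find-first "a c b" ["b" "c"]   ; should show "c b"
```

Because every delimiter is searched for separately, the order of the alternatives no longer matters, at the cost of one pass over the string per delimiter.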
Izkata
23-Oct-2005
[553x2]
ack... well.. it was worth a try  =P
as you can see, I know some, but am not too strong in parse   ^.^
MichaelB
23-Oct-2005
[555]
=image 
    file: images/a picture.gif 
    size: 200x300
    caption: some caption below the picture 
    desc: some description for the picture


I'm trying to extend Makedoc2 for a project to generate an XML dialect 
and I need much more information for certain elements - e.g. images 
- so I'm trying to make it as easy as possible for the user. The 
above is what I actually wanted to parse - but the order of the information 
is supposed to be free and I can't and don't want to use rebol datatypes, 
which might be the first thought to make the parsing easier, because 
normal people don't want to learn too many rules for all these things. 
So the b and c in the example corresponded more to the caption and 
desc in the above example.
Volker
23-Oct-2005
[556x2]
So you want to handle both, not only one of them? Something like
 some[ ( caption: desc: none )
  set caption caption-rule 
  | set desc desc-rule
 ] ( if all[caption desc][handle-them] )
No, initialisation before some..
 ( caption: desc: none ) some[
MichaelB
23-Oct-2005
[558]
but aren't these only block parsing rules? (because of set)
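
For string input, copy can play the role that set plays in block parsing. The following is a rough sketch under that assumption, accepting the =image fields in any order. The names text, ws, image, rest, field and val are invented for this illustration, and each field is assumed to end with a newline.

```rebol
REBOL []

text: {=image
    file: images/a picture.gif
    caption: some caption below the picture
    size: 200x300
    desc: some description for the picture
}

ws: charset " ^-"
image: copy []                       ; collected name/value pairs

; Copy everything up to the end of the line, then consume the newline.
rest: [any ws copy val to newline newline]

; One field, in any order; leading indentation is skipped first.
field: [
    any ws [
          "file:"    rest (repend image ['file val])
        | "size:"    rest (repend image ['size val])
        | "caption:" rest (repend image ['caption val])
        | "desc:"    rest (repend image ['desc val])
    ]
]

print parse/all text ["=image" any ws newline some field]   ; should print true
print select image 'caption
```

An empty value would most likely leave val as none here, which is the PARSE behaviour discussed earlier in this group, so a real rule set would need to guard against that.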