r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

MichaelB
23-Oct-2005
[529]
I just found out that I can't do the following:
s: "a b c"
s: "a c b"
parse s ["a" to ["b" | "c"] to end]


The two strings only illustrate that b and c can come in either order.
But 'to and 'thru don't work with subrules. The documentation doesn't 
say they should, but wouldn't it be natural? Or am I missing some 
complication for the parser if it supported this (in the general case 
it would need indefinite look-ahead; is this the problem?)? How do other 
people handle this? For example, what if you want to parse something like 
"a bla bla bla c" or "a bla bla bla d" when you are interested in the 
"bla bla bla", which might be arbitrary text and thus can't be put into rules?
Volker
23-Oct-2005
[530x2]
Carl mentioned performance problems, although everyone asks for it.
I use
 "a" any[ "b" | "c" | skip ] to end
Even slower and less elegant, but it works.
MichaelB
23-Oct-2005
[532]
OK, thanks, I didn't know this. But this solution will work for me 
as well. In a sense this is interesting, as skip isn't a real token 
but a command, yet it's treated as a token. :-)
Chris
23-Oct-2005
[533x2]
Volker, that will return true for "a" as well as "a b c".
I tend to use charsets for this, though again, there is probably 
a performance cost...
Volker
23-Oct-2005
[535x2]
True, never thought about that.
'some would do half the trick, except that if nothing is found, it eats 
everything and still counts as true here.
Chris
23-Oct-2005
[537]
non-bc: complement charset "bc"
parse "a b c" ["a" any non-bc ["b" | "c"] to end]
Volker
23-Oct-2005
[538]
But if the target is longer, like "be", every "bi" would make it fail too.
Chris
23-Oct-2005
[539]
In that case, you need to elaborate a little.
Volker
23-Oct-2005
[540x2]
s: "abibet"
parse s ["a" any non-bc ["be" | "ce"] to end]
More complex than I thought, given that missing feature.
Chris
23-Oct-2005
[542]
Complex indeed.
Volker
23-Oct-2005
[543]
Maybe I used this?
 parse s ["a" some[ "be" break | "ce" break | skip] p: to end]

If nothing is found, it skips to the end and returns true, but if you 
require something after it, that fails (because it is already at the end).
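A minimal sketch of Volker's break-based scan packaged as a reusable rule, assuming REBOL/Core 2.x PARSE semantics (BREAK leaves the enclosing ANY/SOME loop and counts as success). The rule names `target` and `thru-target` are hypothetical, introduced only for this example, and parse/all is used so that spaces are matched rather than skipped.

```rebol
REBOL [Title: "thru-like scan with break (sketch)"]

; Hypothetical rule: the alternatives we want to reach.
target: ["be" | "ce"]

; Hypothetical rule emulating THRU [target]: try the target at each
; position, leave the loop with BREAK on the first hit, else SKIP one char.
thru-target: [some [target break | skip]]

s: "abibet"
print parse/all s ["a" thru-target p: to end]    ; true
probe p                                          ; "t"

; With no "be"/"ce" in the input the loop eats everything and still
; succeeds; only a rule that needs more input afterwards fails.
print parse/all "axyz" ["a" thru-target]         ; true
print parse/all "axyz" ["a" thru-target "z"]     ; false
```

Because the loop consumes the whole input when no target is present, the failure only shows up in whatever rule comes after it, which is the caveat described above.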
Izkata
23-Oct-2005
[544]
Michael>I just found out that I can't do the following:
s: "a b c"
s: "a c b"
parse s ["a" to ["b" | "c"] to end]<

parse s ["a" [to "b" | to "c"] to end]
Chris
23-Oct-2005
[545]
Izkata, that breaks if the "c" comes before the "b".
Izkata
23-Oct-2005
[546]
I agree, it should work the other way, too, though..
Chris
23-Oct-2005
[547]
Iz -- d'oh...
Izkata
23-Oct-2005
[548x2]
^.^
But isn't that what is wanted?  (to ["b" | "c"])
Chris
23-Oct-2005
[550x3]
V: perhaps better in this case to use 'while and 'find rather than 
'parse?
Izkata:
>> non-bc: complement bc: charset "bc"
== make bitset! 64#{////////////////8/////////////////////////8=}
>> s1: "a b c"
== "a b c"
>> s2: "a c b"
== "a c b"
>> parse s1 ["a" [to "b" | to "c"] mk: to end] mk
== "b c"
>> parse s2 ["a" [to "b" | to "c"] mk: to end] mk
== "b"
>> parse s1 ["a" any non-bc mk: ["b" | "c"] to end] mk
== "b c"
>> parse s2 ["a" any non-bc mk: ["b" | "c"] to end] mk
== "c b"
Note the difference when parsing 's2...
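Chris's comparison shows that [to "b" | to "c"] commits to the first alternative that matches anywhere later in the input, not to the nearest match. A common workaround is to mark the position, try the sub-rule, and rewind with a get-word before breaking out of the scan, which approximates TO with a sub-rule. This is a sketch assuming REBOL/Core 2.x semantics; `to-bc`, `here`, `mk` and `ok` are hypothetical names.

```rebol
REBOL [Title: "to-like scan with a sub-rule (sketch)"]

; Hypothetical rule emulating TO ["b" | "c"]: remember the position,
; try the alternatives, and on success rewind to the mark (:here)
; before leaving the loop, so the match itself is not consumed.
to-bc: [any [here: ["b" | "c"] :here break | skip]]

foreach s ["a b c" "a c b" "a"] [
    mk: none
    ok: parse/all s ["a" to-bc mk: ["b" | "c"] to end]
    print [mold s "->" ok mold mk]
]
```

For "a c b" this stops at the nearest "c", so mk is "c b" as with Chris's charset version, and it also works for multi-character alternatives such as ["be" | "ce"], where the charset trick breaks. For "a" the scan finds nothing and the following ["b" | "c"] makes the parse fail.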
Izkata
23-Oct-2005
[553x2]
ack... well.. it was worth a try  =P
as you can see, I know some, but am not too strong in parse   ^.^
MichaelB
23-Oct-2005
[555]
=image 
    file: images/a picture.gif 
    size: 200x300
    caption: some caption below the picture 
    desc: some description for the picture


I'm trying to extend Makedoc2 for a project that generates an XML dialect, 
and I need much more information for certain elements (e.g. images), 
so I'm trying to make it as easy as possible for the user. The above 
is what I actually wanted to parse, but the order of the fields is 
supposed to be free. I can't, and don't want to, use REBOL datatypes 
(which might be the first idea for making the parsing easier), because 
normal people don't want to learn too many rules for all these things. 
So the b and c in the earlier example correspond to the caption and 
desc above.
Volker
23-Oct-2005
[556x2]
So you want to handle both, not only one of them? Something like
 some[ ( caption: desc: none )
  set caption caption-rule 
  | set desc desc-rule
 ] ( if all[caption desc][handle-them] )
No, the initialisation belongs before some:
 ( caption: desc: none ) some[
MichaelB
23-Oct-2005
[558]
But aren't these only block parsing rules? (because of set)
Izkata
23-Oct-2005
[559]
I'm gonna try again:

>> s: {=image
{    file: images/a picture.gif
{    size: 200x300
{    caption: some caption below the picture
{    desc: some description for the picture}
== {=image
file: images/a picture.gif
size: 200x300
caption: some caption below the picture
desc: some description for the pictu...
>> parse head append s {^/} [
[    some [
[        thru {file: } copy file to {^/} |
[        thru {size: } copy size to {^/} |
[        thru {caption: } copy cap to {^/} |
[        thru {desc: } copy desc to {^/}
[        ]
[    ]
Volker
23-Oct-2005
[560]
Right, my mistake. With strings that is copy, not set.
Izkata
23-Oct-2005
[561]
Err, wait... then they can't have newlines inside the description/caption 
  (x_x)
MichaelB
23-Oct-2005
[562]
OK, I have to try these ideas.
Volker
23-Oct-2005
[563x3]
IMHO 'to and 'thru are only for simple cases; you need a real BNF. 
Or you can use two parses: the first takes only the lines after =image, 
then a second one processes those lines.
http://polly.rebol.it/test/test/parse-images.r
Updated with a pure parse rule. But better support for such cases would 
be nice; it should not be guru-level.
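One way to get order-independent fields with a single parse/all is to parse line by line, with one alternative per key, so each alternative only looks at the start of the current line instead of searching ahead with thru. This is a sketch assuming REBOL/Core 2.x, not the script at the URL above; the keys come from MichaelB's example, while `ws`, `text-char`, `field`, `parse-image` and `doc` are hypothetical names. As Izkata noted, values cannot span lines with this approach, and an unknown key makes the whole parse fail.

```rebol
REBOL [Title: "order-independent =image fields (sketch)"]

ws: charset " ^-"                       ; hypothetical: spaces and tabs
text-char: complement charset "^/"      ; hypothetical: anything but newline

; Hypothetical rule: one line of the form "key: value", keys in any order.
field: [
    any ws [
          "file:"    any ws copy file    some text-char
        | "size:"    any ws copy size    some text-char
        | "caption:" any ws copy caption some text-char
        | "desc:"    any ws copy desc    some text-char
    ] newline
]

; Hypothetical helper: resets the (global) field words, returns parse result.
parse-image: func [text [string!]] [
    file: size: caption: desc: none
    parse/all text ["=image" newline some field end]
]

doc: {=image
    desc: some description for the picture
    file: images/a picture.gif
    caption: some caption below the picture
    size: 200x300
}
print parse-image doc
probe reduce [file size caption desc]
```

The lines in `doc` are deliberately shuffled; with the thru-based rule earlier in the thread, a field that appears before an already-matched one would be skipped.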
Graham
23-Oct-2005
[566x2]
Has clean-script been updated for the new version of Core?
It barfs on data/(...)
Graham
31-Oct-2005
[568]
How do I exit a parse rule in the middle and return true (to allow 
the next rule to be applied)?
Volker
31-Oct-2005
[569]
'break
Henrik
31-Oct-2005
[570]
interesting... will write that in the wikibook :-)
Volker
31-Oct-2005
[571]
Or "end skip". With break, the part parsed so far counts as success; 
with end skip, the rule counts as failure and parse backtracks.
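A small sketch of the difference Volker describes, assuming REBOL/Core 2.x: BREAK ends the enclosing loop and keeps what it has matched, while END SKIP can never match (END fails unless at the tail, and SKIP fails at the tail), so the current alternative fails and PARSE backtracks to the next one. `rest` and `fail-rule` are hypothetical names.

```rebol
REBOL [Title: "break vs end skip (sketch)"]

; BREAK: stop the loop at the first ";" but keep the match so far.
rest: none
print parse/all "a;xyz" [any ["a" | ";" break | skip] copy rest to end]  ; true
probe rest                                                               ; "xyz"

; Hypothetical always-failing rule: the first alternative fails,
; so PARSE backtracks and tries the second one.
fail-rule: [end skip]
print parse/all "abc" ["a" [fail-rule | "b"] "c"]    ; true
print parse/all "abc" ["a" fail-rule "b" "c"]        ; false
```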
Graham
31-Oct-2005
[572x4]
This is part of my scheduler dialect

away-days  is a block of [ start-date end-date reasons ]
current-date is the date I am looking at

The syntax is

away 25-Dec-2005 on holiday
away 25-Dec-2005
away from 25-Dec-2005 to 7-Jan-2006 on "summer holidays"

I want to add

away every Wednesday at "golf course"
away-rule: [
	'away [
		set awaydate date! (repend away-days [awaydate awaydate]) |
		'from set awayfrom date! 'to set awayto date! (
			repend away-days [awayfrom awayto]
		) |
		'every set day word! (
			either day = to-word pick system/locale/days current-date/weekday [
				repend away-days [current-date current-date]
			][
				...break out of rule...
			]
		)
	]
	(reason: copy "")
	opt [['on | 'at] set reason [word! | string!]]
	(append away-days to-string reason)
]
Now if the current-date matches a Wednesday, I am okay.

But if not, I want to leave the rule at that point, and move on to 
the next rule.
'break can only be used within the parse dialect, so that won't work.
Volker
31-Oct-2005
[576]
the general way:

 rule: [ (dummy-rule: [] if not ok? [dummy-rule: [end skip]]) dummy-rule ]
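Applied to Graham's 'every case, the dummy-rule trick could look like this sketch, assuming REBOL/Core 2.x. The test date 26-Oct-2005 (a Wednesday) and the names `fail-rule`, `next-rule` and `every-rule` are hypothetical, and the fallback alternative in this cut-down `away-rule` only stands in for "the next rule".

```rebol
REBOL [Title: "conditional failure inside a rule (sketch)"]

away-days: copy []
current-date: 26-Oct-2005      ; hypothetical test date, a Wednesday
fail-rule: [end skip]          ; hypothetical name: can never match

; Compute the continuation in a paren, then use it as a sub-rule:
; [] always matches, [end skip] makes this alternative fail so that
; parse backtracks to the next one.
every-rule: [
    'every set day word! (
        next-rule: either day = to-word pick system/locale/days current-date/weekday [
            repend away-days [current-date current-date]
            []
        ][
            fail-rule
        ]
    ) next-rule
]

; Cut-down stand-in for Graham's rule, with a fallback alternative.
away-rule: ['away [every-rule | 'every word! (print "not today")]]

print parse [away every Wednesday] away-rule    ; true, date recorded
print parse [away every Monday] away-rule       ; prints "not today", then true
probe away-days
```

The paren cannot fail the match itself, so it only chooses which block the following word `next-rule` refers to when parse reaches it.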
Graham
31-Oct-2005
[577]
oh ... looks ugly.
Volker
31-Oct-2005
[578]
It is.