World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
8-Nov-2008
[2935x3]
I am the editor of the PARSE proposals.


It was decided that I perform this role because Carl is focused on 
the GUI work right now and someone qualified had to do it. With Carl 
busy and Ladislav not here, I am the one left who has the most background 
in parsing and the most understanding of what can be done efficiently 
and what can't. When the PARSE REPs of old were discussed, I was 
right there in the conversation and the originator of about half 
of them, mostly based on my experience with other parsers and parser 
generators. Because of this I am well aware of the original motivation 
behind them, and have had many years to think them through. It's 
just a head start, really.


I am also the author of the current implementation of COLLECT and 
KEEP, based on Gabriele's original idea, which was a really great 
idea. It is also really limited. Collecting information and building 
data structures out of it is the basic function that programming 
languages do, and something that REBOL is really good at. I am not 
in any way denigrating the importance of building data structures. 
I certainly did not mean to imply that your appreciation of that 
important task was in any way less important.


The role of an editor is not just to collect proposals, but to make 
sure they fit with the overall goal of the project. This sometimes 
means rejecting proposals, or reshaping them. This is not a role 
that I am sorry about - someone has to do it to make our tool better. 
We are not Perl, this is not anything goes, we actually try to make 
the best decisions here. I hate to seem the bad guy sometimes, but 
someone has to do it :(


PARSE is a portion of REBOL that is dedicated to a particular role. 
It recognizes patterns in data, extracts some of the data, and then 
calls out to the DO dialect to do something with the data. It doesn't 
really do anything to the data itself - everything happens in the 
DO dialect code in the parens. It is fairly simple really, and from 
carefully designed simplicity it gets a heck of a lot of power and 
speed. That is its strength.
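
A minimal sketch of that division of labour, assuming nothing beyond core PARSE (the input string and the words digit, num and total are made up for illustration): the rule only recognizes and extracts, and the arithmetic happens in ordinary DO dialect code inside the paren.

```rebol
digit: charset "0123456789"
total: 0

; Recognize comma-separated integers. COPY extracts the matched text;
; the paren is plain DO dialect code that does something with it.
ok?: parse "10,20,30" [
    some [
        copy num some digit (total: total + to-integer num)
        opt ","
    ]
]

print [ok? total]
```

PARSE itself never touches the numbers here; it only hands each matched substring to the code in the paren.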


The thing that a lot of people don't remember when making improvements 
to a dialect like PARSE is that PARSE is only one part of REBOL. 
If something doesn't go into PARSE, it can go into another part of 
REBOL. We have to consider the language as a whole when we are doing 
things like this.

Here is the overall rationale for the PARSE dialect proposals:

- All new features need to be simple to explain and use, and fast 
at runtime.
- A good feature would be one of these:

  - An extremely powerful enhancement of PARSE's language recognition.

  - A fix to a design flaw in an existing feature, or a compatibility 
  fix.

  - A serious improvement to a sufficiently common use case, or common 
  error.


The reason I didn't want to put COLLECT and KEEP into PARSE is because 
it is a small part of a much bigger problem that really needs a lot 
of flexibility. Different structure collection and building situations 
require different behavior. It just so happens that the DO dialect 
is much better suited to solving this particular problem than the 
PARSE dialect is. Remember, PARSE is a native dialect, and as such 
is rather fixed.


There are some PARSE proposals that make parse actually do something 
with the data itself: CHANGE, INSERT and REMOVE. We were very careful 
when we designed those proposals. In particular, we wanted to provide 
the bare minimum that would be necessary to handle some very common 
idioms that are usually done wrong, even by the best PARSE programmers. 
Sometimes we add stuff into REBOL that is just there to solve a commonly 
messed up problem, so that a well debugged solution would be there 
for people to choose instead of trying to solve it again themselves, 
badly. (This is why the MOVE function got added to R3 and 2.7.6, 
btw.) Even with that justification those features might not make 
it into PARSE because they change the role of PARSE from recognition 
to modification. I have high hopes, though.


Another proposal that might not make it into PARSE is RETURN. RETURN 
is another ease-of-use addition. In particular, the thing it makes 
easy is stopping the parse in the middle to return some recognized 
information. However, it changes the return characteristics of PARSE 
in ways that may have unpredictable results, and may not have enough 
benefit. The proposal that has a better chance of making it is BREAK/return, 
though I'd like to see both (we can hope, right?).


Most of the REPs from Gabriele's doc have been covered. Most of them 
have been changed because we have had time in the last several years 
to give them some thought; the only unchanged ones are NOT and FAIL, 
so far. Some have been rejected because they just weren't going to 
work at all (8 and 12). THROW and DO are still under discussion - 
the proposals won't work as is, but the ideas behind them have merit. 
The rest have been debated and changed into good proposals. Note 
that the DO proposal would be rejected outright for R2, but R3's 
changes to word binding make it possible to make it safe (as figured 
out during a conversation with Anton this evening).


There are other features that are not really changes to the PARSE 
dialect, and so are out of scope for these proposals. That doesn't 
mean that they won't be implemented, just that they are a separate 
subject. That includes delimiter parsing (sorry, Petr), tracing (sorry, 
Henrik), REBOL language syntax (sorry, Graham), and port parsing 
(sorry, Steeve, Anton, Doc, Tomc, et al). If it makes you feel better, 
while discussing the subject with Anton here I figured out a way 
to do port parsing with the R3 port model (it wouldn't work with 
the R2 port model). I will bring these all up with Carl when it comes 
to that.


I hope that this makes the situation and my position on the subject 
clearer. I'm sorry for any misunderstandings that arose during this 
process.
Note that I am quite familiar with collecting data from hierarchical 
and other structures and putting that data into hierarchical and 
other data structures. I have done this with PARSE, with DO dialect 
code, and with a combination of the two. I have found that PARSE 
is good for recognition, but DO dialect code is best for the construction. 
A mix of both is usually the best strategy. You can use the existing 
COLLECT and KEEP with PARSE quite well. PARSE is not a standalone 
dialect - it is meant to be integrated with other dialects, particularly 
the DO dialect that gets executed in the parens.
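
A rough sketch of that mix, assuming a build that ships the COLLECT mezzanine (the input string and the words alpha, name and names are illustrative only): PARSE does the recognition, and KEEP, called from a paren, does the construction.

```rebol
alpha: charset [#"a" - #"z"]

; PARSE recognizes each run of letters; KEEP is ordinary DO dialect
; code in the paren that adds the match to the collected block.
names: collect [
    parse "red,green,blue" [
        any [copy name some alpha (keep name) | skip]
    ]
]

probe names
```

Because KEEP is just a function, different collection strategies can be swapped in without changing the PARSE rule at all.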
However, most of my contributions to REBOL.org were lost during one 
of their reorgs years ago and I have been mostly contributing in 
other ways lately. Like helping people out here and writing REBOL's 
mezzanine functions. I barely go to REBOL.org anymore except to search 
the code there for mezzanine usage so that I know what is safe to 
change. Outside of work that goes into REBOL community projects, 
most of my scripts have been either one-offs or under NDA lately. 
Sorry.
Sunanda
8-Nov-2008
[2938]
BrianH -- is it possible to incorporate the TRACE/DEBUG suggestion 
as part of the doc? Parse is so complex/deep/subtle that it needs 
some transparency.
See my earlier message above, or here:
http://www.rebol.org/aga-display-posts.r?post=r3wp210x2855
BrianH
8-Nov-2008
[2939x3]
I have been a member of the REBOL community, of varying activity, 
since 1999. If you have used REBOL in the 21st century you have probably 
used code I wrote. I understand the confusion - I was not very social 
for a while.
Oh, that was your suggestion? I thought it was Henrik. It's on the 
list, and thanks for the link :)
Right now the Parse Proposals doc is for dialect enhancements. I 
am keeping a list of improvements out of that scope that will get 
worked on as well. Don't worry...
Sunanda
8-Nov-2008
[2942]
Thanks -- I'm sure I'm not the first to have had the idea.
BrianH
8-Nov-2008
[2943x4]
You aren't, but it's still a good idea :)
I'm glad that we are finally planning on PARSE improvements. PARSE 
has been the primary REBOL feature I've used for 8 years now.
That was a lot of writing - I must have gotten angry after all.
I forget sometimes that the REBOL community has been around long 
enough that many of the people from the early days aren't here anymore. 
I guess a lot of people don't remember me from the latter part of 
the REBOL 1 days and think I am a newbie. Sorry :(
Graham
8-Nov-2008
[2947]
Yep, I'm using code that you've written Brian :)  Probably from years 
ago !
BrianH
8-Nov-2008
[2948x4]
I am sorry if it seemed like I was taking credit for various PARSE 
proposals. It is not anyone's fault that I have been using PARSE 
long enough that 7 or more years ago I came up with almost every 
one of those proposals, or their original inspirations. That's where 
the conversations that led to the REPs came from. There weren't as 
many REBOL users back before the Official Guide and REBOL for Dummies 
:)


If other people who have had the same ideas since would like to add 
their names to the appropriate proposals I would be more than happy 
to help - consider them to be votes. I would be happier still if 
someone came up with a better way to do THROW or DO, because I am 
at a loss to figure out a way that isn't dumb :(
I am embarrassed that the best, most obvious proposal is one that 
I completely spaced on. Congrats on REVERSE, Carl :)
It never occurred to me or anyone else who was talking about enhancing 
PARSE in days of yore, not even Ladislav :(
Thanks, Graham :)
Graham
8-Nov-2008
[2952]
Pekr, Gabriele, Tomc, yourself and I are probably the longest-serving 
Rebolers here these days
BrianH
8-Nov-2008
[2953]
btw Sunanda, I can't remember who was the first to think of having 
trace support for PARSE, but it wasn't me :)
Graham
8-Nov-2008
[2954]
Is attribution really important?
BrianH
8-Nov-2008
[2955]
Steeve has indicated that it is.
Graham
8-Nov-2008
[2956]
Is the driving force for fixing parse so that it can better parse 
data, or, to build better dialects, or both?
BrianH
8-Nov-2008
[2957x3]
Strangely enough I wasn't talking about code I had posted that you 
might have used. In the early days before REBOL.org, I tended to 
post code to the mailing list. The code that you would have used 
would have been in REBOL itself - I used to give very detailed messages 
to feedback about mezzanine bugs, usually with rewrites. Many of 
those rewrites made it into REBOL, especially in the 2.5 version. 
Some natives too (to-local-file and to-rebol-file were based on REBOL 
code I wrote and posted).
Right now the driving force is building better dialects - Carl needs 
it for the new GUI dialects. The data parsing improvements have just 
accumulated over the years and now seems like as good a time as any, 
especially because of the R3 compatibility break.
Some are needed because of Unicode. You can't effectively complement 
a charset anymore so NOT is needed.
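
To illustrate the charset point with a sketch (the sample input and word names are made up, and the commented-out rule is only one reading of how the NOT proposal would look, not confirmed syntax): the R2 habit is to build the inverse set up front, which NOT would replace with a check made at match time.

```rebol
digit: charset "0123456789"

; R2 idiom: build the complemented set once, then match against it.
non-digit: complement digit
ok?: parse "abc" [some non-digit]

print ok?

; Assumed shape of the same rule under the NOT proposal
; (NOT succeeds without advancing when DIGIT does not match):
;   parse "abc" [some [not digit skip]]
```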
Graham
8-Nov-2008
[2960x2]
ahh... so we can blame you for to-rebol-file problems??  :)
http://members.core.com/~bhawley/rebol/to-rebol-file.r
BrianH
8-Nov-2008
[2962x2]
I can't shut down that website - my account was canceled more than 
6 years ago and I can't access it. If you look elsewhere on the site 
you will find the only site on the internet with the Oberon Compiler 
for DOS - the developer disappeared without a trace.
I've wanted to change the licensing on that script to BSD for years. 
It can do more than the native version - they simplified it.
Graham
8-Nov-2008
[2964]
Your CV is a bit out of date!
BrianH
8-Nov-2008
[2965x3]
Can't edit the site :(
Wow, that code is so primitive.
Except require.r - that is advanced even today.
Anton
8-Nov-2008
[2968]
I should just say that I really appreciate the enormous amount of 
energy that BrianH has put into this project (and generally). I can 
see there's a lot of work to manage all the proposals.
Steeve
8-Nov-2008
[2969]
2 or 3 points.
I know that Brian is a great contributor, but I think he sometimes 
tends to reject the ideas of others a bit too easily. 
The reason I say this is that I am not always convinced by his 
arguments, but he acts as if the issue were resolved in advance. 
(I may have a problem with that) 

About who is credited with what, I think that this is not important 
either. However, it was a bit of a surprise to see Brian's name on 
most of the ideas since, as I said previously, these improvements 
have been suggested by different people for many years. 

Obviously this is not an important point, but Brian, putting your 
name everywhere on the pretext that you collect ideas is a little 
... how to say that? Pretentious. 

Personally, I am a heavy user of parsing. 
I think this is the most important function in Rebol. 
You can do practically everything with it: 
design dialects, interfaces, and many other things. 

Parse can build programs by clearly showing the data structures you 
are dealing with. 
Thus our scripts gain in readability. 


During all these years, I was very frustrated seeing some limitations. 

I thought, oh my God, if only we could do this simply, REBOL would 
be so powerful. 

My view is that parse should be extended (as far as possible) to 
gain in expressiveness.

One thing I don't like about parse is the cumbersome process of 
passing parameters to functions. 

Here is an example. 
Usually we do: [copy parm my-rule (my-func parm)] 

If parse knew how to recognize when to call a function, we could write: 
	[my-func my-rule] 

This would be much more compact and expressive. 

Moreover, we could use the return value of my-func to decide whether 
the parsing should continue or not. 

This change would make most of the proposals that were made 
unnecessary, because we could add many new commands very easily 
(IF NOT ALL RETURN etc ...)
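
For reference, the existing idiom can be run as it stands. This is only a sketch: my-rule, my-func, seen and the input string are placeholders made up for illustration, and the shorter form stays in a comment because it is a proposal, not current PARSE syntax.

```rebol
digit: charset "0123456789"
my-rule: [some digit]                     ; placeholder rule: one or more digits
seen: copy []
my-func: func [text] [append seen text]   ; placeholder action: record the match

; Current idiom: capture with COPY, then pass the capture along in a paren.
parse "2008" [copy parm my-rule (my-func parm)]

probe seen

; Proposed shorthand (hypothetical, not valid in current PARSE):
;   parse "2008" [my-func my-rule]
```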
Graham
8-Nov-2008
[2970x3]
Steeve, that looks like a major change ... whereas I think Carl is 
just asking for enhancements.  Maybe Rebol4 ??
ps: I don't mind being confused with BrianH :)
maybe that way I'll gain access to some of the private channels ! 
lol
Steeve
8-Nov-2008
[2973]
u could, you are a celebrity (for me, after what i said to Brian, 
there's no chance) ;-)
Graham
8-Nov-2008
[2974x2]
All this parse stuff is over my head .. I try to avoid headaches. 
Let the experts work it out I say.
I just want to be able to better parse XML namespaces and all.
Steeve
8-Nov-2008
[2976x2]
i agree, better is my second name, lol
and my english is too poor to deal with experts
Graham
8-Nov-2008
[2978x2]
you could write in french ...
Off topic ... but one of my first chat programs used SOAP to do automatic 
translations
Steeve
8-Nov-2008
[2980x2]
i think i do a little better than automatic translators.
no ? ;-)
Graham
8-Nov-2008
[2982]
But it means that native french speakers could read it as you intended 
...
Steeve
8-Nov-2008
[2983x2]
but the
French Rebolers are not very interested in my digressions