r3wp [groups: 83 posts: 189283]

World: r3wp

[Core] Discuss core issues

Maxim
26-Apr-2011
[1315]
rename capabilities in file handling do not normally allow paths 
to be used (in the OS itself).  otherwise these are called 'move 
file' operations.


e.g. if you try using paths with rename in the DOS shell, you get 
errors.
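Maxim's rename-versus-move distinction can be illustrated outside REBOL; as a sketch in Python, os.rename accepts full paths and therefore quietly becomes a move-file operation when the directories differ (this is the POSIX-style behavior, not a claim about REBOL's rename):

```python
import os
import tempfile

# Create a scratch directory with a file and a subdirectory.
base = tempfile.mkdtemp()
src = os.path.join(base, "a.txt")
sub = os.path.join(base, "sub")
os.mkdir(sub)
with open(src, "w") as f:
    f.write("data")

# Because os.rename takes paths, "renaming" into another
# directory is really a move-file operation.
dst = os.path.join(sub, "b.txt")
os.rename(src, dst)

print(os.path.exists(src))  # False: gone from the old location
print(os.path.exists(dst))  # True: moved, not just renamed
```

A DOS-style rename, by contrast, rejects a destination containing a path, which is the behavior Maxim describes.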
BrianH
26-Apr-2011
[1316x3]
John, refinements that can't be translated to path use can be used 
for other reasons in other dialects. REBOL isn't just DO.
When using refinements in dialect contexts where they will be translated 
from paths, it makes no sense to use them, but that is no reason 
to exclude them from the datatype. (That was the official decision 
for R3 when meijeru asked this question in CureCode here in mid-2009: 
http://issue.cc/r3/743)
Sorry, the actual decision was in a different ticket, but the discussion 
was in #743. Sometimes it can be a problem to make multiple tickets 
for the same problem, as opposed to different parts of the same problem; 
it can get a little confusing. Stuff like this is why we rearrange 
tickets more now.
Geomol
26-Apr-2011
[1319]
It seems to me to be a sought-for explanation of some inconsistency in the language, also judging from the discussion in that ticket.
BrianH
26-Apr-2011
[1320]
Paths <> lists of refinements. It's inconsistent in inconsistent 
situations. Refinements are supposed to be useful for more than function 
specs.
Geomol
26-Apr-2011
[1321]
Sure, but do you believe refinements like /1, /1a and /1.2 are made on purpose, or are they just a side effect of how it's implemented?
BrianH
26-Apr-2011
[1322x2]
Refinements like those would be useful as keywords in dialects, for 
instance.
There are a lot of values that can appear in paths that don't translate 
to refinements. Paths and refinements are only related in function 
call syntax; otherwise they are not related.
Geomol
26-Apr-2011
[1324]
I don't buy it, as I could just use #1, #1a and #1.2.
BrianH
26-Apr-2011
[1325]
There were a lot more tickets related to this, which are unfortunately 
difficult to search for because different people use different terminology 
for this problem, so they don't find the previous tickets. What I'm 
summarizing here is the final decision. I don't remember when that 
decision was made, but I remember the reasoning.
Geomol
26-Apr-2011
[1326]
ok
Maxim
26-Apr-2011
[1327x2]
john, having alternate datatypes for dialects is VERY good.   issues 
have long been ignored because people forget they exist.
Geomol
26-Apr-2011
[1329x2]
If this is the case, then I don't understand why refinements don't just act like strings without spaces.
Then you would be able to produce all kinds of refinements, like 
/:1
Maxim
26-Apr-2011
[1331x3]
it just depends how they're coded internally, I guess.
it's possible their container doesn't allow it.
I also think that refinements are sort of meant to be path parts, 
so it makes sense to make them compatible with paths directly.  though 
I guess one can give examples of path incompatible refinements.
Geomol
26-Apr-2011
[1334x3]
I still think, these refinements are not on purpose. :-)

Check this part in the Core manual: http://www.rebol.com/docs/core23/rebolcore-16.html#section-3.7
Max, there are many, like /(1)
Anyway, it's good to be aware of these things, also for programmers who develop alternatives to REBOL.
BrianH
26-Apr-2011
[1337]
We are still making tickets related to word and refinement inconsistencies 
for R3 (or at least I am, when I find bugs in the syntax while I'm 
trying to reverse engineer the syntax docs). While the numeric refinement 
issue is settled, there are other issues that haven't yet been discovered. 
Most of the syntax problems are related to scanner precedence. All 
of the word and path datatypes can be constructed with characters/contents 
that don't really scan the same way in literal syntax, but it is 
not really considered an error. Datatypes are meant primarily for 
in-memory use - their syntax is secondary, and in many cases the 
literal syntax only covers a subset of the possible values.
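BrianH's point that literal syntax only covers a subset of the possible in-memory values is not unique to REBOL; as a rough analogy in Python, the float NaN is a perfectly valid in-memory value whose printed form does not scan back in (an analogy only, not a claim about REBOL's internals):

```python
import math

# NaN is a perfectly valid in-memory float...
x = float("nan")
print(repr(x))  # nan

# ...but "nan" is not a literal in the expression syntax,
# so the printed form cannot be read back in as an expression.
try:
    eval(repr(x))
    print("round trip worked")
except NameError:
    print("no literal syntax for NaN")
```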
Geomol
26-Apr-2011
[1338]
The original design of REBOL has many, many great ideas. It's just the implementation that isn't good enough in many cases. With these new explanations, the whole thing just gets more complex, which isn't good. My view is that it's better to stick with a simple design and work on getting that implemented.
BrianH
26-Apr-2011
[1339]
The problem is that believing the original design to be simple is 
what got us into this mess. The original design was an overview, 
and the details were unspecified. The overview was simple and still 
is, but now we are looking at, documenting and refining the details.
Geomol
26-Apr-2011
[1340]
For a scanner (also called a lexical analyser), I can recommend studying the UNIX command lex. The code produced might be a bit bigger in size, but it's fast and produces good results.
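The table-driven style that lex generates can be sketched in a few lines; here is a minimal regex-table tokenizer in Python (the token set is invented for illustration and is not REBOL's actual scanner grammar):

```python
import re

# An ordered table of (token-name, pattern) pairs: the lex approach
# drives a generic matching loop from a table like this, rather than
# hand-coding the recognition logic for each token type.
TABLE = [
    ("NUMBER",  r"\d+\.\d+|\d+"),
    ("WORD",    r"[A-Za-z][A-Za-z0-9-]*"),
    ("REFINE",  r"/[A-Za-z0-9.]+"),
    ("SPACE",   r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TABLE))

def tokenize(text):
    """Run the generic loop over the table; drop whitespace tokens."""
    return [(m.lastgroup, m.group())
            for m in SCANNER.finditer(text)
            if m.lastgroup != "SPACE"]

print(tokenize("round 1.2 /local"))
# [('WORD', 'round'), ('NUMBER', '1.2'), ('REFINE', '/local')]
```

The pattern table plays the role of lex's generated state tables: adding a token type means adding a row, not new control flow.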
BrianH
26-Apr-2011
[1341]
I've used lex, and many other compiler generators over the years, 
and yes, it's helped with the REBOL syntax documentation discovery. 
However, it might help you to know that the REBOL lexical analyzers 
and parsers are hand-written, not generated. This is why TRANSCODE 
in R3 and LOAD in R2 are so fast, but it is also why it is so tricky 
to resolve syntactic precedence bugs quickly.
Geomol
26-Apr-2011
[1342x2]
Yes, I'd kinda guessed it was hand-written. Allow me to doubt that it's faster, as actual measurements would be needed to compare. But the code for the scanner is most likely smaller in size.
BrianH
26-Apr-2011
[1344x2]
It used to be generated, but Carl says it's faster. I don't doubt 
him, because I've used dozens of parser generators before and that 
always seems to be the case. Sometimes you can get faster generated 
parsers, but generated lexers are usually slower because they're 
table-driven rather than code-driven. The advantage to generated 
lexers is that they are easier to write for complex lexical rules; 
for simple lexical rules, it is usually worth hand-coding.
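The code-driven alternative BrianH describes dispatches on the leading character by hand instead of consulting generated tables; a toy sketch in Python (again with an invented token set, not REBOL's real scanner):

```python
def scan(text):
    """Hand-written, code-driven lexer: branch on the first
    character of each token directly instead of walking a
    generated state table."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():                     # skip whitespace inline
            i += 1
        elif c == "/":                      # refinement-style token
            j = i + 1
            while j < n and not text[j].isspace():
                j += 1
            tokens.append(("REFINE", text[i:j]))
            i = j
        elif c.isdigit():                   # number token
            j = i
            while j < n and (text[j].isdigit() or text[j] == "."):
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        else:                               # everything else is a word
            j = i
            while j < n and not text[j].isspace():
                j += 1
            tokens.append(("WORD", text[i:j]))
            i = j
    return tokens

print(scan("find 10 /deep"))
```

The dispatch is explicit control flow, which is why such lexers tend to be fast for simple rules but laborious to extend, exactly the tradeoff described above.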
One of the tricks when refining the details is to realize that there 
is a real runtime difference between recommending that people not 
do something, and prohibiting something. Every time we prohibit something 
it has runtime overhead to enforce that prohibition. So every recommendation 
needs documenting and explaining, but every prohibition needs justifying. 
There are situational tradeoffs that recommendations can resolve 
easier than prohibitions. This is why we have to be extra careful 
about this.
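The runtime cost of a prohibition versus a recommendation can be made concrete; a hypothetical sketch in Python, where enforcing a rule means paying for a check on every call while recommending it costs nothing at runtime:

```python
def add_recommended(block, value):
    """Recommendation: the docs say not to insert None,
    but nothing is checked, so there is no per-call cost."""
    block.append(value)

def add_prohibited(block, value):
    """Prohibition: the rule is enforced, which costs a
    test on every single call."""
    if value is None:
        raise ValueError("None is not allowed here")
    block.append(value)

b = []
add_recommended(b, None)      # discouraged, but allowed
try:
    add_prohibited(b, None)   # enforced: raises
except ValueError:
    pass
print(b)                      # [None]
```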
Geomol
26-Apr-2011
[1346]
REBOL has 26 or so datatypes recognized by the scanner. That I would call complex lexical rules. Maybe a generated lexer would resolve many of the problems?
BrianH
26-Apr-2011
[1347x2]
Actually, that's still considered pretty simple. You still might 
need a DFA for some of the rules, but most of them can be recognized 
by hand-written code more efficiently. The problems are not caused 
by not using a generated lexer - even a generated lexer can have 
precedence errors. The real syntax bugs in R3 are there because no one 
has really gone through and figured out what they are, systematically; 
most of them are still undocumented. Recently, in my spare time, 
I've been trying to go through and document the syntax and ticket 
the bugs, so soon the limit will be developer time. (In R2, the bugs 
are there because the syntax is frozen for backwards compatibility.)
As for the syntax-vs-memory data restrictions, it's another tradeoff. 
Regular REBOL syntax is much more limited than the full data model 
of REBOL, even if you include MOLD/all syntax, because the syntax 
was designed more for readability and writeability by humans. If 
we limit the data model to match the syntax, we limit our capabilities 
drastically. Limiting to the syntactic form only makes sense when 
you are serializing the data for storage or transport; in memory, 
it's unnecessary. A better solution is making a more comprehensive 
serialization format that doesn't have to be human readable - Rebin 
- and then using it when we need to serialize more of the in-memory 
data.
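The tradeoff BrianH describes — a human-readable syntax covering part of the data model, plus a binary format covering more of it — has a familiar parallel in Python: the printed form round-trips only simple values, while a binary serializer like pickle handles structures the literal syntax cannot express (an analogy only; Rebin's actual design is not shown here):

```python
import pickle

# A cyclic block: the in-memory data model allows it...
cycle = []
cycle.append(cycle)

# ...but the human-readable form can't express the cycle,
# so the printed syntax is not loadable.
print(repr(cycle))  # [[...]]

# A binary serialization format preserves the full structure.
restored = pickle.loads(pickle.dumps(cycle))
print(restored[0] is restored)  # True: the cycle survived
```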
Geomol
26-Apr-2011
[1349]
I went through the scanner systematically 2 years ago, produced a 
document, which I sent to Carl. It's here:
http://www.fys.ku.dk/~niclasen/rebol/rebol_scanner.html
BrianH
26-Apr-2011
[1350]
Cool, I'll take a look. I've been trying to generate compatible parsers 
in mezzanine PARSE code, which could then be translated to other 
parse models like syntax highlighters for editors when necessary. 
I'm hoping to make a module of rules that can be used by a wide variety 
of syntax analyzers.
Geomol
26-Apr-2011
[1351]
Actually, that's still considered pretty simple.


Can you give examples of other lexers that have to recognize more 
different tokens?
BrianH
26-Apr-2011
[1352]
C++ and Perl.
Maxim
26-Apr-2011
[1353]
if you include schema validation... I'd say XML is a nightmare  :-)
Geomol
26-Apr-2011
[1354]
C++, hmm. Is that because you see each of the reserved keywords as a different token? I see them all as one.
BrianH
26-Apr-2011
[1355]
One of the interesting tradeoff tickets is http://issue.cc/r3/537
- I wrote up the ticket initially and expanded it to include all 
affected characters, but looking at it now I'd have to recommend 
that it be dismissed. If it is accepted it would have the side effect 
that more syntax would be accepted, but all of the newly accepted 
syntax would be hard to read. Accepting that ticket would make R3 
more difficult to read, debug and maintain, so it's a bad tradeoff.
Geomol
26-Apr-2011
[1356]
XML is some of the simplest to parse, and I guess schema too.
BrianH
26-Apr-2011
[1357]
With C++, it's not that bad to lex, but really hard to parse. Perl 
is both.
Maxim
26-Apr-2011
[1358]
The XML schema validation process is an 80-page guide plus an 80-page reference. It isn't quite as easy as the XML it is stored in.
Geomol
26-Apr-2011
[1359]
Ok, I mix lex and parse. I mean lexical analysis.
BrianH
26-Apr-2011
[1360x5]
XML and HTML are relatively easy to lex, and require Unicode support, 
so hand-written lexers are probably best. Schema validation is a 
different issue.
REBOL is trickier to lex than to parse, but still in the middle of 
complexity overall.
Most generators separate lexical analysis and parsing, but I've used 
ones that don't, like ANTLR and Coco/R. There are strengths to both 
approaches.
In answer to your comments link above:
- Syntax errors are triggered before semantic errors: 1.3, 11

- Words that start with + and - are special because of potential 
ambiguity with numbers: 1.1

- Arrows are only allowed in the special-case arrow words, not generally: 
1.2, 1.3, 4

- %: is ambiguous - it could be a file that wouldn't work on any 
OS, or the set-word form of %, so an error splits the difference: 
10.2
- Fixed already: 2.2 for arrows in R3, 7, 13


Some of the rest are related to http://issue.cc/r3/537 and others 
have been reported already. If you want 10.2 to not trigger an error, 
it is more likely to be accepted as a set-word than a file. Thanks 
for these, particularly the lit-word bugs.
Also fixed already: 10.1 for ( ) [ ]