r3wp [groups: 83 posts: 189283]
World: r3wp

[Core] Discuss core issues

BrianH
26-Apr-2011
[1339]
The problem is that believing the original design to be simple is 
what got us into this mess. The original design was an overview, 
and the details were unspecified. The overview was simple and still 
is, but now we are looking at, documenting and refining the details.
Geomol
26-Apr-2011
[1340]
For a scanner (also called a lexical analyser), I can recommend studying 
the UNIX command lex. The code produced might be a bit bigger in 
size, but it's fast and produces good results.
BrianH
26-Apr-2011
[1341]
I've used lex, and many other compiler generators over the years, 
and yes, it has helped with discovering and documenting REBOL's syntax. 
However, it might help you to know that the REBOL lexical analyzers 
and parsers are hand-written, not generated. This is why TRANSCODE 
in R3 and LOAD in R2 are so fast, but it is also why it is so tricky 
to resolve syntactic precedence bugs quickly.
Geomol
26-Apr-2011
[1342x2]
Yes, I'd kinda guessed it was hand-written. Allow me to doubt that 
it's faster; actual measurements would be needed to compare.
But the scanner code is most likely smaller in size.
BrianH
26-Apr-2011
[1344x2]
It used to be generated, but Carl says the hand-written version is 
faster. I don't doubt him, because I've used dozens of parser generators 
before, and that always seems to be the case. Sometimes you can get faster generated 
parsers, but generated lexers are usually slower because they're 
table-driven rather than code-driven. The advantage to generated 
lexers is that they are easier to write for complex lexical rules; 
for simple lexical rules, it is usually worth hand-coding.
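To illustrate what "code-driven" means here, a hand-coded scanner for simple rules can be sketched in REBOL's PARSE dialect. This is an illustrative sketch only, not REBOL's actual scanner; scan-token and its token names are made up:

```rebol
REBOL [Title: "Hand-coded scanner sketch"]

; Code-driven lexical rules written directly by hand, as opposed to
; the table-driven automaton a generator such as lex would emit.
digit:  charset "0123456789"
letter: charset [#"a" - #"z" #"A" - #"Z"]

; Classify a single token string (hypothetical helper)
scan-token: func [s [string!]] [
    case [
        parse s [some digit]                  ['integer]
        parse s [letter any [letter | digit]] ['word]
        true                                  ['unknown]
    ]
]

scan-token "123"   ; == integer
scan-token "abc9"  ; == word
```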
One of the tricks when refining the details is to realize that there 
is a real runtime difference between recommending that people not 
do something, and prohibiting something. Every time we prohibit something 
it has runtime overhead to enforce that prohibition. So every recommendation 
needs documenting and explaining, but every prohibition needs justifying. 
There are situational tradeoffs that recommendations can resolve 
more easily than prohibitions. This is why we have to be extra careful 
about this.
Geomol
26-Apr-2011
[1346]
REBOL has 26 or so datatypes recognized by the scanner. That I would 
call complex lexical rules. Maybe a generated lexer would resolve 
many of the problems?
BrianH
26-Apr-2011
[1347x2]
Actually, that's still considered pretty simple. You still might 
need a DFA for some of the rules, but most of them can be recognized 
by hand-written code more efficiently. The problems are not caused 
by not using a generated lexer - even a generated lexer can have 
precedence errors. The real syntax bugs in R3 are there because no one 
has really gone through and figured out what they are, systematically; 
most of them are still undocumented. Recently, in my spare time, 
I've been trying to go through and document the syntax and ticket 
the bugs, so soon the limit will be developer time. (In R2, the bugs 
are there because the syntax is frozen for backwards compatibility.)
As for the syntax-vs-memory data restrictions, it's another tradeoff. 
Regular REBOL syntax is much more limited than the full data model 
of REBOL, even if you include MOLD/all syntax, because the syntax 
was designed more for readability and writeability by humans. If 
we limit the data model to match the syntax, we limit our capabilities 
drastically. Limiting to the syntactic form only makes sense when 
you are serializing the data for storage or transport; in memory, 
it's unnecessary. A better solution is making a more comprehensive 
serialization format that doesn't have to be human readable - Rebin 
- and then using it when we need to serialize more of the in-memory 
data.
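As an illustration of that gap (a sketch; output shown as in R2, and the exact MOLD formatting may differ between versions), a series position survives MOLD/all's serialized syntax but not the regular syntax:

```rebol
s: next "abc"    ; a string! value positioned past the first character

mold s           ; regular syntax: just "bc" -- the position is lost
mold/all s       ; serialized syntax: #[string! "abc" 2] keeps the index

index? load mold/all s    ; round-trips with the position intact
```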
Geomol
26-Apr-2011
[1349]
I went through the scanner systematically 2 years ago, produced a 
document, which I sent to Carl. It's here:
http://www.fys.ku.dk/~niclasen/rebol/rebol_scanner.html
BrianH
26-Apr-2011
[1350]
Cool, I'll take a look. I've been trying to generate compatible parsers 
in mezzanine PARSE code, which could then be translated to other 
parse models like syntax highlighters for editors when necessary. 
I'm hoping to make a module of rules that can be used by a wide variety 
of syntax analyzers.
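A minimal sketch of what such a shared-rules module might look like (the names here are hypothetical, not an actual module):

```rebol
REBOL [Title: "Shared syntax-rules sketch"]

; Token rules defined once, in one place...
rebol-syntax: context [
    digit:        charset "0123456789"
    sign:         charset "+-"
    integer-rule: [opt sign some digit]
]

; ...so one consumer can feed them straight to PARSE as a validator,
parse "-42" rebol-syntax/integer-rule    ; == true

; while another could walk the same rule blocks and translate them
; into, say, an editor's syntax-highlighting patterns.
```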
Geomol
26-Apr-2011
[1351]
Actually, that's still considered pretty simple.


Can you give examples of other lexers that have to recognize more 
different tokens?
BrianH
26-Apr-2011
[1352]
C++ and Perl.
Maxim
26-Apr-2011
[1353]
if you include schema validation... I'd say XML is a nightmare  :-)
Geomol
26-Apr-2011
[1354]
C++, hmm. Is that because you see each of the reserved keywords as 
a different token? I see them all as one.
BrianH
26-Apr-2011
[1355]
One of the interesting tradeoff tickets is http://issue.cc/r3/537
- I wrote up the ticket initially and expanded it to include all 
affected characters, but looking at it now I'd have to recommend 
that it be dismissed. If it is accepted it would have the side effect 
that more syntax would be accepted, but all of the newly accepted 
syntax would be hard to read. Accepting that ticket would make R3 
more difficult to read, debug and maintain, so it's a bad tradeoff.
Geomol
26-Apr-2011
[1356]
XML is some of the simplest to parse, and I guess schema too.
BrianH
26-Apr-2011
[1357]
With C++, it's not that bad to lex, but really hard to parse. Perl 
is both.
Maxim
26-Apr-2011
[1358]
The XML schema validation process is an 80-page guide plus an 80-page 
reference. It isn't quite as easy as the XML it is stored in.
Geomol
26-Apr-2011
[1359]
Ok, I mixed up lex and parse. I mean lexical analysis.
BrianH
26-Apr-2011
[1360x6]
XML and HTML are relatively easy to lex, and require Unicode support, 
so hand-written lexers are probably best. Schema validation is a 
different issue.
REBOL is trickier to lex than to parse, but still in the middle of 
complexity overall.
Most generators separate lexical analysis and parsing, but I've used 
ones that don't, like ANTLR and Coco/R. There are strengths to both 
approaches.
In answer to your comments link above:
- Syntax errors are triggered before semantic errors: 1.3, 11

- Words that start with + and - are special because of potential 
ambiguity with numbers: 1.1

- Arrows are only allowed in the special-case arrow words, not generally: 
1.2, 1.3, 4

- %: is ambiguous - it could be a file that wouldn't work on any 
OS, or the set-word form of %, so an error splits the difference: 
10.2
- Fixed already: 2.2 for arrows in R3, 7, 13


Some of the rest are related to http://issue.cc/r3/537 and others 
have been reported already. If you want 10.2 not to trigger an error, 
it is more likely to be accepted as a set-word than as a file. Thanks 
for these, particularly the lit-word bugs.
Also fixed already: 10.1 for ( ) [ ]
Never mind about the 10.2 stuff: For some reason I forgot that % 
wasn't a modulus operator :(
Geomol
1-May-2011
[1366]
If I have a local variable v in a function, but I want the value 
of a variable v in the context outside the function, I can write:

get bind 'v bound? 'f

where f is the name of the function. Is that the way to do it, 
or is there a better way? Full example:

>> v: 1
== 1
>> f: func [/local v] [v: 2 get bind 'v bound? 'f]      
>> f
== 1
Ladislav
1-May-2011
[1367x2]
Is that the way to do it
 - I guess not, there is a more efficient way
If you know the context you want to use and it is always the same, 
then it is a bit inefficient to call the BIND function every time. 
Not to mention that

    bind 'v 'f

is more efficient than
 
    bind 'v bound? 'f
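So, assuming the context you want is the one 'f itself lives in (the global context in the earlier example), the shorter form behaves the same; a quick sketch:

```rebol
v: 1
f: func [/local v] [v: 2 get bind 'v 'f]
f    ; == 1 -- 'v is rebound to the context holding 'f, not the local
```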
Geomol
1-May-2011
[1369x2]
Thanks!
It's for the parse function I'm working on, and I want to be sure 
I don't get a local var if vars are used in the parse rules.
Maxim
1-May-2011
[1371]
if the parse rule is given as a parameter, vars within the rule will 
not be bound to the function. The binding is static, i.e. it occurs 
only once, when the function is created. The word in the parse rule 
is already bound (or not).
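A quick sketch of that static binding ('check is a made-up name):

```rebol
v: 1
check: func [rule /local v] [
    v: 2
    parse "x" rule    ; the paren in the rule still sees the caller's 'v
]
check [skip (print v)]    ; prints 1, not 2 -- the rule keeps the
                          ; binding it got where it was written
```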
Geomol
1-May-2011
[1372]
Ah yes, thanks.
Geomol
9-May-2011
[1373]
Tonight's moment of REBOL Zen:

>> f: func [/ref x] [print [ref x]]
>> f/ref/ref 1 2
true 2
onetom
9-May-2011
[1374]
ahhha...
>>  f: func [/ref x y] [print [ref x y]]   f/ref/ref 1 2 3 4
true 3 4
PeterWood
9-May-2011
[1375]
Can somebody confirm if the following crashes 2.7.8 on their machine

>> -1 * -2147483648
Sunanda
10-May-2011
[1376]
Crashes under Windows here,
Nice catch!
PeterWood
10-May-2011
[1377x2]
I was also running under Windows.
Crashes under OS X too:

>>  -1 * -2147483648
Floating point exception
BrianH
10-May-2011
[1379x2]
Geomol, that's something I've never seen anyone do in REBOL before. 
The discarded arguments are even evaluated properly and typechecked.
Works in R3 as well.
Ladislav
10-May-2011
[1381]
An old one, Peter. It is in %core-tests.r
PeterWood
10-May-2011
[1382]
Do you know if it is in RAMBO, as I guess Carl doesn't take much interest 
in %core-tests.r?
Ladislav
10-May-2011
[1383]
#4229
PeterWood
10-May-2011
[1384]
Thanks Ladislav.
Geomol
10-May-2011
[1385x2]
Tonight's Moment of REBOL Zen:

Check this Fibonacci function:

fib: func [
	n [integer!]
	(
		if not local [
			a: 0
			b: 1
		]
		prin [a b ""]
		loop n [
			prin [c: a + b ""]
			a: b
			b: c
		]
		print ""
	)
	/local a [integer!] b [integer!] c
][
	do bind third :fib 'a
]

>> fib 10
0 1 1 2 3 5 8 13 21 34 55 89 
== 89
>> fib/local 10 55 89 none
55 89 144 233 377 610 987 1597 2584 4181 6765 10946 
== 10946


If you only want to execute the paren in the function spec, put this 
in the body instead:

	do bind to block! third second load mold :fib 'a
A simpler example of this weird function construction:


>> hello-world: func [(print "Hello, World!")] [do third :hello-world] 
    
>> hello-world
Hello, World!
onetom
10-May-2011
[1387]
A moebius function: its body bends and bites back into its own spec 
:)
Maxim
10-May-2011
[1388]
Can anyone confirm that 'CALL on Windows 7 is unable to launch apps 
without using the /show refinement? ...which is VERY annoying.

It seems that call/shell no longer works.