r3wp [groups: 83 posts: 189283]
World: r3wp

[Core] Discuss core issues

BrianH
26-Apr-2011
[1339]
The problem is that believing the original design to be simple is 
what got us into this mess. The original design was an overview, 
and the details were unspecified. The overview was simple and still 
is, but now we are looking at, documenting and refining the details.
Geomol
26-Apr-2011
[1340]
For a scanner (also called a lexical analyser), I can recommend studying 
the UNIX command lex. The code produced might be a bit bigger in 
size, but it's fast and produces good results.
BrianH
26-Apr-2011
[1341]
I've used lex, and many other compiler generators over the years, 
and yes, it has helped with discovering and documenting REBOL's syntax. 
However, it might help you to know that the REBOL lexical analyzers 
and parsers are hand-written, not generated. This is why TRANSCODE 
in R3 and LOAD in R2 are so fast, but it is also why it is so tricky 
to resolve syntactic precedence bugs quickly.
Geomol
26-Apr-2011
[1342x2]
Yes, I'd kinda guessed it was hand-written. Allow me to doubt that 
it's faster; actual measurements would be needed to compare.
But the scanner code is most likely smaller in size.
BrianH
26-Apr-2011
[1344x2]
It used to be generated, but Carl says the hand-written version is 
faster. I don't doubt him, because I've used dozens of parser generators 
before, and that always seems to be the case. Sometimes you can get faster generated 
parsers, but generated lexers are usually slower because they're 
table-driven rather than code-driven. The advantage to generated 
lexers is that they are easier to write for complex lexical rules; 
for simple lexical rules, it is usually worth hand-coding.
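To illustrate what "code-driven" means here, a hand-coded scanner for simple rules can be sketched in REBOL's PARSE dialect. This is an illustrative sketch only, not REBOL's actual scanner; scan-token and its token names are made up:

```rebol
REBOL [Title: "Hand-coded scanner sketch"]

; Code-driven lexical rules written directly by hand, as opposed to
; the table-driven automaton a generator such as lex would emit.
digit:  charset "0123456789"
letter: charset [#"a" - #"z" #"A" - #"Z"]

; Classify a single token string (hypothetical helper)
scan-token: func [s [string!]] [
    case [
        parse s [some digit]                  ['integer]
        parse s [letter any [letter | digit]] ['word]
        true                                  ['unknown]
    ]
]

scan-token "123"   ; == integer
scan-token "abc9"  ; == word
```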
One of the tricks when refining the details is to realize that there 
is a real runtime difference between recommending that people not 
do something, and prohibiting something. Every time we prohibit something 
it has runtime overhead to enforce that prohibition. So every recommendation 
needs documenting and explaining, but every prohibition needs justifying. 
There are situational tradeoffs that recommendations can resolve 
more easily than prohibitions. This is why we have to be extra careful 
about this.
Geomol
26-Apr-2011
[1346]
REBOL has 26 or so datatypes recognized by the scanner. That I would 
call complex lexical rules. Maybe a generated lexer would resolve 
many of the problems?
BrianH
26-Apr-2011
[1347x2]
Actually, that's still considered pretty simple. You still might 
need a DFA for some of the rules, but most of them can be recognized 
by hand-written code more efficiently. The problems are not caused 
by not using a generated lexer - even a generated lexer can have 
precedence errors. The real syntax bugs in R3 are there because no one 
has really gone through and figured out what they are, systematically; 
most of them are still undocumented. Recently, in my spare time, 
I've been trying to go through and document the syntax and ticket 
the bugs, so soon the limit will be developer time. (In R2, the bugs 
are there because the syntax is frozen for backwards compatibility.)
As for the syntax-vs-memory data restrictions, it's another tradeoff. 
Regular REBOL syntax is much more limited than the full data model 
of REBOL, even if you include MOLD/all syntax, because the syntax 
was designed more for readability and writeability by humans. If 
we limit the data model to match the syntax, we limit our capabilities 
drastically. Limiting to the syntactic form only makes sense when 
you are serializing the data for storage or transport; in memory, 
it's unnecessary. A better solution is making a more comprehensive 
serialization format that doesn't have to be human readable - Rebin 
- and then using it when we need to serialize more of the in-memory 
data.
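As an illustration of that gap (a sketch; output shown as in R2, and the exact MOLD formatting may differ between versions), a series position survives MOLD/all's serialized syntax but not the regular syntax:

```rebol
s: next "abc"    ; a string! value positioned past the first character

mold s           ; regular syntax: just "bc" -- the position is lost
mold/all s       ; serialized syntax: #[string! "abc" 2] keeps the index

index? load mold/all s    ; round-trips with the position intact
```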
Geomol
26-Apr-2011
[1349]
I went through the scanner systematically 2 years ago, produced a 
document, which I sent to Carl. It's here:
http://www.fys.ku.dk/~niclasen/rebol/rebol_scanner.html
BrianH
26-Apr-2011
[1350]
Cool, I'll take a look. I've been trying to generate compatible parsers 
in mezzanine PARSE code, which could then be translated to other 
parse models like syntax highlighters for editors when necessary. 
I'm hoping to make a module of rules that can be used by a wide variety 
of syntax analyzers.
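A minimal sketch of what such a shared-rules module might look like (the names here are hypothetical, not an actual module):

```rebol
REBOL [Title: "Shared syntax-rules sketch"]

; Token rules defined once, in one place...
rebol-syntax: context [
    digit:        charset "0123456789"
    sign:         charset "+-"
    integer-rule: [opt sign some digit]
]

; ...so one consumer can feed them straight to PARSE as a validator,
parse "-42" rebol-syntax/integer-rule    ; == true

; while another could walk the same rule blocks and translate them
; into, say, an editor's syntax-highlighting patterns.
```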
Geomol
26-Apr-2011
[1351]
Actually, that's still considered pretty simple.


Can you give examples of other lexers that have to recognize more 
different tokens?
BrianH
26-Apr-2011
[1352]
C++ and Perl.
Maxim
26-Apr-2011
[1353]
if you include schema validation... I'd say XML is a nightmare  :-)
Geomol
26-Apr-2011
[1354]
C++, hmm. Is that because you see each of the reserved keywords as 
a different token? I see them all as one.
BrianH
26-Apr-2011
[1355]
One of the interesting tradeoff tickets is http://issue.cc/r3/537
- I wrote up the ticket initially and expanded it to include all 
affected characters, but looking at it now I'd have to recommend 
that it be dismissed. If it is accepted it would have the side effect 
that more syntax would be accepted, but all of the newly accepted 
syntax would be hard to read. Accepting that ticket would make R3 
more difficult to read, debug and maintain, so it's a bad tradeoff.
Geomol
26-Apr-2011
[1356]
XML is some of the simplest to parse, and I guess schema too.
BrianH
26-Apr-2011
[1357]
With C++, it's not that bad to lex, but really hard to parse. Perl 
is both.
Maxim
26-Apr-2011
[1358]
The XML schema validation process is an 80-page guide plus an 80-page 
reference. It isn't quite as easy as the XML it is stored in.
Geomol
26-Apr-2011
[1359]
Ok, I mixed up lex and parse. I mean lexical analysis.
BrianH
26-Apr-2011
[1360x6]
XML and HTML are relatively easy to lex, and require Unicode support, 
so hand-written lexers are probably best. Schema validation is a 
different issue.
REBOL is trickier to lex than to parse, but still in the middle of 
complexity overall.
Most generators separate lexical analysis and parsing, but I've used 
ones that don't, like ANTLR and Coco/R. There are strengths to both 
approaches.
In answer to your comments link above:
- Syntax errors are triggered before semantic errors: 1.3, 11

- Words that start with + and - are special because of potential 
ambiguity with numbers: 1.1

- Arrows are only allowed in the special-case arrow words, not generally: 
1.2, 1.3, 4

- %: is ambiguous - it could be a file that wouldn't work on any 
OS, or the set-word form of %, so an error splits the difference: 
10.2
- Fixed already: 2.2 for arrows in R3, 7, 13


Some of the rest are related to http://issue.cc/r3/537 and others 
have been reported already. If you want 10.2 not to trigger an error, 
it is more likely to be accepted as a set-word than as a file. Thanks 
for these, particularly the lit-word bugs.
Also fixed already: 10.1 for ( ) [ ]
Never mind about the 10.2 stuff: For some reason I forgot that % 
wasn't a modulus operator :(
Geomol
1-May-2011
[1366]
If I have a local variable v in a function, but I want the value 
of a variable v in the context outside the function, I can write:

get bind 'v bound? 'f

where f is the name of the function. Is that the way to do it, 
or is there a better way? Full example:

>> v: 1
== 1
>> f: func [/local v] [v: 2 get bind 'v bound? 'f]      
>> f
== 1
Ladislav
1-May-2011
[1367x2]
Is that the way to do it
 - I guess not, there is a more efficient way
If you know the context you want to use and it is always the same, 
then it is a bit inefficient to call the BIND function every time. 
Not to mention that

    bind 'v 'f

is more efficient than
 
    bind 'v bound? 'f
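So, assuming the context you want is the one 'f itself lives in (the global context in the earlier example), the shorter form behaves the same; a quick sketch:

```rebol
v: 1
f: func [/local v] [v: 2 get bind 'v 'f]
f    ; == 1 -- 'v is rebound to the context holding 'f, not the local
```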
Geomol
1-May-2011
[1369x2]
Thanks!
It's for the parse function I'm working on, and I want to be sure 
I don't get a local var if vars are used in the parse rules.
Maxim
1-May-2011
[1371]
if the parse rule is given as a parameter, vars within the rule will 
not be bound to the function. The binding is static, i.e. it occurs 
only once, when the function is created. The word in the parse rule 
is already bound (or not).
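A quick sketch of that static binding ('check is a made-up name):

```rebol
v: 1
check: func [rule /local v] [
    v: 2
    parse "x" rule    ; the paren in the rule still sees the caller's 'v
]
check [skip (print v)]    ; prints 1, not 2 -- the rule keeps the
                          ; binding it got where it was written
```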
Geomol
1-May-2011
[1372]
Ah yes, thanks.
Geomol
9-May-2011
[1373]
Tonight's moment of REBOL Zen:

>> f: func [/ref x] [print [ref x]]
>> f/ref/ref 1 2
true 2
onetom
9-May-2011
[1374]
ahhha...
>>  f: func [/ref x y] [print [ref x y]]   f/ref/ref 1 2 3 4
true 3 4
PeterWood
9-May-2011
[1375]
Can somebody confirm if the following crashes 2.7.8 on their machine

>> -1 * -2147483648
Sunanda
10-May-2011
[1376]
Crashes under Windows here,
Nice catch!
PeterWood
10-May-2011
[1377x2]
I was also running under Windows.
Crashes under OS X too:

>>  -1 * -2147483648
Floating point exception
BrianH
10-May-2011
[1379x2]
Geomol, that's something I've never seen anyone do in REBOL before. 
The discarded arguments are even evaluated properly and typechecked.
Works in R3 as well.
Ladislav
10-May-2011
[1381]
An old one, Peter. It is in %core-tests.r
PeterWood
10-May-2011
[1382]
Do you know if it is in RAMBO, as I guess Carl doesn't take much interest 
in %core-tests.r?
Ladislav
10-May-2011
[1383]
#4229
PeterWood
10-May-2011
[1384]
Thanks Ladislav.
Geomol
10-May-2011
[1385x2]
Tonight's Moment of REBOL Zen:

Check this Fibonacci function:

fib: func [
	n [integer!]
	(
		if not local [
			a: 0
			b: 1
		]
		prin [a b ""]
		loop n [
			prin [c: a + b ""]
			a: b
			b: c
		]
		print ""
	)
	/local a [integer!] b [integer!] c
][
	do bind third :fib 'a
]

>> fib 10
0 1 1 2 3 5 8 13 21 34 55 89 
== 89
>> fib/local 10 55 89 none
55 89 144 233 377 610 987 1597 2584 4181 6765 10946 
== 10946


If you only want to execute the paren in the function spec, put this 
in the body instead:

	do bind to block! third second load mold :fib 'a
A simpler example of this weird function construction:


>> hello-world: func [(print "Hello, World!")] [do third :hello-world] 
    
>> hello-world
Hello, World!
onetom
10-May-2011
[1387]
A moebius function: its body bends and bites back into its own spec 
:)
Maxim
10-May-2011
[1388]
Can anyone confirm that 'CALL on Windows 7 is unable to launch apps 
without using the /show refinement? ...which is VERY annoying.

It seems that call/shell no longer works.