r3wp [groups: 83 posts: 189283]

World: r3wp

[Core] Discuss core issues

Maxim
26-Apr-2011
[1315]
rename capabilities in file handling do not normally allow paths 
to be used (in the OS itself).  otherwise these are called 'move 
file' operations.


e.g. if you try using paths with rename in the DOS shell, you get 
errors.
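Maxim's rename-versus-move distinction can be illustrated outside REBOL; as a sketch in Python, os.rename accepts full paths and therefore quietly becomes a move-file operation when the directories differ (this is the POSIX-style behavior, not a claim about REBOL's rename):

```python
import os
import tempfile

# Create a scratch directory with a file and a subdirectory.
base = tempfile.mkdtemp()
src = os.path.join(base, "a.txt")
sub = os.path.join(base, "sub")
os.mkdir(sub)
with open(src, "w") as f:
    f.write("data")

# Because os.rename takes paths, "renaming" into another
# directory is really a move-file operation.
dst = os.path.join(sub, "b.txt")
os.rename(src, dst)

print(os.path.exists(src))  # False: gone from the old location
print(os.path.exists(dst))  # True: moved, not just renamed
```

A DOS-style rename, by contrast, rejects a destination containing a path, which is the behavior Maxim describes.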
BrianH
26-Apr-2011
[1316x3]
John, refinements that can't be translated to path use can be used 
for other reasons in other dialects. REBOL isn't just DO.
When using refinements in dialect contexts where they will be translated 
from paths, it makes no sense to use them, but that is no reason 
to exclude them from the datatype. (That was the official decision 
for R3 when meijeru asked this question in CureCode here in mid-2009: 
http://issue.cc/r3/743)
Sorry, the actual decision was in a different ticket, but the discussion 
was in #743. Sometimes it can be a problem to make multiple tickets 
for the same problem, as opposed to different parts of the same problem; 
it can get a little confusing. Stuff like this is why we rearrange 
tickets more now.
Geomol
26-Apr-2011
[1319]
It seems to me to be a sought-for explanation of some inconsistency in the language, also judging from the discussion in that ticket.
BrianH
26-Apr-2011
[1320]
Paths <> lists of refinements. It's inconsistent in inconsistent 
situations. Refinements are supposed to be useful for more than function 
specs.
Geomol
26-Apr-2011
[1321]
Sure, but do you believe refinements like /1, /1a and /1.2 are made on purpose, or are they just a side effect of how it's implemented?
BrianH
26-Apr-2011
[1322x2]
Refinements like those would be useful as keywords in dialects, for 
instance.
There are a lot of values that can appear in paths that don't translate 
to refinements. Paths and refinements are only related in function 
call syntax; otherwise they are not related.
Geomol
26-Apr-2011
[1324]
I don't buy it, as I could just use #1, #1a and #1.2.
BrianH
26-Apr-2011
[1325]
There were a lot more tickets related to this, which are unfortunately 
difficult to search for because different people use different terminology 
for this problem, so they don't find the previous tickets. What I'm 
summarizing here is the final decision. I don't remember when that 
decision was made, but I remember the reasoning.
Geomol
26-Apr-2011
[1326]
ok
Maxim
26-Apr-2011
[1327x2]
john, having alternate datatypes for dialects is VERY good.   issues 
have long been ignored because people forget they exist.
Geomol
26-Apr-2011
[1329x2]
If this is the case, then I don't understand why refinements don't just act like strings without spaces.
Then you would be able to produce all kinds of refinements, like 
/:1
Maxim
26-Apr-2011
[1331x3]
it just depends how they're coded internally, I guess.
it's possible their container doesn't allow it.
I also think that refinements are sort of meant to be path parts, 
so it makes sense to make them compatible with paths directly.  though 
I guess one can give examples of path incompatible refinements.
Geomol
26-Apr-2011
[1334x3]
I still think, these refinements are not on purpose. :-)

Check this part in the Core manual: http://www.rebol.com/docs/core23/rebolcore-16.html#section-3.7
Max, there are many, like /(1)
Anyway, it's good to be aware of these things, also for programmers who develop alternatives to REBOL.
BrianH
26-Apr-2011
[1337]
We are still making tickets related to word and refinement inconsistencies 
for R3 (or at least I am, when I find bugs in the syntax while I'm 
trying to reverse engineer the syntax docs). While the numeric refinement 
issue is settled, there are other issues that haven't yet been discovered. 
Most of the syntax problems are related to scanner precedence. All 
of the word and path datatypes can be constructed with characters/contents 
that don't really scan the same way in literal syntax, but it is 
not really considered an error. Datatypes are meant primarily for 
in-memory use - their syntax is secondary, and in many cases the 
literal syntax only covers a subset of the possible values.
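BrianH's point that literal syntax only covers a subset of the possible in-memory values is not unique to REBOL; as a rough analogy in Python, the float NaN is a perfectly valid in-memory value whose printed form does not scan back in (an analogy only, not a claim about REBOL's internals):

```python
import math

# NaN is a perfectly valid in-memory float...
x = float("nan")
print(repr(x))  # nan

# ...but "nan" is not a literal in the expression syntax,
# so the printed form cannot be read back in as an expression.
try:
    eval(repr(x))
    print("round trip worked")
except NameError:
    print("no literal syntax for NaN")
```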
Geomol
26-Apr-2011
[1338]
The original design of REBOL has many, many great ideas. It's just the implementation that isn't good enough in many cases. With these new explanations, the whole thing just gets more complex, which isn't good. My view is that it's better to stick with a simple design and work on getting that implemented.
BrianH
26-Apr-2011
[1339]
The problem is that believing the original design to be simple is 
what got us into this mess. The original design was an overview, 
and the details were unspecified. The overview was simple and still 
is, but now we are looking at, documenting and refining the details.
Geomol
26-Apr-2011
[1340]
For a scanner (also called a lexical analyser), I can recommend studying the UNIX command lex. The code produced might be a bit bigger in size, but it's fast and produces good results.
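The table-driven style that lex generates can be sketched in a few lines; here is a minimal regex-table tokenizer in Python (the token set is invented for illustration and is not REBOL's actual scanner grammar):

```python
import re

# An ordered table of (token-name, pattern) pairs: the lex approach
# drives a generic matching loop from a table like this, rather than
# hand-coding the recognition logic for each token type.
TABLE = [
    ("NUMBER",  r"\d+\.\d+|\d+"),
    ("WORD",    r"[A-Za-z][A-Za-z0-9-]*"),
    ("REFINE",  r"/[A-Za-z0-9.]+"),
    ("SPACE",   r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TABLE))

def tokenize(text):
    """Run the generic loop over the table; drop whitespace tokens."""
    return [(m.lastgroup, m.group())
            for m in SCANNER.finditer(text)
            if m.lastgroup != "SPACE"]

print(tokenize("round 1.2 /local"))
# [('WORD', 'round'), ('NUMBER', '1.2'), ('REFINE', '/local')]
```

The pattern table plays the role of lex's generated state tables: adding a token type means adding a row, not new control flow.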
BrianH
26-Apr-2011
[1341]
I've used lex, and many other compiler generators over the years, 
and yes, it's helped with the REBOL syntax documentation discovery. 
However, it might help you to know that the REBOL lexical analyzers 
and parsers are hand-written, not generated. This is why TRANSCODE 
in R3 and LOAD in R2 are so fast, but it is also why it is so tricky 
to resolve syntactic precedence bugs quickly.
Geomol
26-Apr-2011
[1342x2]
Yes, I'd kinda guessed it was hand-written. Allow me to doubt that it's faster, as actual measurements would be needed to compare. But the code for the scanner is most likely smaller in size.
BrianH
26-Apr-2011
[1344x2]
It used to be generated, but Carl says it's faster. I don't doubt 
him, because I've used dozens of parser generators before and that 
always seems to be the case. Sometimes you can get faster generated 
parsers, but generated lexers are usually slower because they're 
table-driven rather than code-driven. The advantage to generated 
lexers is that they are easier to write for complex lexical rules; 
for simple lexical rules, it is usually worth hand-coding.
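The code-driven alternative BrianH describes dispatches on the leading character by hand instead of consulting generated tables; a toy sketch in Python (again with an invented token set, not REBOL's real scanner):

```python
def scan(text):
    """Hand-written, code-driven lexer: branch on the first
    character of each token directly instead of walking a
    generated state table."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():                     # skip whitespace inline
            i += 1
        elif c == "/":                      # refinement-style token
            j = i + 1
            while j < n and not text[j].isspace():
                j += 1
            tokens.append(("REFINE", text[i:j]))
            i = j
        elif c.isdigit():                   # number token
            j = i
            while j < n and (text[j].isdigit() or text[j] == "."):
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        else:                               # everything else is a word
            j = i
            while j < n and not text[j].isspace():
                j += 1
            tokens.append(("WORD", text[i:j]))
            i = j
    return tokens

print(scan("find 10 /deep"))
```

The dispatch is explicit control flow, which is why such lexers tend to be fast for simple rules but laborious to extend, exactly the tradeoff described above.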
One of the tricks when refining the details is to realize that there 
is a real runtime difference between recommending that people not 
do something, and prohibiting something. Every time we prohibit something 
it has runtime overhead to enforce that prohibition. So every recommendation 
needs documenting and explaining, but every prohibition needs justifying. 
There are situational tradeoffs that recommendations can resolve 
easier than prohibitions. This is why we have to be extra careful 
about this.
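The runtime cost of a prohibition versus a recommendation can be made concrete; a hypothetical sketch in Python, where enforcing a rule means paying for a check on every call while recommending it costs nothing at runtime:

```python
def add_recommended(block, value):
    """Recommendation: the docs say not to insert None,
    but nothing is checked, so there is no per-call cost."""
    block.append(value)

def add_prohibited(block, value):
    """Prohibition: the rule is enforced, which costs a
    test on every single call."""
    if value is None:
        raise ValueError("None is not allowed here")
    block.append(value)

b = []
add_recommended(b, None)      # discouraged, but allowed
try:
    add_prohibited(b, None)   # enforced: raises
except ValueError:
    pass
print(b)                      # [None]
```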
Geomol
26-Apr-2011
[1346]
REBOL has 26 or so datatypes recognized by the scanner. That I would call complex lexical rules. Maybe a generated lexer would resolve many of the problems?
BrianH
26-Apr-2011
[1347x2]
Actually, that's still considered pretty simple. You still might 
need a DFA for some of the rules, but most of them can be recognized 
by hand-written code more efficiently. The problems are not caused 
by not using a generated lexer - even a generated lexer can have 
precedence errors. The real syntax bugs in R3 are there because no one 
has really gone through and figured out what they are, systematically; 
most of them are still undocumented. Recently, in my spare time, 
I've been trying to go through and document the syntax and ticket 
the bugs, so soon the limit will be developer time. (In R2, the bugs 
are there because the syntax is frozen for backwards compatibility.)
As for the syntax-vs-memory data restrictions, it's another tradeoff. 
Regular REBOL syntax is much more limited than the full data model 
of REBOL, even if you include MOLD/all syntax, because the syntax 
was designed more for readability and writeability by humans. If 
we limit the data model to match the syntax, we limit our capabilities 
drastically. Limiting to the syntactic form only makes sense when 
you are serializing the data for storage or transport; in memory, 
it's unnecessary. A better solution is making a more comprehensive 
serialization format that doesn't have to be human readable - Rebin 
- and then using it when we need to serialize more of the in-memory 
data.
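The tradeoff BrianH describes — a human-readable syntax covering part of the data model, plus a binary format covering more of it — has a familiar parallel in Python: the printed form round-trips only simple values, while a binary serializer like pickle handles structures the literal syntax cannot express (an analogy only; Rebin's actual design is not shown here):

```python
import pickle

# A cyclic block: the in-memory data model allows it...
cycle = []
cycle.append(cycle)

# ...but the human-readable form can't express the cycle,
# so the printed syntax is not loadable.
print(repr(cycle))  # [[...]]

# A binary serialization format preserves the full structure.
restored = pickle.loads(pickle.dumps(cycle))
print(restored[0] is restored)  # True: the cycle survived
```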
Geomol
26-Apr-2011
[1349]
I went through the scanner systematically 2 years ago, produced a 
document, which I sent to Carl. It's here:
http://www.fys.ku.dk/~niclasen/rebol/rebol_scanner.html
BrianH
26-Apr-2011
[1350]
Cool, I'll take a look. I've been trying to generate compatible parsers 
in mezzanine PARSE code, which could then be translated to other 
parse models like syntax highlighters for editors when necessary. 
I'm hoping to make a module of rules that can be used by a wide variety 
of syntax analyzers.
Geomol
26-Apr-2011
[1351]
Actually, that's still considered pretty simple.


Can you give examples of other lexers that have to recognize more 
different tokens?
BrianH
26-Apr-2011
[1352]
C++ and Perl.
Maxim
26-Apr-2011
[1353]
if you include schema validation... I'd say XML is a nightmare  :-)
Geomol
26-Apr-2011
[1354]
C++, hmm. Is that because you see each of the reserved keywords as a different token? I see them all as one.
BrianH
26-Apr-2011
[1355]
One of the interesting tradeoff tickets is http://issue.cc/r3/537
- I wrote up the ticket initially and expanded it to include all 
affected characters, but looking at it now I'd have to recommend 
that it be dismissed. If it is accepted it would have the side effect 
that more syntax would be accepted, but all of the newly accepted 
syntax would be hard to read. Accepting that ticket would make R3 
more difficult to read, debug and maintain, so it's a bad tradeoff.
Geomol
26-Apr-2011
[1356]
XML is some of the simplest to parse, and I guess schema too.
BrianH
26-Apr-2011
[1357]
With C++, it's not that bad to lex, but really hard to parse. Perl 
is both.
Maxim
26-Apr-2011
[1358]
The XML schema validation process is an 80-page guide plus an 80-page reference. It isn't quite as easy as the XML it is stored in.
Geomol
26-Apr-2011
[1359]
Ok, I mix lex and parse. I mean lexical analysis.
BrianH
26-Apr-2011
[1360x5]
XML and HTML are relatively easy to lex, and require Unicode support, 
so hand-written lexers are probably best. Schema validation is a 
different issue.
REBOL is trickier to lex than to parse, but still in the middle of 
complexity overall.
Most generators separate lexical analysis and parsing, but I've used 
ones that don't, like ANTLR and Coco/R. There are strengths to both 
approaches.
In answer to your comments link above:
- Syntax errors are triggered before semantic errors: 1.3, 11

- Words that start with + and - are special because of potential 
ambiguity with numbers: 1.1

- Arrows are only allowed in the special-case arrow words, not generally: 
1.2, 1.3, 4

- %: is ambiguous - it could be a file that wouldn't work on any 
OS, or the set-word form of %, so an error splits the difference: 
10.2
- Fixed already: 2.2 for arrows in R3, 7, 13


Some of the rest are related to http://issue.cc/r3/537 and others 
have been reported already. If you want 10.2 to not trigger an error, 
it is more likely to be accepted as a set-word than a file. Thanks 
for these, particularly the lit-word bugs.
Also fixed already: 10.1 for ( ) [ ]