World: r3wp
[REBOL Syntax] Discussions about REBOL syntax
Maxim 23-Feb-2012 [324x2] | AFAICT it's part of the datatype, since a space will go back and forth when you go to/from URL! and other types like string (in R2 at least):
>> to-url "gogo://a.com/space here"
== gogo://a.com/space%20here
>> to-string gogo://a.com/space%20here
== "gogo://a.com/space here" |
or did I get you wrong? | |
Steeve 23-Feb-2012 [327] | Brian, Can you show me what is broken ? I'm a bit unsettled by your concern |
BrianH 23-Feb-2012 [328x3] | The escape decoding gets done too early. The decoding should not be done until after the URI structure has been parsed. If you do the escape decoding too early, characters that are escaped so that they won't be treated as syntax characters (like /) are erroneously treated as syntax characters. This is a bad problem for schemes like HTTP or FTP that can use usernames and passwords, because the passwords in particular either get corrupted or have inappropriately restricted character sets. IDN encoding should be put off until the last minute too, once we add support for Unicode to the url handlers of HTTP, plus any others that should support that standard. |
Given that the URI structure is parsed by DECODE-URL (or the R3 equivalent), that means that any unescaping should be done in that function, or in the scheme handler itself, not in the native code that runs before the mezzanine code is called. | |
Re-escaping in MOLD is OK though. It's the input that's the problem, not the output. | |
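To make the ordering problem concrete, here is a minimal R2-style sketch. It is not the real DECODE-URL: `split-login` is a hypothetical helper, the URL and credentials are made up, and it assumes DEHEX and PARSE/ALL behave as they do in R2. It splits the raw text on the syntax characters first and unescapes each field afterwards, then repeats the split on text that was unescaped too early.

```rebol
REBOL [Title: "Parse first, decode later (sketch)"]

; Hypothetical helper: split scheme://user:pass@host/... into raw fields.
; Returns none when the text does not have that shape.
split-login: func [text [string!] /local user pass host] [
    if parse/all text [
        thru "://"
        copy user to ":" skip
        copy pass to "@" skip
        copy host to "/" to end
    ][
        reduce [user pass host]
    ]
]

; Made-up URL whose password contains an escaped @ and an escaped /
raw: "ftp://bob:p%40ss%2Fword@example.com/pub"

; Structure first, unescape afterwards
fields: split-login raw
probe reduce [dehex fields/1 dehex fields/2 fields/3]

; Unescape first: the decoded @ and / are now read as syntax characters
probe split-login dehex raw
```

With the first ordering the password keeps its @ and /. With the second, those decoded characters are taken as delimiters, so the password is cut short and the host comes out wrong.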
Maxim 23-Feb-2012 [331] | yep... and I've lost hours trying to get some FTP code to work because it had strange URLs (with passwords), which the interpreter would break all the time. At some point you are mystified about what URL is actually being sent to the server. Once you see what is going on, you can get it to work, but realizing that you didn't actually send the URL you expected can take quite a long time, and so can fixing it properly once you've got a whole app expecting/playing with URLs. |
BrianH 23-Feb-2012 [332] | I've been hoping to fix that. I can load a hot-patch into R2, and include a patch in a host kit build in R3 or replace functions from %rebol.r if necessary. |
Steeve 23-Feb-2012 [333x5] | OK, I'll try to summarize our concerns. The url! and email! syntax is more permissive than a valid URI. That's neither a problem nor a design flaw. The escape decoding should not be done at all when the text is loaded as part of a url! or email!. Right, but that will not be corrected until Carl does it. DECODE-URL (used by schemes) can be rewritten. The parser is too strict and can't deal with complex forms. |
Lots of inconsistencies with the file! datatype between R2 and R3. Escaping notation = huge mess | |
You can use 2 forms for file!:
in R2
- %"*" quoted string form, with ^ escape notation allowed
- %* form, with %ff escape notation allowed
in R3
- the quoted string form works fine
- in the %* form, the % escape notation works fine, but the ^ char messes things up in some cases without issuing an error | |
In the %* form, R3 should recognise the ^ char as a normal char (not as an escape notation), as R2 does. | |
So for the moment, I think it's better to reject the ^ char in the R3 syntax | |
Maxim 23-Feb-2012 [338] | yeah, it's surely some leftover copy/paste code from the string loader, left in the file loader by mistake. |
BrianH 23-Feb-2012 [339x3] | Worse than being a huge mess, R2 and R3 have different messes. R2 MOLD fails to encode the % character properly. R3 chokes on the ^ character in unquoted mode, and allows both ^ and % escaping in quoted mode, and MOLDs the ^ character without encoding it (a problem because it chokes on that character). Overall the R2 MOLD problem is worse than all of the R3 problems put together because % is a more common character in filenames than ^, but both need fixing. I wish it just did one escaping method for files, % escaping, or did only % escaping for unquoted files and only ^ escaping for quoted files. % escaping doesn't support Unicode characters over 255, but no characters like that need to be escaped anyways - they can be written directly. |
R2 file! syntax may have more problems that I'm not aware of though. | |
I guess that I just want the escaping behavior Steeve described for R2, but with the MOLD of %%25 fix from R3, along with % by itself being interpreted as and molding as %"". | |
Steeve 24-Feb-2012 [342x4] |
file-char: complement union charset {%:@} termination-char
file-char/#"/": true ;** #"/" added
file-syntax: [
    #"%" [
        quoted-string
        | any [file-char | escape-uri] ;** fail on ^ char
    ]
    termination
]
alternative-syntax R2
file-syntax: [
    #"%" [
        quoted-string
        | some [file-char | escape-uri | #"^^"] ;** ^ valid char
    ]
    termination
]
|
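The rules above depend on definitions that are not shown here (termination-char, quoted-string, escape-uri, termination). As a rough, self-contained sketch of the same idea for the unquoted %* form only, the following uses stand-ins for them: `stop-char`, `plain-char`, `r3-file` and `r2-file` are hypothetical names, and the delimiter set is an assumption rather than the one used in the rebol-syntax sources.

```rebol
REBOL [Title: "Unquoted file! rules (sketch)"]

hex-digit: charset "0123456789ABCDEFabcdef"

; Assumed delimiter set; the real termination characters may differ.
stop-char: charset [#" " #"^-" #"^/" #"[" #"]" #"(" #")" #"^"" #";"]

; Anything that is not a delimiter, not % : @ and not ^
plain-char: complement union stop-char charset "%:@^^"

; %xx escape: a percent sign followed by two hex digits
escape-uri: [#"%" 2 hex-digit]

; R3-style rule: ^ is rejected in the unquoted form
r3-file: [#"%" any [plain-char | escape-uri] end]

; R2-style alternative: ^ is accepted as an ordinary character
r2-file: [#"%" some [plain-char | escape-uri | #"^^"] end]

print parse/all "%my%20file.txt" r3-file
print parse/all "%bad^^name" r3-file
print parse/all "%bad^^name" r2-file
```

The only difference between the two rules is the extra `#"^^"` alternative, which is what decides whether a name containing ^ is accepted or rejected.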
Missing rules... path! refinement! date! time! Anything else ??? | |
pair! | |
Sources https://github.com/rebolsource/rebol-syntax | |
Maxim 24-Feb-2012 [346] | I don't see it recognise the serialized versions of the few datatypes which have one... #[true] #[false] #[none] [#function [][] ] [#object [] ] |
Steeve 24-Feb-2012 [347] | yep |
Cyphre 24-Feb-2012 [348] | image! |
Maxim 24-Feb-2012 [349] | #[list![]] #[hash![]] |
Steeve 24-Feb-2012 [350] | Okkkkk, there is a huge list for the serialized ones ;-) |
Maxim 24-Feb-2012 [351] | money! 1.00% |
Steeve 24-Feb-2012 [352] | percent! |
Cyphre 24-Feb-2012 [353x2] | date! time!... |
bitset!... | |
Maxim 24-Feb-2012 [355] | path! set-path! lit-path! |
Steeve 24-Feb-2012 [356] | Well... |
Cyphre 24-Feb-2012 [357] | just write: ? datatype! in the console to get some list |
Steeve 24-Feb-2012 [358] | I will focus on the annoying ones for now |
Maxim 24-Feb-2012 [359] | date has many variations, it's probably the most complex one left |
Steeve 24-Feb-2012 [360] | yep |
Maxim 24-Feb-2012 [361] | actually, path! also has a few quirks, like allowing parens and the use of a get-set-word at the end |
Steeve 24-Feb-2012 [362] | but path! needs all the other datatypes to be finished first |
Maxim 24-Feb-2012 [363] | no, AFAIK, just paren!, word and its own additional format quirks. As the global block definition expands, so too will parens, and thus the path. |
Steeve 24-Feb-2012 [365] | So, path! is not complex in that regard (values separated by '/') |
Maxim 24-Feb-2012 [366] | yeah, just have to find the values which are valid in a path (not all types are valid, at least in R2) |
Steeve 24-Feb-2012 [367] | Agreed |
Maxim 24-Feb-2012 [368x2] | tag! yes, string! no. For example:
>> block: [55 "abc" [test] <tag> [test2]]
== [55 "abc" [test] <tag> [test2]]
>> block/"abc"
** Syntax Error: Invalid path -- block/
** Near: (line 1) block/"abc"
>> block/("abc")
== [test]
>> block/<tag>
== [test2] |
I bet you didn't know tags were usable directly? Not many think about it, but since tags are strings, they make a lot of sense for representing XML tree structures... and indeed, I used them when I had namespaced tags. | |
Steeve 24-Feb-2012 [370] | Sorry, I knew ;-) |
Maxim 24-Feb-2012 [371] | hehe. But it may add another complexity to the < parsing rule. Maybe some precedence manipulations in the rules will be required to make sure that this/<tag> isn't short-circuited by another, simpler rule. |
Steeve 24-Feb-2012 [373] | Ok, I will go first with time! because date! needs it |