World: r3wp

[REBOL Syntax] Discussions about REBOL syntax

Maxim
23-Feb-2012
[324x3]
AFAICT  it's part of the datatype... since a space will go back and 
forth when you go to/from URL! and other types like string

(in R2 at least):
>> to-url "gogo://a.com/space here"
== gogo://a.com/space here
>> to-string gogo://a.com/space here
== "gogo://a.com/space here"
or did I get you wrong?
Steeve
23-Feb-2012
[327]
Brian, can you show me what is broken? I'm a bit unsettled by your 
concern.
BrianH
23-Feb-2012
[328x3]
The escape decoding gets done too early. The decoding should not 
be done until after the URI structure has been parsed. If you do 
the escape decoding too early, characters that are escaped so that 
they won't be treated as syntax characters (like /) are treated as 
syntax characters erroneously. This is a bad problem for schemes 
like HTTP or FTP that can use usernames and passwords, because the 
passwords in particular either get corrupted or have inappropriately 
restricted character sets. IDN encoding should be put off until the 
last minute too, once we add support for Unicode to the url handlers 
of HTTP, plus any others that should support that standard.
Given that the URI structure is parsed by DECODE-URL (or the R3 equivalent), 
that means that any unescaping should be done in that function, or 
in the scheme handler itself, not in the native code that runs before 
the mezzanine code is called.
Re-escaping in MOLD is OK though. It's the input that's the problem, 
not the output.
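
To make the ordering problem concrete, here is a minimal sketch in Python rather than REBOL, chosen only so it can be run and checked. It does not reproduce DECODE-URL or the native loader. The helper name `split_url` and the sample FTP URL are assumptions made for illustration. The same URL is split twice: once with the escapes resolved after the structure is known, and once with the whole string unescaped first.

```python
from urllib.parse import unquote, urlsplit


def split_url(url, decode_first):
    """Break a URL into components, unescaping either before or after the split."""
    if decode_first:
        # Problematic order: escapes are resolved before the structure is known.
        url = unquote(url)
    parts = urlsplit(url)
    # Separate "user:password" from "host" using the last "@" in the authority.
    userinfo, _, host = parts.netloc.rpartition("@")
    user, _, password = userinfo.partition(":")
    fields = {"user": user, "password": password, "host": host, "path": parts.path}
    if not decode_first:
        # Safe order: each component is unescaped only after it has been isolated.
        fields = {name: unquote(value) for name, value in fields.items()}
    return fields


# Hypothetical FTP URL whose password is "p@ss/word", escaped as p%40ss%2Fword.
url = "ftp://bob:p%40ss%2Fword@example.com/dir/file.txt"

late = split_url(url, decode_first=False)   # parse, then unescape
early = split_url(url, decode_first=True)   # unescape, then parse

print(late)
print(early)
```

With late decoding the password comes back as `p@ss/word` and the host as `example.com`. With early decoding the unescaped `@` and `/` are read as syntax, so the password is cut down to `p`, the host becomes `ss`, and the rest of the password leaks into the path.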
Maxim
23-Feb-2012
[331]
yep... and I've lost hours trying to get some ftp code to work because 
it had strange urls (with passwds)... which the interpreter would 
break all the time. 

At some point you are mystified by what is the actual URL being sent 
to the server.


Once you see what is going on, you can get it to work, but realizing 
that you didn't actually send the url you expected can take quite 
a long time, and so can properly fixing it once you've got a whole 
app expecting/playing with urls.
BrianH
23-Feb-2012
[332]
I've been hoping to fix that. I can load a hot-patch into R2, and 
include a patch in a host kit build in R3 or replace functions from 
%rebol.r if necessary.
Steeve
23-Feb-2012
[333x5]
OK, I'll try to summarize our concerns.

The url! and email! syntax is more permissive than a valid URI. It's 
not a problem nor a design flaw.

The escape decoding should not be done at all when a value is loaded 
as a url! or email!. Right, but it will not be corrected until Carl 
does it.

DECODE-URL can be rewritten (used by schemes). The parser is too 
strict and can't deal with complex forms.
Lots of inconsistencies with the file! datatype between R2 and R3.
Escaping notation = huge mess
You can use 2 forms for file!:
in R2
- %"*"  quoted string file, with ^ escape notation allowed
- %*  form, with %ff escape notation allowed
in R3
- quoted string file works fine

- in the %* form, the % escape notation works fine but the ^ char 
messes things up in some cases without issuing an error
In the %* form, R3 should recognise the ^ char as a normal char (not 
as an escape notation), as R2 does.
So for the moment, I think it's better to reject the ^ char in the 
R3 syntax
Maxim
23-Feb-2012
[338]
yeah, it's surely some leftover copy/paste code from the string loader, 
left in the file loader by mistake.
BrianH
23-Feb-2012
[339x3]
Worse than being a huge mess, R2 and R3 have different messes. R2 
MOLD fails to encode the % character properly. R3 chokes on the ^ 
character in unquoted mode, and allows both ^ and % escaping in quoted 
mode, and MOLDs the ^ character without encoding it (a problem because 
it chokes on that character). Overall the R2 MOLD problem is worse 
than all of the R3 problems put together because % is a more common 
character in filenames than ^, but both need fixing. I wish it just 
did one escaping method for files, % escaping, or did only % escaping 
for unquoted files and only ^ escaping for quoted files. % escaping 
doesn't support Unicode characters over 255, but no characters like 
that need to be escaped anyways - they can be written directly.
R2 file! syntax may have more problems that I'm not aware of though.
I guess that I just want the escaping behavior Steeve described for 
R2, but with the MOLD of %%25 fix from R3, along with % by itself 
being interpreted as and molding as %"".
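
The round-trip requirement behind the %%25 fix can be sketched as follows. This is Python, not REBOL, and it models only the unquoted %* form with % escaping. The function names and the `NEEDS_ESCAPE` set are assumptions for illustration, not the actual MOLD or LOAD rules of either R2 or R3.

```python
import re

# Assumed set of characters that must be escaped in the unquoted form.
NEEDS_ESCAPE = set(' %:@"')


def escape_name(name, escape_percent=True):
    """Percent-escape a file name; escape_percent=False mimics a MOLD that leaves % alone."""
    out = []
    for ch in name:
        if ch == "%" and not escape_percent:
            out.append(ch)
        elif ch in NEEDS_ESCAPE:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def unescape_name(text):
    """Decode every %XX escape (two hex digits) back to its character."""
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


def mold_file(name):
    """Return a file literal, using %"" for the empty name."""
    return '%""' if name == "" else "%" + escape_name(name)


def load_file(literal):
    """Read back a literal produced by mold_file; a bare % is taken as the empty name."""
    body = literal[1:]  # drop the leading % sigil
    return "" if body == '""' else unescape_name(body)


name = "100%20off.txt"  # the file name really contains the characters %, 2, 0

good = unescape_name(escape_name(name))                       # % written as %25
bad = unescape_name(escape_name(name, escape_percent=False))  # % left as-is

print(good == name, repr(bad))
print(mold_file("%"), repr(load_file("%")), mold_file(""))
```

When % is written as %25 the name survives the round trip. When it is left unescaped, the literal `%20` inside the name is decoded as a space on the way back in, so a different file is addressed.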
Steeve
24-Feb-2012
[342x4]
file-char: complement union charset {%:@} termination-char
file-char/#"/": true	;** #"/" added
file-syntax: [
	#"%" [
		quoted-string
		| any [file-char | escape-uri] ;** fail on ^ char
	] termination
]
;** alternative syntax (R2)
file-syntax: [
	#"%" [
		quoted-string
		| some [file-char | escape-uri | #"^^"]  ;** ^ valid char
	] termination
]
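
The difference between the two rules can be approximated with regular expressions. This is a rough Python sketch, not a port of the PARSE rules: `termination-char`, `escape-uri` and `quoted-string` are not defined in this part of the log, so the stand-ins below (whitespace and brackets as terminators, `%` plus two hex digits as an escape, a quoted string without ^ escapes) are assumptions. The token is matched in isolation, so the end of the string plays the role of `termination`.

```python
import re

# Assumed stand-ins for rules that are not shown in the log.
ESCAPE_URI = r"%[0-9A-Fa-f]{2}"        # escape-uri: % followed by two hex digits
QUOTED = r'"[^"]*"'                    # quoted-string, simplified (no ^ escapes)
FILE_CHAR = r'[^%:@^\s()\[\]{}";]'     # file-char: "/" allowed; %, :, @, ^ and terminators excluded

# First rule: `any` of file-char or escape-uri, so ^ makes the match fail.
STRICT_FILE = re.compile("%(?:" + QUOTED + "|(?:" + FILE_CHAR + "|" + ESCAPE_URI + ")*)")

# R2 alternative: `some` of file-char, escape-uri or a literal ^.
R2_FILE = re.compile("%(?:" + QUOTED + "|(?:" + FILE_CHAR + "|" + ESCAPE_URI + r"|\^)+)")


def accepts(rule, token):
    """True when the whole token is a file literal under the given rule."""
    return rule.fullmatch(token) is not None


for token in ["%dir/file.txt", "%a%20b", '%"my file"', "%a^b", "%", "%a%zz"]:
    print(token, accepts(STRICT_FILE, token), accepts(R2_FILE, token))
```

Under these assumptions the two rules differ in two places. `%a^b` is rejected by the first rule and accepted by the R2 variant. A bare `%` is accepted by the first rule (`any`) but not by the R2 variant (`some`).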
Missing rules...
path! refinement! date! time! 
Anything else ???
pair!
Sources
https://github.com/rebolsource/rebol-syntax
Maxim
24-Feb-2012
[346]
I don't see it recognise the serialized versions of the few datatypes 
which have one...
 #[true] #[false] #[none] #[function! [][]] #[object! []]
Steeve
24-Feb-2012
[347]
yep
Cyphre
24-Feb-2012
[348]
image!
Maxim
24-Feb-2012
[349]
#[list![]]  #[hash![]]
Steeve
24-Feb-2012
[350]
Okkkkk, there is a huge list for the serialized ones ;-)
Maxim
24-Feb-2012
[351]
money!    1.00%
Steeve
24-Feb-2012
[352]
percent!
Cyphre
24-Feb-2012
[353x2]
date! time!...
bitset!...
Maxim
24-Feb-2012
[355]
path! set-path! lit-path!
Steeve
24-Feb-2012
[356]
Well...
Cyphre
24-Feb-2012
[357]
just write: ? datatype! in the console to get a list
Steeve
24-Feb-2012
[358]
I will focus on the annoying ones for now
Maxim
24-Feb-2012
[359]
date! has many variations, it's probably the most complex one left
Steeve
24-Feb-2012
[360]
yep
Maxim
24-Feb-2012
[361]
actually, path! also has a few quirks, like allowing parens and 
the use of a get-word/set-word at the end
Steeve
24-Feb-2012
[362]
but path! needs all the other datatypes to be finished first
Maxim
24-Feb-2012
[363x2]
no, afaik, just paren!, word! and its own additional format quirks. 
As the global block definition expands, so too will parens, and 
thus the path.
Steeve
24-Feb-2012
[365]
So, path! is not complex in that regard (values separated by '/')
Maxim
24-Feb-2012
[366]
yeah, just have to find the values which are valid in a path (not 
all types are valid, at least in R2)
Steeve
24-Feb-2012
[367]
Agreed
Maxim
24-Feb-2012
[368x2]
tag! yes, string! no.   for example:

>> block: [55 "abc" [test] <tag> [test2]]
== [55 "abc" [test] <tag> [test2]]
>> block/"abc"
** Syntax Error: Invalid path -- block/
** Near: (line 1) block/"abc"
>> block/("abc")
== [test]
>> block/<tag>
== [test2]
I bet you didn't know tags were usable directly? Not many think 
about it, but since tags are strings, they make a lot of sense for 
representing XML tree structures... and indeed, I used them when 
I had namespaced tags.
Steeve
24-Feb-2012
[370]
Sorry, I knew ;-)
Maxim
24-Feb-2012
[371x2]
hehe. But it may add another complexity to the < parsing rule. 
Maybe some precedence manipulations in the rules will be required to 
make sure that this/<tag> isn't short-circuited by another simpler rule.
Steeve
24-Feb-2012
[373]
Ok, I will go first with time! because date! needs it