r3wp [groups: 83 posts: 189283]

World: r3wp

[!REBOL2 Releases] Discuss 2.x releases

Graham
2-Sep-2010
[2016]
parse-url should not dehex .. we can fix the rest in the schemes
Maxim
2-Sep-2010
[2017x2]
brian, true.  my error... I'm deep in calculus... my brain is a bit 
mushy ;-)


IIRC the RFC has a BNF-style breakdown, so there should be no surprise 
as to where hexing can and should be interpreted.
I did a fully compliant RFC URL parser which works better than the 
internal one... maybe I should look at it for more details...
BrianH
2-Sep-2010
[2019]
My only concern is that I don't know where the code is that reassembles 
the url! from the results of DECODE-URL. So I don't know how to fix 
any issues in it.
Graham
2-Sep-2010
[2020]
the code is in the sdk
BrianH
2-Sep-2010
[2021]
Which file? What is the function named?
Graham
2-Sep-2010
[2022]
Why do you think it is reassembled?
Maxim
2-Sep-2010
[2023]
IIRC it's all in a context.
Graham
2-Sep-2010
[2024]
All urls are deconstructed into an object
BrianH
2-Sep-2010
[2025x3]
Right. But the problem I am trying to fix is this:
>> http://user%40rebol.com:blah@www.rebol.com/
== http://user@rebol.com:blah@www.rebol.com/
Somewhere in the middle of that process, DECODE-URL is called. What 
is called after that to reassemble the result into a url! value?
The dehex in that process is the one that we need to get rid of.
Graham
2-Sep-2010
[2028]
I suspect we don't have the source to that...
Maxim
2-Sep-2010
[2029]
look in the URL-parser context within the prot-utils.r file.

that is where the url decoding occurs.
Graham
2-Sep-2010
[2030]
There's no 'dehex there
Maxim
2-Sep-2010
[2031]
but I remember having the same issue a while back and traced it to 
the actual datatype always handling the hex values.
Graham
2-Sep-2010
[2032x2]
I don't think it matters
yes, Max ..
Maxim
2-Sep-2010
[2034]
just using the to-url created the same headaches... IIRC
Graham
2-Sep-2010
[2035]
Brian's issue is not a problem
BrianH
2-Sep-2010
[2036]
That is exactly where it matters. That is the whole problem.
Graham
2-Sep-2010
[2037]
We can fix it without worrying about that part
BrianH
2-Sep-2010
[2038]
That part is the only part that needs fixing.
Maxim
2-Sep-2010
[2039x2]
brian .. I agree.. the hexing should stay in the url datatype until 
the actual network scheme needs to handle it.  % characters are 
valid in a url so they should not get "fixed"
%XX  that is.
Graham
2-Sep-2010
[2041]
But you can't fix it because it's the way Rebol evaluates datatypes 
.. only Carl can fix that.
Maxim
2-Sep-2010
[2042]
exactly.
BrianH
2-Sep-2010
[2043]
Since when is that a constraint?
Graham
2-Sep-2010
[2044]
12 years I think now
BrianH
2-Sep-2010
[2045]
Problems that only Carl can fix still need fixing.
Maxim
2-Sep-2010
[2046x3]
so it's a limitation in the URL datatype... akin to aggressive error 
evaluation.
so in REBOL speak, url dehexing should be "relaxed" :-)
in my app, I ended up doing all URL manipulation in strings, and 
then just converting to url at the time of the network call
BrianH
2-Sep-2010
[2049]
Or at least put off until it is appropriate to do.
Maxim
2-Sep-2010
[2050x2]
IMHO the datatype can't know when. only the schemes and url processors 
know "when" is appropriate.
the problem is when we are programmatically managing URIs.  the datatype 
dehexing really gets in the way.
Graham
2-Sep-2010
[2052]
>> http://user%40rebol.com:blah@www.rebol.com/
== http://user@rebol.com:blah@www.rebol.com/
>> a: to-url "http://user%40rebol.com:blah@www.rebol.com"
== http://user%40rebol.com:blah@www.rebol.com
>> a
== http://user%40rebol.com:blah@www.rebol.com
BrianH
2-Sep-2010
[2053x2]
It's really simple: The url! datatype should do no dehexing itself. 
The file! datatype can dehex, but not url!. Dehexing is only safe 
after decoding.
Graham, thanks for narrowing it down.
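
[Editor's illustration] The ordering point above (dehex only after the url has been split into its parts) can be sketched outside REBOL. The following is a minimal Python sketch, not REBOL's actual loader or DECODE-URL; the helper name split_authority is hypothetical, and it assumes a deliberately naive parser that splits the authority on the first "@".

```python
from urllib.parse import unquote

def split_authority(url):
    """Hypothetical helper: split scheme://user:pass@host/path on raw
    delimiters only. No percent-decoding happens in here."""
    scheme, rest = url.split("://", 1)
    authority, _, path = rest.partition("/")
    if "@" in authority:
        # Naive assumption: the first "@" ends the userinfo part.
        userinfo, _, host = authority.partition("@")
    else:
        userinfo, host = "", authority
    user, _, password = userinfo.partition(":")
    return {"scheme": scheme, "user": user, "pass": password,
            "host": host, "path": "/" + path}

raw = "http://user%40rebol.com:blah@www.rebol.com/"

# Safe order: split on the raw delimiters first, then decode each field.
safe = {k: unquote(v) for k, v in split_authority(raw).items()}

# Unsafe order: decode first, then split. The decoded "@" now looks
# like a delimiter, so the user name is cut short and the host is wrong.
early = split_authority(unquote(raw))

print(safe["user"], safe["host"])
print(early["user"], early["host"])
```

Splitting on the last "@" instead would rescue this particular url, but the same ambiguity returns for an encoded ":" or "/" in the user or password, which is why decoding is deferred until after the split.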
Maxim
2-Sep-2010
[2055x2]
just tried a read, and with the second form of graham's test (using 
to-url on a string) the url parser doesn't dehex... so the username 
will be invalid.
but I guess the server is responsible for dehexing in that case.
BrianH
2-Sep-2010
[2057x2]
Um, no. The HTTP standard for basic authentication doesn't hex-encode 
the user or password fields. The browser (or in our case, http scheme) 
does.
Only the path is hex-encoded when passed to the server.
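
[Editor's illustration] A minimal Python sketch of the client-side step described above, assuming HTTP basic authentication: the client percent-decodes the user and password taken from the url, then sends base64 of "user:password" in the Authorization header. The function name basic_auth_header is hypothetical, and this is not the code of REBOL's http scheme.

```python
import base64
from urllib.parse import unquote

def basic_auth_header(user_enc, pass_enc):
    """Hypothetical helper: build a Basic Authorization header value from
    the still percent-encoded user and password fields of a url."""
    # The client, not the server, undoes the percent-encoding.
    user = unquote(user_enc)
    password = unquote(pass_enc)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

header = basic_auth_header("user%40rebol.com", "blah")

# Decoding the token shows what the server receives: no "%40" left in it.
print(base64.b64decode(header.split(" ", 1)[1]))
```

If the fields were passed through without decoding, the server would see the literal text user%40rebol.com as the user name, which is the invalid-username case mentioned above.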
Maxim
2-Sep-2010
[2059x2]
ok so then the dehexing should be added in the url-parser and string 
notation used for passwords containing @.  

just like we use string notation for files containing spaces.
this could be a workaround until Carl stops dehexing in the loading 
phase.
BrianH
2-Sep-2010
[2061]
Still haven't traced that.
Maxim
2-Sep-2010
[2062]
>> c: load "http://user%40rebol.com:blah@www.rebol.com"
== http://user@rebol.com:blah@www.rebol.com
BrianH
2-Sep-2010
[2063]
No, I mean I haven't traced it. LOAD calls other functions, maybe 
even in R2.
Chris
2-Sep-2010
[2064]
The 'solution' is: read [scheme: 'http user: "user@rebol.com" pass: 
.........]

: )
Maxim
2-Sep-2010
[2065]
but that's not a uri  ;-)   the point of the url datatype is that 
we shouldn't need to use "specifications" but uri paths.