problems with url...(and another "escaped" problem)

[1/3] from: cyphre:seznam:cz at: 19-Jun-2002 9:48

Hi guys,

Thanks to all for explaining the problem... I thought this would be a well-known problem, but I haven't had time to search for the bug. This list is the best place for getting a solution very quickly. BTW, I hope this bug has already been reported to RT's feedback.

Once again thanks,
Cyphre

PS: Some time ago I posted a "Generic proxy bug" problem here, but it seems no one reacted to that thread; maybe it was overlooked. This should be a similar problem, but in the proxy parser I think... so I repost it here:

I have noticed that I'm getting different results from a web-dictionary CGI query when proxy usage is on (it's a "generic" proxy, some version of Linux's Squid). Here are the results:

with generic proxy usage OFF:

>> read http://www.translator.cz/bin/translator?trn=uk2cz&gif=0&vcb=watch%20out
URL Parse: none none www.translator.cz none bin/ translator?trn=uk2cz&gif=0&vcb=watch out
Net-log: ["Opening" "tcp" "for" "HTTP"]
Net-log: {GET http://www.translator.cz/bin/translator?trn=uk2cz&gif=0&vcb=watch out HTTP/1.0
Accept: */*
Connection: close
User-Agent: REBOL 1.2.5.3.1
Host: www.translator.cz
Proxy-Authorization: Basic Og==
}
Net-log: "HTTP/1.0 200 OK"
Net-log: ["low level read of " 2048 "bytes"]
.......

Notice the escaped char difference. When proxy usage is on, there is no %20 between "watch" and "out". Is it possible that this causes different results from the server, because the CGI string is passed to the proxy without proper encoding? I'm not sure if there was some thread about this... Is it a new bug in the proxy parser? Or is the problem in the proxy settings? Anyone?

[2/3] from: gscottjones:mchsi at: 19-Jun-2002 9:34

Hi, Cyphre,

From: "Cyphre"

<snip>
> PS: Some time ago I posted here "Generic proxy bug"
> problem but it seems no one react on this thread,
> maybe it was overlooked. This should be similar
> problem but in proxy parser I think...so I repost it here:

I think that there are two separate things going on. They are similar in that they both involve the escaped character sequence, but I think the problems occur at different points. %C8 being translated to "�" occurs literally in the REBOL interpreter without further REBOL interaction. Witness (with artificial url for simplification):

>> http://a/%C8
== http://a/�

The translation is "immediate." This suggests a bug at the interpreter level for url! (and the same for file!, but there the error is even less likely to surface because of file naming conventions). Other escape characters escape this immediate translation. Witness (with artificial url):

>> http://a/%20
== http://a/%20

The escaped space character remains escaped and is not translated.

So then why the difference between using a proxy and not using one? The answer lies within the http scheme. At one point the "target" is recreated out of components:

    target: next mold to-file join (join "/" either found? port/path [port/path] [""]) either found? port/target [port/target] [""]

Snooping around will show that port/target has already substituted a space for the escape sequence; however, the to-file action retranslates it back to an escape! Later, the scheme wishes to check for the need for using a proxy:

    http-packet: reform [http-command either generic-proxy? [port/url] [target] http-version]

Here, the generic-proxy version utilizes the "spaced" version, whereas the non-proxy version uses the newly created escaped version (that is, a path and target only). A hacked fix would be to make sure that the generic-proxy branch gets a similarly re-escaped version, which could be done several ways, probably most cleverly using Ingo's idea.

What I wonder is whether these 2 apparent bugs have been reported to feedback?

Hope this clarifies the additional mystery.

--Scott Jones
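As a rough illustration of the hacked fix described above, the proxy branch could be given a re-escaped copy of the url. This is only a sketch under assumptions: it assumes port/url holds the decoded url as a string, the word proxy-url is hypothetical, and the object and values at the top are stand-ins for what the http scheme would already have in place.

```rebol
REBOL [Title: "Sketch: re-escape the url sent to a generic proxy"]

; Stand-ins for values the http scheme would already hold (hypothetical data).
port: make object! [url: "http://a/watch out"]
generic-proxy?: true
http-command: "GET"
http-version: "HTTP/1.0"
target: "/watch%20out"

; Assumption: molding a url! writes a space back as %20,
; in the same way that mold to-file does when target is built.
proxy-url: mold to-url port/url

; Same expression as in the scheme, with proxy-url in place of port/url.
http-packet: reform [
    http-command
    either generic-proxy? [proxy-url] [target]
    http-version
]
print http-packet
```

This only helps for the chars that mold re-escapes; a char such as the one loaded from %C8 would presumably still go out unescaped.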

[3/3] from: rotenca::telvia::it at: 19-Jun-2002 23:54

Hi Scott, Ingo and all,

I have rethought the whole question, and now I have a clearer mind (I hope).

> The translation is "immediate." This suggests a bug at the interpreter
> level for url! (and same for file!, but the error is even less likely to
> surface due to file naming conventions).

This is done by load.

> Other escape characters escape this immediate translation. Witness (with
> artificial url):
>
> >> http://a/%20
> == http://a/%20

1) Url! are like file!.

2) The %20 escape sequence is translated like the others. It does not really exist in the string:

    x: h:/%20       ; == h:/%20
    type? x         ; == url!
    form x          ; == "h:/ "
    last x          ; == #" "
    length? x       ; == 4

   The loaded url does not contain the char sequence "%20" but a real space.

3) Load converts ALL the %xx values it finds in the string.

4) It is probe/mold which shows the form %xx, because it knows that x is a url! (or a file!):

    probe x         ; == h:/%20
    mold x          ; == "h:/%20"
    length? mold x  ; == 6

5) But molding a file or url reconverts to the form %xx only the chars with these decimal codes:

    [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 34 40 41 59 91 93 123 125 127 160 ]

   A URL, indeed, requires for some sections/schemes that all the non-alphanumeric chars (anything outside a-z, A-Z and a few more) are escaped. Instead, when you mold a url with a "�" char, mold does not show %c8 but "�".

6) In the expression:

    length? to-url "h:/%20" ; == 6

   the sequence "%20" is inserted as-is in the string.

7) Escaping the % with %25 and loading the url has the same effect:

    length? h:/%2520 ; == 6

> The answer lies within the http scheme. At one point the "target"
> is recreated out of components:
> target: next mold to-file join (join "/" either found? port/path
> [port/path] [""]) either found? port/target [port/target] [""]
> however, the to-file action retranslates it back to
> an escape!

It is the [ mold to-file ] expression that re-creates the "%xx" string, but this works only for the chars already listed. Other chars like %c8 do not "re-appear".

> Later, the scheme wishes to check for the need for using a proxy:
> http-packet: reform [http-command either generic-proxy? [port/url] [target]

<<quoted lines omitted: 4>>

> similarly re-escaped version, which could be done several ways, probably
> most cleverly using Ingo's idea.

The problem is that we have no mold support for chars like %c8, and target is also wrong for these chars. One solution could be to parse the string and escape the missed chars, or to escape all the chars (which should be safe). Another workaround is to always store the url in a string and call to-url before open/read/write, but this limits the usage of the url! datatype. The final solution should be a url! datatype (and mold) which knows how to handle the escaped chars in the different sections with the different schemes, but I think it is very difficult to write.
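The "escape the missed chars" idea could look roughly like the sketch below. It is only an illustration, not a tested fix: the names safe-chars and escape-url are hypothetical, and the set of chars passed through unchanged is an assumption that a real version would have to adapt per url section and scheme.

```rebol
REBOL [Title: "Sketch: percent-encode a decoded url string"]

; Chars passed through unchanged. This set is an assumption for illustration.
; "%" is left out on purpose, so a literal "%" is written as %25.
safe-chars: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9" "-_.~:/?&=#"]

escape-url: func [
    "Return a copy of a decoded url string with all other chars written as %XX"
    text [any-string!]
    /local out
][
    out: make string! 3 * length? text
    foreach ch text [
        either find safe-chars ch [
            append out ch
        ][
            ; enbase/base with base 16 gives the hex digits of the char
            append out join "%" enbase/base to-string ch 16
        ]
    ]
    out
]

; Workaround from the text: keep the url in a string, convert just before use.
; query: "http://www.translator.cz/bin/translator?trn=uk2cz&gif=0&vcb=watch out"
; read to-url escape-url query
```

Since to-url inserts an existing "%20" sequence as-is (point 6 above), the escaped string should reach read unchanged, whichever branch of the scheme builds the request.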

> Hope this clarifies the additional mystery.

It helped me. Thanks!

---
Ciao
Romano
