[REBOL] Re: URL handling
From: holger:rebol at: 21-Sep-2001 9:43
On Fri, Sep 21, 2001 at 04:10:57PM +0200, Hallvard Ystad wrote:
> I'm dealing a bit with a URL that causes some trouble. Look at this:
> >> print read http://krak.dk/scripts/firmaresultat.asp?pub_id=KVWW&navn=&vej=&HUSNR=&POSTNR_FRA=&BY=&omraade=&tlf=&soegeord=&soeginfo=S%F8g
The problem is that %-escaping in URLs serves two purposes. The first
is to allow special REBOL characters to be included in URLs, e.g. the
";" character, which introduces comments.
The other use of % characters is to escape characters in the actual
URL for protocol transfer, e.g. control characters or international
characters which, according to the specs, are not allowed in URLs.
Unfortunately the two uses collide. REBOL generally resolves %-escaping
when parsing the URL from the input (during 'load), to allow special
REBOL characters in URLs. As a result, in your example the internal
representation no longer contains the escaped version of the character
but the literal character. That causes the error later in the HTTP
protocol handler, which verifies the URL for correctness.
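The collision can be modelled outside REBOL. The following is only a
rough Python sketch, not REBOL's actual implementation: load_url and
is_valid_for_http are hypothetical names, example.com is a placeholder
host, and the Latin-1 decoding and the "printable ASCII only" check are
assumptions chosen to mirror the behaviour described above.

```python
from urllib.parse import unquote

def load_url(text: str) -> str:
    # Hypothetical model of the parsing step: every %XX escape is
    # resolved once, so %F8 becomes the literal character U+00F8.
    return unquote(text, encoding="latin-1")

def is_valid_for_http(url: str) -> bool:
    # Hypothetical model of the handler's check: accept only
    # printable ASCII characters (assumption for illustration).
    return all(0x21 <= ord(ch) <= 0x7E for ch in url)

typed = "http://example.com/search?soeginfo=S%F8g"  # placeholder URL
loaded = load_url(typed)

print(is_valid_for_http(typed))   # escaped form passes the check
print(is_valid_for_http(loaded))  # decoded form no longer passes
```

The escaped text is acceptable for transfer, but once the escape has
been resolved the literal character is what the later check sees.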
There are several different workarounds. One is to use spec blocks
(make port! [host: "..." path: "..." ...]) instead of URLs. Another
workaround is to use 'to-url with strings, as you did. That way REBOL
never needs to parse a URL from the input (it only parses a string and
then converts the result to a URL), so the %-escaping remains intact.
Another workaround is to "double-escape", i.e. to escape the % character
as well, as in http://host/path/...soeginfo=S%25F8g. Here the %25
represents an escaped % character and is resolved during parsing, and
the resulting %F8 is then sent to the server.
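As a rough illustration of double-escaping, here is a small Python
sketch. It is a model under stated assumptions, not REBOL code:
double_escape is a hypothetical helper, example.com is a placeholder
host, and a single unquote pass stands in for the parsing step. The %
character itself is written as %25, its standard percent-encoding.

```python
from urllib.parse import unquote

def double_escape(url_text: str) -> str:
    # Hypothetical helper: escape each % so that one decoding pass
    # restores the original %XX sequences instead of resolving them.
    return url_text.replace("%", "%25")

wire_form = "http://example.com/search?soeginfo=S%F8g"  # placeholder URL
source_form = double_escape(wire_form)

# One decoding pass (standing in for the parsing step) turns %25 back
# into %, leaving the %F8 escape intact for the server.
after_parse = unquote(source_form, encoding="latin-1")

print(source_form)
print(after_parse == wire_form)
```

The decoding is a single pass, so the % produced from %25 is not
combined with the following F8 and decoded a second time.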
--
Holger Kruse
[holger--rebol--com]