
[REBOL] Re: URL handling

From: holger:rebol at: 21-Sep-2001 9:43

On Fri, Sep 21, 2001 at 04:10:57PM +0200, Hallvard Ystad wrote:
> I'm dealing a bit with a URL that causes some trouble. Look at this:
>
> >> print read http://krak.dk/scripts/firmaresultat.asp?pub_id=KVWW&navn=&vej=&HUSNR=&POSTNR_FRA=&BY=&omraade=&tlf=&soegeord=&soeginfo=S%F8g
The problem is that %-escaping in URLs is used for two different purposes. The first is to allow characters that are special to REBOL to be included in URLs, e.g. the ";" character, which introduces comments. The second is to escape characters in the actual URL for protocol transfer, e.g. control characters or international characters, which according to the specs are not allowed in URLs. Unfortunately the two uses collide.

REBOL generally resolves %-escapes when it parses the URL from the input (during 'load), so that special REBOL characters can appear in URLs. As a result, in your example the internal representation no longer contains the escaped form %F8 but the literal character it stands for (byte 0xF8, "ø" in Latin-1). That literal character causes the error later in the HTTP protocol handler, which checks the URL for correctness.

There are several workarounds.

One is to use spec blocks (make port! [host: "..." path: "..." ...]) instead of URLs.

Another is to use 'to-url with strings, as you did. That way REBOL never needs to parse a URL from the input (it only parses a string and then converts the result to a URL), so the %-escaping remains intact.

A third is to "double-escape", i.e. to escape the % character as well, as in http://host/path/...soeginfo=S%25F8g. Here %25 is an escaped % character and is resolved during parsing, and the resulting %F8 is then sent to the server.

--
Holger Kruse
[holger--rebol--com]
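The collision between the two kinds of %-escaping can be modelled in a few lines of Python. This is a sketch under stated assumptions, not REBOL's implementation: `load_url_literal`, `to_url` and `http_accepts` are hypothetical stand-ins for 'load reading a URL literal, 'to-url applied to a string, and the HTTP handler's correctness check; mapping escaped bytes to characters as Latin-1 is also an assumption, and the query string is trimmed to the one parameter that matters.

```python
from urllib.parse import unquote

# Trimmed query: "S%F8g" carries the byte 0xF8 percent-escaped.
BASE = "http://krak.dk/scripts/firmaresultat.asp?pub_id=KVWW&soeginfo="


def load_url_literal(text: str) -> str:
    """Hypothetical model of LOAD reading a URL literal: every %XX escape
    is resolved at parse time. Assumption: bytes map to chars as Latin-1."""
    return unquote(text, encoding="latin-1")


def to_url(text: str) -> str:
    """Hypothetical model of TO-URL on a string: escapes are not parsed,
    so %XX sequences stay in the URL exactly as written."""
    return text


def http_accepts(url: str) -> bool:
    """Hypothetical stand-in for the HTTP handler's check: only printable,
    non-space ASCII may appear in a URL sent to the server."""
    return all(33 <= ord(ch) <= 126 for ch in url)


cases = {
    "literal in source": load_url_literal(BASE + "S%F8g"),  # %F8 -> literal 0xF8 char
    "to-url on string": to_url(BASE + "S%F8g"),             # %F8 kept as-is
    "double-escaped": load_url_literal(BASE + "S%25F8g"),   # %25 -> '%', leaving %F8
}

for name, url in cases.items():
    print(f"{name}: {'accepted' if http_accepts(url) else 'rejected'}")
```

Running it prints "rejected" for the literal URL and "accepted" for the other two. The same single decoding pass is what makes the first use of escaping work: in this model `load_url_literal("http://host/a%3Bb")` yields `"http://host/a;b"`, a URL containing a semicolon that REBOL would otherwise read as the start of a comment.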