[REBOL] Re: URL handling
From: holger::rebol::com at: 21-Sep-2001 14:21
On Fri, Sep 21, 2001 at 09:05:33PM +0200, Hallvard Ystad wrote:
> The specs are about to be changed. I know it's still in some kind of beta state, but
> international characters are about to be allowed in URLs. As an example (I take it from
> your name that you're Danish, Holger),
German, actually.
> Yes, but there's one thing to keep in mind. The following does NOT work:
>
> print read to-url to-string http://krak.dk/scripts/firmaresultat.asp?pub_id=KVWW&navn=&vej=&HUSNR=&POSTNR_FRA=&BY=&omraade=&tlf=&soegeord=&soeginfo=S%F8g
Of course not. It is equivalent to a plain print read http://..., since
the scanner has already processed the URL literal by the time to-string
and to-url run. Those calls only change the type, not the contents of
the URL.
> because rebol identifies the url as an url and interprets it the wrong way
REBOL uses % for escaping special characters in URLs. The URL does
not behave the way you want for the same reason that the string
"abc^/def"
does not contain the characters ^ and / (^/ is the escape for a newline).
In that case you need to escape the ^ by entering "abc^^/def" to get the
expected result.
The same is true for URLs, only it is the % that has to be escaped,
leading to %25F8 instead of %F8.
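
A minimal sketch of the two escaping rules described above. The helper name escape-percent and the sample text are made up for illustration; the sketch only assumes the standard string escapes and replace/all.

```rebol
REBOL [Title: "Escape handling sketch"]

; "^/" inside a string literal is the escape for one newline character.
plain: "abc^/def"
print length? plain        ; 7: a, b, c, newline, d, e, f

; Doubling the caret gives a literal ^ followed by a literal /.
literal: "abc^^/def"
print length? literal      ; 8

; Hypothetical helper: double-escape every % in raw text, so that text
; containing "%F8" becomes text containing "%25F8" before it is scanned.
escape-percent: func [raw [string!]] [
    replace/all copy raw "%" "%25"
]

print escape-percent "soeginfo=S%F8g"   ; soeginfo=S%25F8g
```

The helper works on a copy, so the caller's original string is left untouched.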
> before my to-string is evaluated. So if one receives a url through a referencing word,
> say 'my-word, then one has to get the string with something like
>
> my-string: rejoin [{"} my-word {"}]
>
> before converting it to a URL.
What you are saying is a little confusing... Is 'my-word of type
url! ? In that case you don't have to convert anything. It contains
what you want. Is it of type string! ? In that case you don't need
the quotes, just use to-url on the string.
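
The two cases above can be folded into one small function. This is only a sketch: as-url is a hypothetical name, and the address is a placeholder.

```rebol
REBOL [Title: "Normalize to url! sketch"]

; Hypothetical helper: hand back a url! whether the caller holds a url!
; or a string!. No quotes are added and 'load is never called.
as-url: func [value [url! string!]] [
    either url? value [value] [to-url value]
]

my-word: "http://www.example.com/index.html"   ; placeholder string!
print type? as-url my-word                     ; url
print type? as-url as-url my-word              ; url (already a url!, returned as is)
```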
You only run into problems if you run a URL that does not contain
the required % escaping through 'load, 'do or any other function
that uses the scanner, e.g. to-string when the argument is a block.
You will encounter the same problem if you execute, say,
to-string ["ab^/de"], and really want the ^ and / characters in the string.
The point to remember is that any time you run a sequence of characters
through the scanner, REBOL will handle escape characters. So if you know
that the input contains literal, unescaped characters rather than the
escaping REBOL requires, use only functions that do not invoke the
scanner -- or insert the escaping yourself before calling the scanner.
If you need to convert a URL that is embedded in a larger string and
lacks proper escaping into a url! value, do not use 'load. Just pass the
substring you need to to-url. That way the URL is not scanned and thus
not changed.
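
As a rough illustration of the substring approach, assuming the URL in the surrounding text starts at "http://" and ends at the next space (or at the end of the text). The sample text and the word names are placeholders.

```rebol
REBOL [Title: "Extract URL from text sketch"]

; Placeholder input: a URL with a raw %F8 embedded in a larger string.
text: {See http://www.example.com/find?q=S%F8g for details}

if start: find text "http://" [
    ; The URL ends at the next space, or at the tail if there is none.
    stop: any [find start " " tail start]

    ; copy/part yields a string!; to-url converts it without using 'load.
    site: to-url copy/part start stop
    print type? site    ; url
]
```

Real text may need a stricter end-of-URL rule than "next space", for example to drop trailing punctuation.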
This is not a bug. All scanners that allow escaping behave that way.
Only use a scanner if the input complies with the escaping conventions
used by the scanner.
--
Holger Kruse
[holger--rebol--com]