[REBOL] Re: problems with url...

From: gscottjones:mchsi at: 18-Jun-2002 14:14


From: "Cyphre"
> I have this problem, how to 'read following url from rebol?
>
> http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8
>
> Anyone ?

Hi, Cyphre,

It is easier to explain how to bypass the problem than to explain where the
real problem lies.
:)

The problem seems to be that the percent sign can be used to escape hex
coded characters.  The dehex-ed character for C8 is "È" (in Latin-1).  When
the url is entered, the interpreter immediately substitutes the character
"È" for "%C8".  However, the url parser will then no longer parse the entire
url, because "È" is not a part of its rules.  Probing the http scheme *after*
a failed read shows that the file portion contains the fragment
"?co=naslepo&kde=A-", indicating to me that it failed at the next character,
which *it* thinks is "È" instead of "%" (followed by "C8", of course).
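
To see the substitution in isolation, something like the following can be
tried at the console.  This is only a sketch, assuming a REBOL/Core 2.x
console; the test string is just the tail of the url in question, and the
word 'decoded is a name chosen for illustration:

    ; decode the escape by hand and inspect the result
    decoded: dehex "A-%C8"
    print length? decoded              ; number of characters after decoding
    print to-integer last decoded      ; character code of the last character

If the last character comes back with code 200 (hex C8), then the three
characters "%C8" have been collapsed into a single character, which is the
same substitution the interpreter appears to make when the url is entered.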

The way to work around the problem is to do something like the following:

    read rejoin [http://slovnik.nettown.cz/?co=naslepo&kde=A- "%C8"]

which then returns the page.
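
The same idea can be wrapped in a small helper.  This is a sketch only;
'read-escaped is a name made up for illustration and is not a part of REBOL:

    ; hypothetical helper: the escaped part is passed as a string rather
    ; than being typed into the url literal itself
    read-escaped: func [base [url!] suffix [string!]] [
        read rejoin [base suffix]
    ]

    page: read-escaped http://slovnik.nettown.cz/?co=naslepo&kde=A- "%C8"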

What I am unsure about is exactly where the problem lies.  Is it that some
urls contain hex-encoded characters and that REBOL translates them
incorrectly?  I do not know for sure.  I am not even sure why my work-around
works!  Unfortunately, I am out of time to explore the problem further right
now.

Hope this helps a bit anyway.
--Scott Jones