[REBOL] Re: Enhancement - valid [scheme]? words
From: holger:rebol at: 14-Feb-2001 10:36
On Mon, Feb 12, 2001 at 03:48:07PM +1300, Andrew Martin wrote:
> which return true for wellformed schemes. So that:
> http? http://www.pearl.com?
> would return false, while:
> http? http://www.rebol.com
> would return true.
I don't think that would work the way you want. Consider
http? http://www.pearl.com/?
Is that valid or not? Probably yes. Browsers certainly accept it, and some
sites (e.g. www.cnn.com) actually use the "?" at the end for internal links
in some situations.
If you want to be able to highlight URLs in, e.g., emails, then what you really
want is not a wellformedness test, but rather a set of heuristics that help you
determine the likely end of a URL within a text stream. Typically the end of the
URL is marked by whitespace, but if your URL is embedded into ordinary English
language text then you may want to account for the most common sentence delimiters
used in the English language as well, and strip them off the end of any recognized URL.
For instance, if the last character of a URL before whitespace in an email is a "?"
then this is most likely a real question mark, not part of the URL, regardless
of whether a "?" at the end of a URL is valid. The same goes for ",", "." etc.
That kind of determination is heuristic in nature though, and cannot be derived
from URL grammar rules. The exact set of heuristics would depend on the context,
e.g. the language of the surrounding text. This can actually get very complicated.
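To make the simplest form of this heuristic concrete, here is a rough sketch in Python (not REBOL, and not anything that exists in REBOL itself). The name find_urls, the candidate pattern and the set of trailing characters are all assumptions chosen for illustration; a real set of delimiters would depend on the surrounding language, as noted above.

```python
import re

# Assumption: a URL candidate is "http://" or "https://" followed by
# any run of non-whitespace characters.
URL_CANDIDATE = re.compile(r"https?://\S+")

# Assumption: characters treated as English sentence delimiters
# when they appear at the very end of a candidate.
TRAILING_PUNCTUATION = "?.,;:!"


def find_urls(text):
    """Return URL candidates in text with trailing sentence punctuation removed."""
    urls = []
    for match in URL_CANDIDATE.finditer(text):
        # Whitespace marks the end of the candidate; then strip
        # any trailing delimiters off the end.
        urls.append(match.group().rstrip(TRAILING_PUNCTUATION))
    return urls
```

Note that this always strips a trailing "?", so it would get it wrong whenever the "?" really does belong to the URL.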
Look at the two examples
"Have a look at http://www.example.com/? for a great time."
"Have you seen http://www.example.com/? Looks cool."
In the first case the "?" appears to be part of the URL; in the second it does not.
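One possible way to tell these two cases apart, sketched in Python, is to look at what follows the URL. The rule used here (drop a trailing "?" only when the next word starts with a capital letter or the text ends) is purely an assumption for illustration, as is the name guess_urls. It happens to handle the two sentences above, but it would fail on plenty of other text.

```python
import re

# Assumption: a URL candidate is "http://" or "https://" followed by
# any run of non-whitespace characters.
URL_CANDIDATE = re.compile(r"https?://\S+")


def guess_urls(text):
    """Return URL candidates, dropping a trailing "?" that looks like sentence punctuation."""
    results = []
    for match in URL_CANDIDATE.finditer(text):
        url = match.group()
        # Text that follows the candidate, with leading whitespace skipped.
        rest = text[match.end():].lstrip()
        # Assumed rule: a "?" followed by a capitalized word (or by nothing)
        # probably ends a sentence, so it is not part of the URL.
        if url.endswith("?") and (rest == "" or rest[0].isupper()):
            url = url[:-1]
        results.append(url)
    return results
```

With the two example sentences, this keeps the "?" in the first and drops it in the second.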
--
Holger Kruse
[holger--rebol--com]