
[REBOL] Re: Enhancement - valid [scheme]? words

From: holger:rebol at: 14-Feb-2001 10:36

On Mon, Feb 12, 2001 at 03:48:07PM +1300, Andrew Martin wrote:
> which return true for wellformed schemes. So that:
>     http? http://www.pearl.com?
> would return false, while:
>     http? http://www.rebol.com
> would return true.

I don't think that would work the way you want. Consider

    http? http://www.pearl.com/?

Is that valid or not? Probably yes. Browsers certainly accept it, and some sites (e.g. www.cnn.com) actually use the "?" at the end for internal links in some situations.

If you want to be able to highlight URLs in, e.g., emails, then what you really want is not a wellformedness test, but rather a set of heuristics that help you determine the likely end of a URL within a text stream. Typically the end of the URL is marked by whitespace, but if your URL is embedded into ordinary English language text then you may want to account for the most common sentence delimiters used in the English language as well, and strip them off the end of any recognized URL.

For instance, if the last character of a URL before whitespace in an email is a "?" then this is most likely a real question mark, not part of the URL, regardless of whether a "?" at the end of a URL is valid. Same thing for ",", "." etc.

That kind of determination is heuristic in nature though, and cannot be derived from URL grammar rules. The exact set of heuristics would depend on the context, e.g. the language of the surrounding text. This can actually get very complicated. Look at the two examples

    "Have a look at http://www.example.com/? for a great time."
    "Have you seen http://www.example.com/? Looks cool."

In the first case the "?" appears to be part of the URL, in the second it does not.

-- 
Holger Kruse [holger--rebol--com]
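
As a rough illustration of the simplest heuristic described above (end the URL at whitespace, then strip trailing sentence delimiters), here is a minimal Python sketch. It is not REBOL and not anything from the original post: the function name `find_urls`, the set of delimiters in `TRAILING_PUNCTUATION`, and the list of recognized schemes are all assumptions chosen for the example.

```python
import re

# Assumption: characters treated as English sentence delimiters when they
# end a URL candidate. A different language or context may need another set.
TRAILING_PUNCTUATION = ".,;:!?"

# Assumption: only these schemes are recognized; a candidate runs from the
# scheme up to the next whitespace character.
URL_CANDIDATE = re.compile(r"(?:https?|ftp)://\S+")


def find_urls(text):
    """Return likely URLs found in text.

    Each candidate ends at whitespace, then any trailing sentence
    delimiters are stripped. This is a heuristic, not a validity check.
    """
    urls = []
    for match in URL_CANDIDATE.finditer(text):
        # Drop every trailing delimiter, e.g. "?", "." or "?." at the end.
        urls.append(match.group(0).rstrip(TRAILING_PUNCTUATION))
    return urls


if __name__ == "__main__":
    print(find_urls("Have you seen http://www.example.com/? Looks cool."))
    print(find_urls("Have a look at http://www.example.com/? for a great time."))
```

Note that this sketch strips the "?" in both example sentences, so it gets the second one right and the first one wrong. Telling the two apart would need more context than the characters around the URL, which is the complication pointed out above.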