[REBOL] Re: Enhancement - valid [scheme]? words
From: jeff:rebol at: 14-Feb-2001 14:06
> For instance if the last character of a URL before
> whitespace in an email is a "?" then this is most likely a
> real question mark, not part of the URL, regardless of
> whether a "?" at the end of a URL is valid. Same thing for
> ",", "." etc.
Hey, go check out this web site: http://www.example.com/? It's
great. Also, this one is neat, too: http://www.example.com/,
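To make that concrete, here is a minimal Python sketch of the
"strip trailing punctuation" heuristic as I read it from the
proposal. The function name extract_urls and the TRAILING
character set are my own assumptions for illustration, not
anything REBOL does today.

```python
import re

# Assumed set of characters that the heuristic treats as sentence
# punctuation when they end a URL (hypothetical choice).
TRAILING = "?,.;:!"

def extract_urls(text):
    """Find http(s) URLs and drop any trailing punctuation from each."""
    urls = []
    for match in re.finditer(r"https?://\S+", text):
        # rstrip removes every trailing character found in TRAILING
        urls.append(match.group(0).rstrip(TRAILING))
    return urls

print(extract_urls("Check out http://www.example.com/? It's great."))
print(extract_urls("This one is neat, too: http://www.example.com/,"))
```

Both calls give back the URL without the "?" or ",", which is
only right if the writer did not mean the punctuation to be
part of the URL.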
> That kind of determination is heuristic in nature though,
> and cannot be derived from URL grammar rules. The exact set
> of heuristics would depend on the context, e.g. the language
> of the surrounding text. This can actually get very
> complicated. Look at the two examples
I don't know that a heuristic approach would ever be really
adequate; you really need a full natural language
grammar to make solid distinctions.
> "Have a look at http://www.example.com/? for a great time."
> "Have you seen http://www.example.com/? Looks cool."
> In the first case the "?" appears to be part of the URL, in
> the second it does not.
You can detect the difference between the two sentences
above because, in the first sentence, a decent natural
language grammar won't allow the second PP as a complete
sentence (but will recognize "have" as a main verb and thus
complete the VP with the PP). In the second sentence, the
grammar will recognize "Have" as an auxiliary for "seen" and
make a match (using a gap and fill scheme, for example)
based on the fact that this is a wh-question equivalent for
its declarative form (You have seen
http://www.example.com.) and therefore it will correctly
determine http://www.example.com is the end of the sentence
and the question mark is the sentence terminator.
Which is to say, as you said, that it can get quite
complicated, but it is also to say that heuristics may
not be sufficient for a lot of cases. :-)
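
For comparison, here is a rough Python sketch of the kind of
context heuristic one might try on the two example sentences.
It is not a grammar, just a guess based on the word after the
"?". The function name classify_trailing_question_mark and the
capitalization rule are assumptions made up for this example.

```python
import re

def classify_trailing_question_mark(text):
    """Return (url, role) for the first URL followed by '?', or None."""
    # Match a URL whose final '?' is followed by whitespace or end of text.
    match = re.search(r"(https?://\S+)\?(?=\s|$)", text)
    if match is None:
        return None
    rest = text[match.end():].lstrip()
    # Assumption: end of text or a capitalized next word means a new
    # sentence starts, so the '?' is read as the sentence terminator.
    if not rest or rest[0].isupper():
        return match.group(1), "terminator"
    # Otherwise the sentence continues, so keep the '?' in the URL.
    return match.group(1) + "?", "part of URL"

print(classify_trailing_question_mark(
    "Have a look at http://www.example.com/? for a great time."))
print(classify_trailing_question_mark(
    "Have you seen http://www.example.com/? Looks cool."))
```

It gets these two sentences right, but any sentence that
continues with a proper noun or "I" after the URL will fool
it, which is the sort of case where only a grammar helps.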