Enhancement - valid [scheme]? words
[1/8] from: g::santilli::tiscalinet::it at: 12-Feb-2001 21:20
Hello Andrew!
On 12-Feb-01, you wrote:
AM> Unfortunately, this doesn't work for email addresses. :-(
You can write a VALID-EMAIL? function quite easily I think...
Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/
[2/8] from: al:bri:xtra at: 12-Feb-2001 15:48
What would be really nice for Rebol would be valid [scheme]? words. For
example:
pop?
http?
finger?
whois?
which return true for well-formed schemes. So that:
http? http://www.pearl.com?
would return false, while:
http? http://www.rebol.com
would return true.
All these functions could reside in the scheme objects, one word and its
function per scheme.
Unfortunately, this doesn't work for email addresses. :-(
Andrew Martin
ICQ: 26227169 http://members.nbci.com/AndrewMartin/
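A rough sketch of the proposal above, written in Python rather than REBOL. Everything here is an assumption made for illustration: the names `make_scheme_check`, `SCHEME_CHECKS` and `is_http` are hypothetical, "well-formed" is taken to mean "has the expected scheme and a host, and contains no whitespace", and a dangling "?" is rejected only because the example in the message expects that result.

```python
from urllib.parse import urlsplit


def make_scheme_check(scheme):
    """Build a predicate that tests text against one URL scheme."""
    def check(text):
        # Whitespace never belongs in a URL, so reject it up front.
        if any(ch.isspace() for ch in text):
            return False
        try:
            parts = urlsplit(text)
        except ValueError:
            return False
        if parts.scheme != scheme or not parts.hostname:
            return False
        # Assumption taken from the message's example: a "?" with nothing
        # after it (ignoring any fragment) counts as not well-formed.
        if text.split("#", 1)[0].endswith("?"):
            return False
        return True
    return check


# One predicate per scheme, mirroring "one word and its function per scheme".
SCHEME_CHECKS = {name: make_scheme_check(name)
                 for name in ("http", "pop", "finger", "whois")}
is_http = SCHEME_CHECKS["http"]

print(is_http("http://www.rebol.com"))
print(is_http("http://www.pearl.com?"))
```

With these assumptions the first call accepts the URL and the second rejects it, matching the two examples in the message. The rule for the trailing "?" is the debatable part.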
[3/8] from: ptretter:norcom2000 at: 11-Feb-2001 21:28
I agree. That is a great idea and a good enhancement and very useful.
Paul Tretter
----- Original Message -----
From: "Andrew Martin" <[Al--Bri--xtra--co--nz]>
To: <[rebol-list--rebol--com]>
Cc: <[feedback--rebol--com]>
[4/8] from: holger:rebol at: 14-Feb-2001 10:36
On Mon, Feb 12, 2001 at 03:48:07PM +1300, Andrew Martin wrote:
> which return true for well-formed schemes. So that:
> http? http://www.pearl.com?
> would return false, while:
> http? http://www.rebol.com
> would return true.
I don't think that would work the way you want. Consider
http? http://www.pearl.com/?
Is that valid or not? Probably yes. Browsers certainly accept it, and some
sites (e.g. www.cnn.com) actually use the "?" at the end for internal links
in some situations.
If you want to be able to highlight URLs in, e.g., emails, then what you really
want is not a wellformedness test, but rather a set of heuristics that help you
determine the likely end of a URL within a text stream. Typically the end of the
URL is marked by whitespace, but if your URL is embedded into ordinary English
language text then you may want to account for the most common sentence delimiters
used in the English language as well, and strip them off the end of any recognized URL.
For instance if the last character of a URL before whitespace in an email is a "?"
then this is most likely a real question mark, not part of the URL, regardless
of whether a "?" at the end of a URL is valid. Same thing for ",", "." etc.
That kind of determination is heuristic in nature though, and cannot be derived
from URL grammar rules. The exact set of heuristics would depend on the context,
e.g. the language of the surrounding text. This can actually get very complicated.
Look at the two examples
"Have a look at http://www.example.com/? for a great time."
"Have you seen http://www.example.com/? Looks cool."
In the first case the "?" appears to be part of the URL, in the second it does not.
--
Holger Kruse
[holger--rebol--com]
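The heuristic described in the message above (end the URL at whitespace, then strip common English sentence delimiters) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `URL_RUN`, `TRAILING` and `find_urls` are hypothetical, only http and https are recognized, and the set of delimiters is a guess at "the most common" ones.

```python
import re

# A URL candidate is the scheme followed by everything up to whitespace.
URL_RUN = re.compile(r"https?://\S+")

# Assumed set of English sentence delimiters to strip from the end.
TRAILING = "?.,!;:"


def find_urls(text):
    """Return URL candidates with trailing sentence punctuation removed."""
    return [m.group(0).rstrip(TRAILING) for m in URL_RUN.finditer(text)]


first = "Have a look at http://www.example.com/? for a great time."
second = "Have you seen http://www.example.com/? Looks cool."
print(find_urls(first))
print(find_urls(second))
```

Both sentences yield http://www.example.com/ without the "?". That is the likely reading of the second sentence but arguably wrong for the first, which is the limitation the message points out: the rule knows nothing about the surrounding sentence.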
[5/8] from: jeff:rebol at: 14-Feb-2001 14:06
Howdy, Holger:
> For instance if the last character of a URL before
> whitespace in an email is a "?" then this is most likely a
> real question mark, not part of the URL, regardless of
> whether a "?" at the end of a URL is valid. Same thing for
> ",", "." etc.
Hey, go check out this web site: http://www.example.com/? It's
great. Also, this one is neat, too: http://www.example.com/,
> That kind of determination is heuristic in nature though,
> and cannot be derived from URL grammar rules. The exact set
> of heuristics would depend on the context, e.g. the language
> of the surrounding text. This can actually get very
> complicated. Look at the two examples
I don't know that a heuristic approach would ever be really
adequate; you really need a full natural language
grammar to make solid distinctions.
> "Have a look at http://www.example.com/? for a great time."
>
> "Have you seen http://www.example.com/? Looks cool."
>
> In the first case the "?" appears to be part of the URL, in
> the second it does not.
You can detect the differences in the above two sentences
because looking at the first sentence, a decent natural
language grammar won't allow the second PP as a complete
sentence (but will recognize "have" as a main verb and thus
complete the VP with the PP), whereas with the second
sentence, the grammar will recognize "Have" as an auxiliary
for "seen" and make a match (using a gap and fill scheme,
for example) based on the fact that this is a wh-question
equivalent for its declarative form (You have seen
http://www.example.com.) and therefore it will correctly
determine http://www.example.com is the end of the sentence
and the question mark is the sentence terminator.
Which is to say, as you said, that it can get quite
complicated, but it is also to say that heuristics may
not be sufficient for a lot of cases. :-)
-jeff
[6/8] from: al:bri:xtra at: 15-Feb-2001 16:55
Hi, Jeff and Holger,
Thanks for your comments.
I'd say that a valid URL containing a "?" would have:
... some AlphaDigit "?" some Graphic ...
in the rules. While a URL with a "?" following it would have a space
after it, because that's required by normal sentence structure.
Can anyone see if that can be broken?
Andrew Martin
ICQ: 26227169 http://members.nbci.com/AndrewMartin/? :-)
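The rule proposed in the message above can be approximated with a regular expression. This is a loose Python reading, not the REBOL parse rule itself: `URL_WITH_QUERY` and `extract_urls` are hypothetical names, only http and https are handled, and the part before the "?" is allowed to contain any non-space character rather than strictly alphanumerics.

```python
import re

# The "?" joins the URL only when at least one non-space character follows it;
# a "?" followed by a space is left for the sentence.
URL_WITH_QUERY = re.compile(r"https?://[^\s?]+(?:\?[^\s?]+)?")


def extract_urls(text):
    """Return URLs, keeping a "?" only when a query follows it directly."""
    return URL_WITH_QUERY.findall(text)


print(extract_urls("See http://www.example.com/search?q=rebol now"))
print(extract_urls("Have you seen http://www.example.com/? Looks cool."))
```

The first call keeps the query string, while the second drops the "?" because a space follows it. Other trailing punctuation, such as a comma or full stop directly after the URL, is not handled by this sketch.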
[7/8] from: allenk:powerup:au at: 15-Feb-2001 20:13
----- Original Message -----
From: "Andrew Martin" <[Al--Bri--xtra--co--nz]>
To: <[rebol-list--rebol--com]>
Sent: Thursday, February 15, 2001 1:55 PM
Subject: [REBOL] Re: Enhancement - valid [scheme]? words
> Hi, Jeff and Holger,
> Thanks for your comments.
<<quoted lines omitted: 3>>
> after it, because that's required by normal sentence structure.
> Can anyone see if that can be broken?
Hi Andrew,
www.somewhere.com/?
and
www.somewhere.com/!
With IIS5.0, some sites deliberately use illegal characters to force a 404 error ASP
page, which can then redirect based on the failed URL. How would you
find these URLs compared to normal language structure?
Admittedly not a lot of IIS and ASP programmers know about this trick and its
derivatives, and very few will use the above format, but it is part of a new
hack to get higher hits on search engines for ASP pages and also for
response tracking/voting (by giving each person a link to a resource that
doesn't exist).
Cheers,
Allen K
[8/8] from: al:bri:xtra at: 16-Feb-2001 15:11
Allen K wrote:
> Andrew wrote:
> > Can anyone see if that can be broken?
<<quoted lines omitted: 3>>
> www.somewhere.com/!
> With IIS5.0, some sites deliberately use illegals to force a 404 error ASP
> page, which can then redirect based on the failed URL. How would you
> find these URLs compared to normal language structure?
If these links were sent out in a email message, my MS Outlook Express
client fails to differentiate them. Both are marked as:
www.somewhere.com/
with the "?" and "!" separate from the hyperlink marking.
> Admittedly not a lot of IIS and ASP programmers know about this trick and
> its derivatives and very few will use the above format, but it is part of
> a new hack to get higher hits on search engines for asp pages and also for
> response tracking/voting by giving each person a link to a resource that
> doesn't exist).
So, like most hacks, I think this hack is already broken? Thanks for the
attempt, Allen!
Andrew Martin
ICQ: 26227169 http://members.nbci.com/AndrewMartin/
Notes
- Quoted lines have been omitted from some messages.