Mailing List Archive: 49091 messages

Enhancement - valid [scheme]? words

 [1/8] from: g::santilli::tiscalinet::it at: 12-Feb-2001 21:20


Hello Andrew! On 12-Feb-01, you wrote: AM> Unfortunately, this doesn't work for email addresses. :-( You can write a VALID-EMAIL? function quite easily I think... Regards, Gabriele. -- Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/

 [2/8] from: al:bri:xtra at: 12-Feb-2001 15:48


What would be really nice for Rebol would be valid [scheme]? words. For example: pop? http? finger? whois? which return true for well-formed schemes. So that: http? http://www.pearl.com? would return false, while: http? http://www.rebol.com would return true. All these functions could reside in the scheme objects, one word and its function per scheme. Unfortunately, this doesn't work for email addresses. :-( Andrew Martin ICQ: 26227169 http://members.nbci.com/AndrewMartin/
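A minimal sketch of the proposal, written in Python rather than REBOL so it can be run as-is. The names `is_http`, `SCHEME_CHECKS` and `valid_url` are hypothetical, and the rule that a dangling "?" makes a URL malformed is only an assumption taken from the example in this message, not from any URL grammar.

```python
from urllib.parse import urlsplit


def is_http(text):
    """Return True if text looks like a well-formed http URL.

    Illustrative rules only: the scheme must be "http", a host must be
    present, and a trailing "?" with nothing after it is rejected.
    """
    try:
        parts = urlsplit(text)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    if parts.scheme != "http" or not parts.hostname:
        return False
    # Assumption from the message above: a dangling "?" is not well-formed.
    if text.endswith("?"):
        return False
    return True


# One predicate per scheme, looked up by scheme name.
SCHEME_CHECKS = {"http": is_http}


def valid_url(text):
    """Dispatch to the predicate registered for the URL's scheme, if any."""
    try:
        scheme = urlsplit(text).scheme
    except ValueError:
        return False
    check = SCHEME_CHECKS.get(scheme)
    return bool(check and check(text))
```

Schemes without a registered predicate (pop, finger, whois here) simply report false until a check is added to the table.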

 [3/8] from: ptretter:norcom2000 at: 11-Feb-2001 21:28


I agree. That is a great idea and a good enhancement and very useful. Paul Tretter

 [4/8] from: holger:rebol at: 14-Feb-2001 10:36


On Mon, Feb 12, 2001 at 03:48:07PM +1300, Andrew Martin wrote:
> which return true for wellformed schemes. So that:
> http? http://www.pearl.com?
> would return false, while:
> http? http://www.rebol.com
> would return true.
I don't think that would work the way you want. Consider http? http://www.pearl.com/? Is that valid or not? Probably yes. Browsers certainly accept it, and some sites (e.g. www.cnn.com) actually use the "?" at the end for internal links in some situations.

If you want to be able to highlight URLs in, e.g., emails, then what you really want is not a well-formedness test, but rather a set of heuristics that help you determine the likely end of a URL within a text stream. Typically the end of the URL is marked by whitespace, but if your URL is embedded into ordinary English language text then you may want to account for the most common sentence delimiters used in the English language as well, and strip them off the end of any recognized URL. For instance, if the last character of a URL before whitespace in an email is a "?" then this is most likely a real question mark, not part of the URL, regardless of whether a "?" at the end of a URL is valid. Same thing for ",", "." etc.

That kind of determination is heuristic in nature though, and cannot be derived from URL grammar rules. The exact set of heuristics would depend on the context, e.g. the language of the surrounding text. This can actually get very complicated. Look at the two examples

"Have a look at http://www.example.com/? for a great time."

"Have you seen http://www.example.com/? Looks cool."

In the first case the "?" appears to be part of the URL, in the second it does not.

-- Holger Kruse [holger--rebol--com]
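The whitespace-plus-delimiter heuristic described in this message could be sketched as follows. This is only an illustration in Python: the pattern, the delimiter set and the name `find_urls` are assumptions, and the sketch only looks for http and https URLs.

```python
import re

# A candidate URL runs from the scheme up to the next whitespace.
URL_CANDIDATE = re.compile(r"https?://\S+")

# Assumed set of common English sentence delimiters to strip from the end.
TRAILING_PUNCTUATION = "?!.,;:"


def find_urls(text):
    """Return the URLs found in text, with trailing sentence punctuation removed."""
    urls = []
    for match in URL_CANDIDATE.finditer(text):
        candidate = match.group(0)
        urls.append(candidate.rstrip(TRAILING_PUNCTUATION))
    return urls
```

Applied to the two example sentences, this returns http://www.example.com/ for both, so it gets the second sentence right and the first one wrong, which is exactly the limitation pointed out here.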

 [5/8] from: jeff:rebol at: 14-Feb-2001 14:06


Howdy, Holger:
> For instance if the last character of a URL before
> whitespace in an email is a "?" then this is most likely a
> real question mark, not part of the URL, regardless of
> whether a "?" at the end of a URL is valid. Same thing for
> ",", "." etc.
Hey, go check out this web site: http://www.example.com/? It's great. Also, this one is neat, too: http://www.example.com/,
> That kind of determination is heuristic in nature though,
> and cannot be derived from URL grammar rules. The exact set
> of heuristics would depend on the context, e.g. the language
> of the surrounding text. This can actually get very
> complicated. Look at the two examples
I don't know that a heuristic approach would ever be really adequate, but you really need a full natural language grammar to make solid distinctions.
> "Have a look at http://www.example.com/? for a great time."
>
> "Have you seen http://www.example.com/? Looks cool."
>
> In the first case the "?" appears to be part of the URL, in
> the second it does not.
You can detect the differences in the above two sentences because looking at the first sentence, a decent natural language grammar won't allow the second PP as a complete sentence (but will recognize "have" as a main verb and thus complete the VP with the PP), whereas with the second sentence, the grammar will recognize "Have" as an auxiliary for "seen" and make a match (using a gap and fill scheme, for example) based on the fact that this is a wh-question equivalent for its declarative form (You have seen http://www.example.com.) and therefore it will correctly determine http://www.example.com is the end of the sentence and the question mark is the sentence terminator. Which is to say, as you said, that it can get quite complicated, but it is also to say that heuristics may not be sufficient for a lot of cases. :-) -jeff

 [6/8] from: al:bri:xtra at: 15-Feb-2001 16:55


Hi, Jeff and Holger, Thanks for your comments. I'd say that a valid URL containing a "?" would have: ... some AlphaDigit "?" some Graphic ... in the rules. While a URL with a "?" following it would have a space after it, because that's required by normal sentence structure. Any one see if that can be broke? Andrew Martin ICQ: 26227169 http://members.nbci.com/AndrewMartin/? :-)
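One possible reading of this rule, sketched in Python: a "?" counts as part of the URL only when an alphanumeric character comes before it and at least one non-space character follows, while a "?" at the very end of a whitespace-delimited token is treated as sentence punctuation. The names `QUERY_MARK`, `has_query` and `split_url` are hypothetical.

```python
import re

# Assumed reading of the rule: alphanumeric before the "?", non-space after it.
QUERY_MARK = re.compile(r"(?<=[A-Za-z0-9])\?(?=\S)")


def has_query(token):
    """Return True if token contains a "?" that the rule accepts as part of the URL."""
    return bool(QUERY_MARK.search(token))


def split_url(token):
    """Split a whitespace-delimited token into (url, trailing_punctuation).

    A "?" at the very end of the token is followed by whitespace in the
    surrounding text, so the rule treats it as belonging to the sentence.
    """
    if token.endswith("?"):
        return token[:-1], "?"
    return token, ""
```

Under this reading, http://www.example.com/search?q=rebol keeps its "?", while the "?" in the signature URL above is split off as punctuation.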

 [7/8] from: allenk:powerup:au at: 15-Feb-2001 20:13


----- Original Message ----- From: "Andrew Martin" <[Al--Bri--xtra--co--nz]> To: <[rebol-list--rebol--com]> Sent: Thursday, February 15, 2001 1:55 PM Subject: [REBOL] Re: Enhancement - valid [scheme]? words
> Hi, Jeff and Holger,
> Thanks for your comments.
<<quoted lines omitted: 3>>
> after it, because that's required by normal sentence structure.
> Any one see if that can be broke?
Hi Andrew, www.somewhere.com/? and www.somewhere.com/! With IIS5.0, some sites deliberately use illegal characters to force a 404 error ASP page, which can then redirect based on the failed URL. How would you find these URLs compared to normal language structure? Admittedly not a lot of IIS and ASP programmers know about this trick and its derivatives and very few will use the above format, but it is part of a new hack to get higher hits on search engines for ASP pages and also for response tracking/voting (by giving each person a link to a resource that doesn't exist). Cheers, Allen K

 [8/8] from: al:bri:xtra at: 16-Feb-2001 15:11


Allen K wrote:
> Andrew wrote:
> > Any one see if that can be broke?
<<quoted lines omitted: 3>>
> www.somewhere.com/!
> With IIS5.0, some sites deliberately use illegals to force a 404 error ASP
> page, which can then redirect based on the the failed URL. How would you find
> these URLs compared to normal language structure?
If these links were sent out in an email message, my MS Outlook Express client fails to differentiate them. Both are marked as: www.somewhere.com/ with the "?" and "!" separate from the hyperlink marking.
> Admittedly not a lot of IIS and ASP programmer know about this trick and
> its derivatives and very few will use the above format, but it is part of
> new hack to get higher hits on search engines for asp pages and also for
> response tracking/voting by giving each person a link to a resource that
> doesn't exist).
So like most hacks, I think this hack is already broken? Thanks for the attempt, Allen! Andrew Martin ICQ: 26227169 http://members.nbci.com/AndrewMartin/

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted