[REBOL] Re: Need some url purification functions
From: al::bri::xtra::co::nz at: 11-Jan-2001 15:49
> I have made rules that check whether the extracted url has http:// at
> the beginning; if not, they prepend the domain name that was being
> used while crawling. The code snippet for the same is given below.
> This algo definitely has tons of issues (viz. what happens if the url that
> is being crawled is not http://www.yahoo.com but
> http://www.yahoo.com/temp.html: will the new relative url ./demo become
> http://www.yahoo.com/temp.html/demo? What should happen if the relative url
> is ../temp or, worse, ../../../temp? And the same relative urls can be
> written as ./demo or demo or /demo etc.)
> I wanted to know if I am thinking in the right direction. Is there a
> simpler way of achieving what I want, or do I have to write rules for each
> condition? Are there any readymade functions or Rebol code that might give
> me a purified absolute url based on certain inputs?
You're roughly on the right track. Use 'load/markup to automatically split
the HTML into tag! and string! datatypes -- this saves a lot of time.
Check out the RFCs (I've forgotten the numbers); there are several on URL, URI and email that
are very helpful. Also, the construct:
is very useful for forming absolute URLs.
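
For illustration, the two steps above (split the page into tags, then resolve each link against the URL of the page it came from) can be sketched as follows. This is Python rather than REBOL, and it is not the construct referred to above. `LinkCollector` and `absolute_links` are hypothetical names made up for this sketch; the resolution itself is left to the standard library's `urljoin`, which applies the relative-reference rules from the URI RFCs instead of hand-written rules per case.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being parsed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def absolute_links(page_url, html):
    """Return every link found in html as an absolute URL.

    page_url is the URL the page was fetched from; relative links
    (./demo, demo, /demo, ../temp) are resolved against it, and links
    that are already absolute are returned unchanged.
    """
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(page_url, href) for href in collector.hrefs]


if __name__ == "__main__":
    # Assumed sample data, chosen to mirror the cases in the question.
    base = "http://www.yahoo.com/dir/temp.html"
    sample = '<a href="./demo">1</a> <a href="../temp">2</a> <a href="/demo">3</a>'
    for link in absolute_links(base, sample):
        print(link)
```

Note that with a base of http://www.yahoo.com/dir/temp.html, the relative url ./demo resolves to http://www.yahoo.com/dir/demo and not to .../temp.html/demo: the last path segment of the base is dropped before the relative part is merged in.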
I've got a script which handles this all, but it's written under contract.
It's private, not for free use.
I hope that helps!
ICQ: 26227169 http://members.nbci.com/AndrewMartin/