Mailing List Archive: 49091 messages

[REBOL] Re: comparing two URLs

From: tomc:darkwing:uoregon at: 22-Oct-2003 10:00

On Wed, 22 Oct 2003, Hallvard Ystad wrote:
> Hi list
>
> My rebol stuff search engine now has more than 10000
> entries, and works pretty fast thanks to DocKimbel's mysql
> protocol.
>
> Here's a problem:
> Some websites work both with and without the www prefix
> (ex. www.rebol.com and just plain and simple rebol.com).
> Sometimes this gives double records in my DB (ex.
> http://www.oops-as.no/cgi-bin/rebsearch.r?q=mysql : you'll
> see that both http://www.softinnov.com/bdd.html and
> http://softinnov.com/bdd.html appear).
>
> Is there a way to detect such behaviour on a server? Or do
> I have to compare my incoming document to whatever
> documents I already have in the DB that _might_ be the
> same document?
>
> Thanks,
> Hallvard
>
> Prætera censeo Carthaginem esse delendam
Hi Hallvard,

I ran into different reasons for finding more than one URL for a page
(URLs expressed as relative links) and wrote a QAD function that served
my purpose at the time. I just added Anton's suggestion; maybe it will
serve.

    do http://darkwing.uoregon.edu/~tomc/core/web/url-encode.r

    canotical-url: func[ url /local t p q][
        replace/all url "\" "/"
        t: parse url "/"
        ;;; drop each ".." segment together with the segment before it
        while [p: find t ".."][remove remove back p]
        ;;; drop "." segments
        while [p: find t "."][remove p]
        ;;; keep the first empty segment (after "http:"), drop any later ones
        p: find t ""
        while [p <> q: find/last t ""][remove q]
        ;;; this is untested
        ;;; using Anton's suggestion
        if not find t/3 "www."[
            if equal? read join dns:// t/3
                      read join dns://www. t/3
            [insert t/3 "www."]
        ]
        for i 1 (length? t) - 1 1[append t/:i "/"]
        to-url url-encode/re rejoin t
    ]
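
A minimal sketch for trying the path clean-up steps on their own, without the DNS lookup or the url-encode.r dependency. The name clean-path and the sample URL are assumptions made for illustration and are not part of the original function. The sketch also works on a copy of its argument rather than modifying it in place.

```rebol
REBOL [Title: "Path clean-up sketch"]

; clean-path is a hypothetical helper, not from the original post.
; It repeats only the offline steps of canotical-url.
clean-path: func [url [string!] /local t p q][
    url: replace/all copy url "\" "/"       ; copy, then normalise backslashes
    t: parse url "/"                        ; split into segments on "/"
    while [p: find t ".."][remove remove back p]    ; drop ".." and its parent
    while [p: find t "."][remove p]                 ; drop "." segments
    p: find t ""                            ; empty segment after "http:"
    while [p <> q: find/last t ""][remove q]        ; drop later empty segments
    for i 1 (length? t) - 1 1 [append t/:i "/"]     ; put the slashes back
    rejoin t
]

; Sample input is made up; the "docs/.." pair is meant to drop out.
print clean-path "http://softinnov.com/docs/../bdd.html"
```

Like the original, this relies on parse keeping an empty segment between the two slashes after "http:", so that the host name ends up in the third position.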