
[REBOL] Re: comparing two URLs

From: hallvard:ystad:helpinhand at: 24-Oct-2003 8:31

Thanks, both. But theoretically, these two URLs may very well not represent the same document and still reside on the same server (same DNS entry). So ... is it possible to _know_ whether or not these two documents are the same without downloading and comparing them? (I really don't think so myself, but someone might know something I don't.)

I suddenly realize this has very little to do with Rebol. Sorry.

Hallvard

Dixit Tom Conlin <[tomc--darkwing--uoregon--edu]> (Wed, 22 Oct 2003 10:00:08 -0700 (PDT)):
>On Wed, 22 Oct 2003, Hallvard Ystad wrote:
>
>> Hi list
>>
>> My rebol stuff search engine now has more than 10000 entries, and works
>> pretty fast thanks to DocKimbel's mysql protocol.
>>
>> Here's a problem:
>> Some websites work both with and without the www prefix (ex. and just plain and simple
>> Sometimes this gives double records in my DB (ex. : you'll see that both and appears).
>>
>> Is there a way to detect such behaviour on a server? Or do I have to
>> compare my incoming document to whatever documents I already have in
>> the DB that _might_ be the same document?
>>
>> Thanks,
>> Hallvard
>>
>> Prætera censeo Carthaginem esse delendam
>> --
>> To unsubscribe from this list, just send an email to
>> [rebol-request--rebol--com] with unsubscribe as the subject.
>
>Hi Hallvard
>
>I ran into different reasons for finding more than one URL to a page
>(URLs expressed as relative links) and wrote a QAD function that served
>my purpose at the time.
>
>I just added Anton's suggestion; maybe it will serve.
>
>do
>
>canotical-url: func[ url /local t p q][
>    replace/all url "\" "/"
>    t: parse url "/"
>    while [p: find t ".."][remove remove back p]
>    while [p: find t "."][remove p]
>    p: find t ""
>    while [p <> q: find/last t ""][remove q]
>
>    ;;; this is untested
>    ;;; using Anton's suggestion
>
>    if not find t/3 "www."[
>        if equal? read join dns:// t/3 read join dns://www. t/3
>            [insert t/3 "www."]
>    ]
>
>    for i 1 (length? t) - 1 1[append t/:i "/"]
>    to-url url-encode/re rejoin t
>]
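
For readers who want to try the idea in the quoted function, here is a minimal sketch of the same kind of normalisation, written in Python rather than REBOL so it can be run as-is. It is not a port of Tom's code. The name `canonical_url` and the `same_host` callback are assumptions made for illustration; the callback stands in for the DNS comparison, which is left to the caller. As the reply above points out, two names resolving to the same server does not prove the documents are identical, so the result is best treated as a key for finding candidate duplicates.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url, same_host=None):
    """Return a normalised form of an absolute URL for duplicate detection.

    same_host is a hypothetical, optional callback (host_a, host_b) -> bool,
    for example one that compares DNS lookups. It is a heuristic only.
    Userinfo and the fragment are dropped.
    """
    parts = urlsplit(url.replace("\\", "/"))
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Prefer the "www." form when the caller says both names are one host.
    if same_host and host and not host.startswith("www."):
        if same_host(host, "www." + host):
            host = "www." + host

    # Resolve "." and ".." segments and collapse repeated slashes.
    raw = parts.path or "/"
    path = "/" + posixpath.normpath(raw).lstrip("/")
    if raw.endswith(("/", "/.", "/..")) and path != "/":
        path += "/"

    # Keep the port only when it is not the default for the scheme.
    netloc = host
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += ":%d" % parts.port

    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

Two URLs that normalise to the same string are worth comparing more closely; only fetching and comparing the documents settles whether they really are the same.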
Prætera censeo Carthaginem esse delendam