[REBOL] Re: comparing two URLs
From: hallvard:ystad:helpinhand at: 24-Oct-2003 8:31
Thanks both.
But theoretically, these two URLs may very well not
represent the same document:
http://www.uio.no/
http://uio.no/
but still reside on the same server (same DNS entry).
So ... is it possible to _know_ whether these two URLs
return the same document without downloading both and
comparing them? (I really don't think so myself, but
someone might know something I don't.)
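
One cheap partial check, short of downloading both bodies, is to
compare the headers returned by an HTTP HEAD request. Below is a
minimal sketch in Python (not REBOL, since the question is not
REBOL-specific). The function names and the choice of ETag,
Content-Length and Last-Modified as hints are assumptions made for
illustration. Matching headers are only a hint, never proof, and many
servers omit them or send different values from each host name.

```python
import urllib.request

# Header names used as cheap identity hints.
# This selection is an assumption, not a rule from any standard.
HINT_HEADERS = ("etag", "content-length", "last-modified")


def head_headers(url, timeout=10):
    """Send a HEAD request and return the response headers, lower-cased."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def probably_same(headers_a, headers_b):
    """Guess whether two header sets describe the same document.

    Returns True or False when at least one hint header is present in
    both sets, and None when there is nothing to compare.
    """
    shared = [h for h in HINT_HEADERS if h in headers_a and h in headers_b]
    if not shared:
        return None
    return all(headers_a[h] == headers_b[h] for h in shared)
```

It might be used as
probably_same(head_headers("http://www.uio.no/"), head_headers("http://uio.no/")).
A False or None result still leaves downloading and comparing the
documents as the only sure test.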
I suddenly realize this has got very little to do with
Rebol. Sorry.
Hallvard
Dixit Tom Conlin <[tomc--darkwing--uoregon--edu]> (Wed, 22 Oct
2003 10:00:08 -0700 (PDT)):
>On Wed, 22 Oct 2003, Hallvard Ystad wrote:
>
>>
>> Hi list
>>
>> My rebol stuff search engine now has more than 10000
>> entries, and works pretty fast thanks to DocKimbel's
>> MySQL protocol.
>>
>> Here's a problem:
>> Some websites work both with and without the www prefix
>> (e.g. www.rebol.com and plain rebol.com). Sometimes this
>> gives duplicate records in my DB (e.g.
>> http://www.oops-as.no/cgi-bin/rebsearch.r?q=mysql :
>> you'll see that both http://www.softinnov.com/bdd.html
>> and http://softinnov.com/bdd.html appear).
>>
>> Is there a way to detect such behaviour on a server? Or
>> do I have to compare my incoming document to whatever
>> documents I already have in the DB that _might_ be the
>> same document?
>>
>> Thanks,
>> Hallvard
>>
>> Præterea censeo Carthaginem esse delendam
>> --
>> To unsubscribe from this list, just send an email to
>> [rebol-request--rebol--com] with unsubscribe as the subject.
>>
>
>Hi Hallvard
>
>I ran into different reasons for finding more than one
>URL for a page (URLs expressed as relative links) and
>wrote a quick-and-dirty function that served my purpose
>at the time.
>
>I just added Anton's suggestion; maybe it will serve.
>
>do http://darkwing.uoregon.edu/~tomc/core/web/url-encode.r
>
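>canotical-url: func [url /local t p q] [
>    replace/all url "\" "/"    ; normalize backslashes
>    t: parse url "/"           ; split the URL into segments
>    ; resolve ".." by dropping it and the segment before it
>    while [p: find t ".."] [remove remove back p]
>    ; drop "." (current directory) segments
>    while [p: find t "."] [remove p]
>    ; keep the first empty segment (after the scheme), drop later ones
>    p: find t ""
>    while [p <> q: find/last t ""] [remove q]
>
>    ;;; this is untested
>    ;;; using Anton's suggestion
>
>    if not find t/3 "www." [
>        if equal? read join dns:// t/3 read join dns://www. t/3 [
>            insert t/3 "www."
>        ]
>    ]
>
>    ; put the "/" separators back and rebuild the URL
>    for i 1 (length? t) - 1 1 [append t/:i "/"]
>    to-url url-encode/re rejoin t
>]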
>
Præterea censeo Carthaginem esse delendam
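
For reference, the path clean-up steps in Tom's quoted function
(backslashes to slashes, resolving "." and "..", collapsing empty
segments) can be sketched in Python roughly as follows. This is an
illustrative sketch, not a port. The name normalize_url is made up,
the trailing-slash handling is an assumption, and both the DNS-based
"www." step and the url-encode step are deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Return url with backslashes, '.', '..' and empty path segments cleaned up."""
    parts = urlsplit(url.replace("\\", "/"))
    segments = []
    for segment in parts.path.split("/"):
        if segment == "..":
            if segments:
                segments.pop()  # step back one directory
        elif segment not in ("", "."):
            segments.append(segment)  # keep ordinary segments only
    path = "/" + "/".join(segments)
    # Keep a trailing slash when the original path pointed at a directory.
    if segments and parts.path.endswith(("/", "/.", "/..")):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Two URLs that normalize to the same string are good candidates for
being the same record. URLs that differ only in the "www." prefix
still need a separate check.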