Mailing List Archive: Re: comparing two URLs

[REBOL] Re: comparing two URLs

From: jvargas:whywire at: 24-Oct-2003 13:35


Hallvard,

You could compute id: checksum/secure read url and use
this id as a unique hash identifier for the page.  Then use the id
to index your database.  If two URLs have exactly the
same content, you will get the same checksum, and you can
then add the new URL reference to the db without needing to
update the stored page content, since it will already be there
(assuming you are storing the pages).  If you have never seen
the id, it means you have new content, and you go on to
store the (id, url, content) in the db.

This way of indexing is better than using the URL as the unique identifier,
because identical content reached through different URLs is stored only once.
I believe this is used by some cache servers like squid.

The chances of two different pages generating the same
hash id via the checksum algorithm are really low; if I am correct,
REBOL uses SHA1 for checksum/secure.
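
To put a rough number on "really low": assuming the hash behaves like
a uniformly random 160-bit value (the SHA1 digest size), the usual
birthday approximation gives the chance of any accidental collision
among n pages as about n(n-1)/2 divided by 2^160. A small Python
sketch of that estimate follows; collision_probability is a
hypothetical helper name, and the figure says nothing about collisions
that someone constructs on purpose.

```python
import hashlib

# SHA-1 digests are 20 bytes, i.e. 160 bits.
SHA1_BITS = hashlib.sha1().digest_size * 8


def collision_probability(n_pages: int, hash_bits: int = SHA1_BITS) -> float:
    """Birthday-bound approximation: p ~= n(n-1)/2 / 2**hash_bits."""
    pairs = n_pages * (n_pages - 1) // 2  # number of distinct page pairs
    return pairs / 2 ** hash_bits
```

For a billion stored pages, collision_probability(10**9) comes out on
the order of 1e-31, so accidental clashes are not a practical concern
for a page database.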

Hope this helps. Cheers,  Jaime

-- The best way to predict the future is to invent it -- Steve Jobs

On Friday, October 24, 2003, at 02:31  AM, Hallvard Ystad wrote: