Mailing List Archive: 49091 messages

url domain

 [1/7] from: patrick:philipot:laposte at: 6-Oct-2003 15:09

Hi List,

Parsing an HTML page, I have found two kinds of link:

1. HREF="recipes/0032.html"
2. IMG SRC="/graphics/doc-bar.gif"

The first one refers to the current folder "". The second one refers to the current domain "".

Hence my question: is there an easy way to get the domain from a URL?

Regards,
Patrick

 [2/7] from: hallvard:ystad:helpinhand at: 6-Oct-2003 15:38

Dixit "patrick" <[patrick--philipot--laposte--net]> (Mon, 6 Oct 2003 15:09:42 +0200):

>Hi List,
> [...]
>Hence my question, is there an easy way to get the domain
>from an url?

How do you mean? From a link like this: "/some/path", you won't get the domain. From a url like this: , you will.

You could try this script: and see if it is of any use to you.

Regards,
HY

 [3/7] from: antonr:iinet:au at: 7-Oct-2003 12:38

These are relative links. So it looks, at least for this webserver, that a relative link that begins with a slash means go to the root first. The resulting absolute links are: 1. 2.

What you want is some sort of "clean-url" function, similar to clean-path. This function will know to go to the root directory when it sees a leading slash, and will try to resolve parent directory ../ markers too.

I believe you can't rely on this behaviour on all web servers, though. If you are implementing your own web server, then you will be the one deciding how it works (!), or if you know the behaviour for a particular site like then it's ok.

To get the domain from an absolute url, try:

    third parse url "/"

Anton.
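The split-on-slash idea can be sketched outside REBOL as well. The following is a minimal illustration in Python (an assumption of this note, not something used in the thread), with a hypothetical example.com URL. Splitting an absolute URL on "/" leaves the scheme first, an empty string between the two slashes, and the host third; the standard library's `urlsplit` is shown alongside for comparison.

```python
from urllib.parse import urlsplit

# Hypothetical absolute URL, chosen only for illustration.
url = "http://www.example.com/docs/page.html"

# Split on "/" and take the third piece.
# Pieces: ["http:", "", "www.example.com", "docs", "page.html"]
host_by_split = url.split("/")[2]

# The standard library parses the same URL into named components.
host_by_urlsplit = urlsplit(url).netloc

print(host_by_split)
print(host_by_urlsplit)
```

Note that `netloc` also carries any port or user information present in the URL; `urlsplit(url).hostname` returns the bare host name instead.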

 [4/7] from: patrick:philipot:laposte at: 7-Oct-2003 10:07

Thanks Anton,

Finally I will use 'parse to get the domain URL.

>> u:
>> parse u [thru "http://" to "/" stop: to end (print copy/part u (index? stop) - 1)]
== true

I thought I had read somewhere about some undocumented properties for URLs (like /user and /host for email).

Regards,
Patrick

 [5/7] from: g:santilli:tiscalinet:it at: 7-Oct-2003 10:02

Hi Anton,

On Tuesday, October 7, 2003, 4:38:35 AM, you wrote:

AR> So it looks, at least for this webserver, that
AR> a relative link that begins with a slash means go
AR> to the root first.

Actually, that does not depend on the web server at all. Relative links are resolved client-side by the browser. A slash at the beginning always means "the root". Two slashes mean another host, with the same protocol. I.e. a URL is of the form:

    [[[http:]//somehost]/somepath/]somefile

with optional parts in []. You can specify them all, or you can just specify:

    //somehost/somepath/somefile (the protocol is the same as the current page),
    /somepath/somefile (the protocol and the host are the same as the current page),
    somefile (protocol, host and path are the same as the current page).

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON:
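The four link forms described above can be checked with a short sketch. It is written in Python (an assumption of this note, not the thread's REBOL) and relies on the standard library's `urljoin`, which performs this kind of client-side resolution. The page URL and the host names are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the links.
base = "http://www.example.com/somepath/page.html"

# Each link form from the message, resolved against the page URL.
links = [
    "http://otherhost/somepath/somefile",  # everything specified
    "//otherhost/somepath/somefile",       # protocol taken from the page
    "/otherpath/somefile",                 # protocol and host taken from the page
    "somefile",                            # protocol, host and path taken from the page
]

resolved = [urljoin(base, link) for link in links]
for link, absolute in zip(links, resolved):
    print(f"{link!r:42} -> {absolute}")
```

No request is sent while resolving: the result depends only on the two strings, which is why the behaviour does not vary from one web server to another.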

 [6/7] from: patrick::philipot::laposte::net at: 7-Oct-2003 11:52

Thanks Gabriele,

Very informative! However in I have found some links that do not match the form:

    [[[http:]//somehost]/somepath/]somefile

for example, <A HREF="recipes/0001.html">

It seems that a subpath can appear in the link, giving a more general form like:

    [[[http:]//somehost]/somepath/][subpath/]somefile

What do you think?

Regards,
Patrick
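One way to read a link such as recipes/0001.html is as the last form with a relative path in place of a bare file name: the whole relative part is appended to the directory of the current page, and ../ segments step up one directory. A minimal sketch under that reading, again in Python with `urljoin` (an assumption of this note) and a hypothetical page URL; only the link values come from the thread.

```python
from urllib.parse import urljoin

# Hypothetical page URL; only the link values below come from the thread.
page = "http://www.example.com/docs/index.html"

# A relative link with a subpath is appended to the page's directory.
with_subpath = urljoin(page, "recipes/0001.html")

# A parent-directory marker steps up from the page's directory first.
with_parent = urljoin(page, "../graphics/doc-bar.gif")

print(with_subpath)
print(with_parent)
```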

 [7/7] from: hallvard:ystad:helpinhand at: 8-Oct-2003 8:55

Dixit "patrick" <[patrick--philipot--laposte--net]> (Tue, 7 Oct 2003 10:07:02 +0200):

>Thanks Anton
>
>finally I will use 'parse to get the domain URL.

Parse is excellent for the job. Now, if I may, I'd like to brag about my script again: it handles both relative and absolute URLs, and serves you the portion you need. You just 'do the script, and it then works like this:

    site: url-handler
    print site/url ; ==
    site/move-to "docs.html"
    site/move-to
    site/move-to "#sect1"
    print site/protocol ; == "http://"
    print site/host ; == ""
    print site/path ; == "/docs/core23/rebolcore-1.html"
    print site/query-part ; == ""
    print site/section ; == "#sect1"

I think I have a newer version than the one on my site right now, but I won't be able to update it before this evening.

Thanks for reading,
HY
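For readers without the script at hand, the same kind of walk (a relative link, then an absolute link, then a fragment-only link) followed by a breakdown into parts can be sketched with Python's standard library. This is not the url-handler script and makes no claim about how it works internally; the host name is hypothetical, and the component names are those of `urlsplit`.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical starting URL; the host is an assumption, not from the message.
current = "http://www.example.com/index.html"

# Follow a relative link, an absolute link, then a fragment-only link,
# loosely mirroring the three move-to calls above.
for link in ["docs.html",
             "http://www.example.com/docs/core23/rebolcore-1.html",
             "#sect1"]:
    current = urljoin(current, link)

parts = urlsplit(current)
print(parts.scheme)    # protocol, without the "://"
print(parts.netloc)    # host
print(parts.path)      # path
print(parts.query)     # query part (empty here)
print(parts.fragment)  # section, without the leading "#"
```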