url domain
[1/7] from: patrick:philipot:laposte at: 6-Oct-2003 15:09
Hi List,
Parsing an HTML page (http://www.rebol.net/cookbook/index.html), I have found two kinds
of links.
1. HREF="recipes/0032.html"
2. IMG SRC="/graphics/doc-bar.gif"
The first one refers to the current folder, "http://www.rebol.net/cookbook/".
The second one refers to the current domain, "http://www.rebol.net/".
Hence my question: is there an easy way to get the domain from a URL?
Regards
Patrick
[2/7] from: hallvard:ystad:helpinhand at: 6-Oct-2003 15:38
Dixit "patrick" <[patrick--philipot--laposte--net]> (Mon, 6 Oct
2003 15:09:42 +0200):
>Hi List,
> [...]
>Hence my question, is there an easy way to get the domain
>from an url?
How do you mean? From a link like "/some/path" you
won't get the domain. From a URL like
http://www.tokerud.gs.oslo.no/index.html
you will. You could try this script:
http://folk.uio.no/hallvary/rebol/url-handler.r and see if
it is of any use to you.
Regards,
HY
[3/7] from: antonr:iinet:au at: 7-Oct-2003 12:38
These are relative links.
So it looks, at least for this web server, as if
a relative link that begins with a slash means "go
to the root first".
The resulting absolute links are:
1. http://www.rebol.net/cookbook/recipes/0032.html
2. http://www.rebol.net/graphics/doc-bar.gif
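These two results can be checked with Python's standard urllib.parse.urljoin, which applies the generic relative-link resolution rules. Python is used here purely as an illustration; the base page is the one from Patrick's message.

```python
from urllib.parse import urljoin

# The page on which both links were found.
base = "http://www.rebol.net/cookbook/index.html"

# A plain relative path is resolved against the base page's directory.
print(urljoin(base, "recipes/0032.html"))
# -> http://www.rebol.net/cookbook/recipes/0032.html

# A path starting with "/" is resolved against the root of the host.
print(urljoin(base, "/graphics/doc-bar.gif"))
# -> http://www.rebol.net/graphics/doc-bar.gif
```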
What you want is some sort of "clean-url" function,
similar to clean-path.
This function would know to go to the root directory
when it sees a leading slash, and would try to resolve
parent directory ../ markers too.
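A minimal sketch of such a function, written in Python for illustration. The name clean_url is hypothetical (it is not an existing REBOL or Python function), and it only covers the cases mentioned here: absolute links, a leading slash, and "." / ".." segments. Links of the form "//host/...", query strings, fragments and trailing-slash subtleties are deliberately not handled.

```python
def clean_url(base: str, link: str) -> str:
    """Hypothetical "clean-url": resolve `link` as found on page `base`."""
    if "://" in link:                      # already absolute: nothing to do
        return link
    scheme, rest = base.split("://", 1)
    host, _, base_path = rest.partition("/")
    if link.startswith("/"):
        # Leading slash: start from the root of the host.
        segments = link[1:].split("/")
    else:
        # Otherwise start in the base page's directory (drop its file name).
        segments = base_path.split("/")[:-1] + link.split("/")
    resolved = []
    for seg in segments:
        if seg == "..":
            if resolved:
                resolved.pop()             # ".." climbs to the parent directory
        elif seg != ".":                   # "." means "this directory": skip it
            resolved.append(seg)
    return f"{scheme}://{host}/" + "/".join(resolved)


page = "http://www.rebol.net/cookbook/index.html"
print(clean_url(page, "recipes/0032.html"))      # http://www.rebol.net/cookbook/recipes/0032.html
print(clean_url(page, "/graphics/doc-bar.gif"))  # http://www.rebol.net/graphics/doc-bar.gif
print(clean_url(page, "../index.html"))          # http://www.rebol.net/index.html
```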
I believe you can't rely on this behaviour on all
web servers, though. If you are implementing your
own web server, then you will be the one deciding how it
works (!), or if you know the behaviour of a
particular site like rebol.net, then it's OK.
To get the domain from an absolute URL, try:
third parse url "/"
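The same idea in Python, as an illustration only: split the URL on "/" and take the third piece. This works because the "//" after "http:" leaves an empty string between the two slashes (Python's str.split keeps it), which is presumably what the REBOL one-liner relies on. Note that a URL carrying a user name or port (user@host:8080) would return those along with the host.

```python
url = "http://www.rebol.net/cookbook/index.html"
parts = url.split("/")
print(parts)   # ['http:', '', 'www.rebol.net', 'cookbook', 'index.html']

# Python indexes from 0, so index 2 is REBOL's "third".
host = parts[2]
print(host)    # www.rebol.net
```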
Anton.
[4/7] from: patrick:philipot:laposte at: 7-Oct-2003 10:07
Thanks Anton
Finally, I will use 'parse to get the domain URL.
>> u: http://www.rebol.net/cookbook/index.html
>> parse u [thru "http://" to "/" stop: to end (print copy/part u (index? stop) - 1)]
http://www.rebol.net
== true
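For comparison, the same "skip past the protocol, cut at the next slash" idea as a small Python function. The name site_root is hypothetical. It looks for "://" rather than the literal "http://", so it also accepts other protocols, and it returns the whole URL when there is no path after the host.

```python
def site_root(url: str) -> str:
    """Return protocol and host of an absolute URL, e.g. "http://www.rebol.net"."""
    start = url.find("://")
    if start == -1:
        raise ValueError("not an absolute URL: " + url)
    # Find the first "/" after the "://" separator.
    slash = url.find("/", start + 3)
    return url if slash == -1 else url[:slash]


print(site_root("http://www.rebol.net/cookbook/index.html"))  # http://www.rebol.net
```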
I thought I had read somewhere about some undocumented properties for URLs (like
/user and /host for emails).
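Whether REBOL's url! offers such properties is not settled in this thread. Purely for comparison, Python's urllib.parse.urlsplit exposes the named parts of a URL as attributes; the URL below, with its user, password, port, query and fragment, is a made-up example.

```python
from urllib.parse import urlsplit

pieces = urlsplit("http://user:secret@www.rebol.net:8080/cookbook/index.html?x=1#top")
print(pieces.scheme)    # http
print(pieces.username)  # user
print(pieces.hostname)  # www.rebol.net
print(pieces.port)      # 8080
print(pieces.path)      # /cookbook/index.html
print(pieces.query)     # x=1
print(pieces.fragment)  # top
```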
Regards
Patrick
[5/7] from: g:santilli:tiscalinet:it at: 7-Oct-2003 10:02
Hi Anton,
On Tuesday, October 7, 2003, 4:38:35 AM, you wrote:
AR> So it looks, at least for this webserver, that
AR> a relative link that begins with a slash means go
AR> to the root first.
Actually, that does not depend on the web server at all. Relative
links are resolved client-side by the browser. A slash at the
beginning always means "the root". Two slashes mean another host,
with the same protocol. In other words, a URL is of the form:
[[[http:]//somehost]/somepath/]somefile
with optional parts in []. You can specify them all, or you can
just specify //somehost/somepath/somefile (the protocol is the
same as the current page's), /somepath/somefile (the protocol and
the host are the same as the current page's), or somefile (protocol,
host and path are the same as the current page's).
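Each of these forms can be tried against a current page with Python's urllib.parse.urljoin, used here only to illustrate the resolution rules just described; "somehost", "somepath" and "somefile" are the placeholders from the form above.

```python
from urllib.parse import urljoin

current = "http://www.rebol.net/cookbook/index.html"   # the page being read

forms = [
    "http://somehost/somepath/somefile",  # everything specified
    "//somehost/somepath/somefile",       # protocol taken from the current page
    "/somepath/somefile",                 # protocol and host from the current page
    "somefile",                           # protocol, host and path from the current page
]
for link in forms:
    print(link, "->", urljoin(current, link))
# http://somehost/somepath/somefile -> http://somehost/somepath/somefile
# //somehost/somepath/somefile -> http://somehost/somepath/somefile
# /somepath/somefile -> http://www.rebol.net/somepath/somefile
# somefile -> http://www.rebol.net/cookbook/somefile
```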
Regards,
Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/
[6/7] from: patrick::philipot::laposte::net at: 7-Oct-2003 11:52
Thanks, Gabriele,
Very informative!
However, in http://www.rebol.net/cookbook/index.html I have found some links
that do not match the form:
[[[http:]//somehost]/somepath/]somefile
for example, <A HREF="recipes/0001.html">.
It seems that a subpath can appear in the link, giving a more general form
like
[[[http:]//somehost]/somepath/][subpath/]somefile
What do you think?
Regards
Patrick
[7/7] from: hallvard:ystad:helpinhand at: 8-Oct-2003 8:55
Dixit "patrick" <[patrick--philipot--laposte--net]> (Tue, 7 Oct
2003 10:07:02 +0200):
>Thanks Anton
>
>finally I will use 'parse to get the domain URL.
Parse is excellent for the job.
Now, if I may, I'd like to brag about my script again
(http://folk.uio.no/hallvary/rebol/url-handler.r): it
handles both relative and absolute URLs, and serves you
the portion you need. You just 'do the script, and it then
works like this:
site: url-handler http://www.rebol.com
print site/url ; == http://www.rebol.com/
site/move-to "docs.html"
site/move-to http://www.rebol.com/docs/core23/rebolcore-1.html
site/move-to "#sect1"
print site/protocol ; == "http://"
print site/host ; == "www.rebol.com"
print site/path ; == "/docs/core23/rebolcore-1.html"
print site/query-part ; == ""
print site/section ; == "#sect1"
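The script itself is REBOL and is not reproduced here. As a rough, hypothetical illustration of the interface shown above, here is a Python class whose attribute names mirror the REBOL example; the class name UrlHandler and its behaviour are assumptions based only on the printed results, not a port of url-handler.r.

```python
from urllib.parse import urljoin, urlsplit


class UrlHandler:
    """Hypothetical Python analogue of the url-handler usage shown above."""

    def __init__(self, url: str):
        parts = urlsplit(url)
        # Give a bare host ("http://www.rebol.com") a root path "/".
        self.url = parts._replace(path="/").geturl() if parts.path == "" else url

    def move_to(self, link: str) -> None:
        """Follow an absolute or relative link from the current URL."""
        self.url = urljoin(self.url, link)

    @property
    def protocol(self) -> str:
        return urlsplit(self.url).scheme + "://"

    @property
    def host(self) -> str:
        return urlsplit(self.url).netloc

    @property
    def path(self) -> str:
        return urlsplit(self.url).path

    @property
    def query_part(self) -> str:
        query = urlsplit(self.url).query
        return "?" + query if query else ""

    @property
    def section(self) -> str:
        fragment = urlsplit(self.url).fragment
        return "#" + fragment if fragment else ""


site = UrlHandler("http://www.rebol.com")
print(site.url)          # http://www.rebol.com/
site.move_to("docs.html")
site.move_to("http://www.rebol.com/docs/core23/rebolcore-1.html")
site.move_to("#sect1")
print(site.protocol)     # http://
print(site.host)         # www.rebol.com
print(site.path)         # /docs/core23/rebolcore-1.html
print(site.section)      # #sect1
```

Relative links, including fragment-only ones like "#sect1", are resolved by urljoin, so the class itself only has to split the current URL into its parts.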
I think I have a newer version than the one on my site
right now, but I won't be able to update it before this
evening.
Thanks for reading,
HY