[REBOL] Re: The "It's Mine Now and I'll Do What I Want With It" Project Proposal
From: rgombert:essentiel at: 10-Mar-2001 16:20
Why not use the <BASE HREF=""> tag?
If there's one, you just have to change it; otherwise you add one. Then
you just have to take care of the absolute URLs, which have to be turned into
relative ones, relative to a specific folder containing the related things.
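
A minimal sketch of the BASE idea, assuming REBOL 2 and a block produced by
load/markup. The name set-base is made up for illustration, and the matching
of "base " and "head" at the start of a tag is a simplification:

```rebol
rebol []

; Hypothetical helper: the name and behavior are assumptions for illustration.
; Expects markup at its head, as returned by load/markup.
set-base: func [
    "Give a load/markup block a BASE tag pointing at base-url."
    markup [block!] base-url [url!]
    /local new-tag pos
][
    new-tag: to-tag rejoin [{BASE HREF="} base-url {"}]

    ; Pass 1: if the page already has a BASE tag, replace it.
    pos: markup
    while [not tail? pos] [
        if all [tag? first pos  find/match first pos "base "] [
            change pos new-tag
            return markup
        ]
        pos: next pos
    ]

    ; Pass 2: otherwise put the new tag right after the opening HEAD tag.
    pos: markup
    while [not tail? pos] [
        if all [tag? first pos  find/match first pos "head"] [
            insert next pos new-tag
            return markup
        ]
        pos: next pos
    ]

    ; No HEAD tag at all: fall back to the very start of the document.
    insert markup new-tag
    markup
]
```

Something like set-base load/markup the-domain the-domain would then leave
every relative link in the page untouched and let the browser resolve it.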
Renaud
----- Original Message -----
From: "Terry Brownell" <[depotcity--home--com]>
To: "Rebol List" <[rebol-list--rebol--com]>
Sent: Saturday, March 10, 2001 9:22 AM
Subject: [REBOL] The "It's Mine Now and I'll Do What I Want With It" Project
Proposal
> Goal - Reconstruct a previously read webpage prior to saving so that all tags contain complete URLs
> Here's a project some may be interested in collaborating on for the good of Reboldom.
> The Problem.
>
> When an HTML document is read and then saved, many of the tags (src, a href, etc.) become "dead" due to the original page referencing a path to the local server directly, like so...
> <a href="/news/0-1006-200-5079991.html?tag=tp_pr">
>
> as opposed to the complete URL, thus...
>
> <a href="http://www.news.com/news/0-1006-200-5079991.html?tag=tp_pr">
>
> When the page is then "delivered" outside of its domain, the resulting HTML is marred. This hinders webpage manipulation and must not be allowed to continue.
> The Solution
>
> Now let's say we could replace the "dead" (for lack of a proper definition) URLs with "well-formed" URLs; what would be some of the advantages?
> A few that come to mind include:
>
> - Reading a webpage, removing the javascript that "breaks" the page out of frames, then delivering it to a frame (sneaky huh?)
> - Removing/Replacing banner ads.
> - Marking up the page with XML on the fly
> - Annotating the page
> - Highlighting key points
> etc.
>
> Now this seems like an easy task, but it's deceptive. One may say, "Just insert the domain part of the URL into the tags" (see my "been using rebol for months, but still green" script below). This works for basic sites, but as the HTML gets more and more complex, so must the sophistication of the function.
> For example, some of these "dead" tags get pretty wiry... some have a leading "/" and some don't, some are embedded in javascript, and there are many other styles.
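
The leading "/" matters because each kind of link resolves against a different
part of the page's address: a link with a leading "/" hangs off the site root,
while one without it hangs off the page's own directory. A rough sketch of one
way to handle both cases, assuming REBOL 2. The name resolve-link is made up
for illustration, and "../" segments, "#fragment" links, mailto: and
javascript: links are deliberately not handled:

```rebol
rebol []

; Hypothetical helper, not part of the original script.
; page-url is the full address of the page the link was found on.
resolve-link: func [
    "Sketch: turn a link found on page-url into an absolute URL."
    page-url [url!] link [string!]
    /local host-start path-start site dir
][
    if empty? link [return page-url]

    ; Already absolute (carries its own scheme): leave it alone.
    if find link "://" [return to-url link]

    ; Locate the first "/" after the scheme, i.e. where the path begins.
    host-start: any [find/tail page-url "://" page-url]
    path-start: find host-start "/"

    ; site is scheme plus host, e.g. http://www.news.com
    site: either path-start [copy/part page-url path-start] [copy page-url]

    ; Leading "/": relative to the site root.
    if #"/" = first link [return join site link]

    ; No leading "/": relative to the directory of the page itself.
    dir: either path-start [
        copy/part page-url find/last/tail page-url "/"
    ][
        join page-url "/"
    ]
    join dir link
]

; resolve-link http://www.news.com/news/index.html "/img/logo.gif"
;     gives http://www.news.com/img/logo.gif
; resolve-link http://www.news.com/news/index.html "story.html"
;     gives http://www.news.com/news/story.html
```

The sketch assumes the page address has no "/" inside its query string;
a page URL that does would need the query split off first.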
> Is this idea too far-fetched? Am I not seeing the forest for the trees? Is there already a solution?
> Your thoughts and input are much appreciated.
>
> Terry Brownell
> www.LFReD.com
>
> Below is the "It's Mine Now 1.0"
> (Note: I know this could be written much better, and as a minimum made into a function, but it's a start from a starter. Feel free to improve. Also I find laying the code out into long lines easier to follow and debug. Don't ask me why, maybe cuz I'm Canadian.)
> rebol []
>
> the-domain: to-url ask "What domain?"
> the-markup: load/markup the-domain
>
> ;The following will check for "dead" SRCs, if true then add the domain
>
> forall the-markup [if all [(type? first the-markup) = tag! found? find first the-markup {src="} not found? find first the-markup "://"][insert find/tail first the-markup {src="} the-domain]]
> ;The following will check for "dead" HREFs and replace with domain if necessary
> the-markup: head the-markup
>
> forall the-markup [if all [(type? first the-markup) = tag! found? find first the-markup {HREF="} not found? find first the-markup "://"][insert find/tail first the-markup {HREF="} the-domain]]
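
The two passes above could be folded into one function, roughly as sketched
here under the assumption of REBOL 2. The name absolutize is made up for
illustration. Like the script, it only prefixes the domain, so it is only
right for links with a leading "/", and it also touches "#fragment",
mailto: and javascript: values:

```rebol
rebol []

; Hypothetical function name; a sketch, not a tested replacement.
absolutize: func [
    "Prefix relative SRC/HREF values in a load/markup block with domain."
    markup [block!] domain [url!]
    /local pos value
][
    foreach item markup [
        ; Only tags are touched; plain text between tags is skipped.
        if tag? item [
            foreach attr [{src="} {href="}] [
                ; find is case-insensitive, so SRC= and HREF= match too.
                if pos: find/tail item attr [
                    ; Look only at this attribute's value, up to the closing quote.
                    value: copy/part pos any [find pos {"} tail pos]
                    if not find value "://" [insert pos form domain]
                ]
            ]
        ]
    ]
    markup
]
```

Usage would be along the lines of
the-markup: absolutize load/markup the-domain the-domain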