REBOL Script to convert REBOL format to RSS XML
[1/16] from: premshree:pillai:g:mail at: 31-Dec-2004 4:38
Umm, looks like my last post didn't make it to the list.
Anyway, I wrote a small script that generates valid RSS 2.0 feeds from
the REBOL data for Carl's blog. The code is available here:
http://premshree.seacrow.com/code/rebol/carl-rss.r/download
The REBOL data in Carl's blog (see
http://www.rebol.net/cgi-bin/blog.r?view=0080) doesn't provide the
time of posting, so it just takes it as 00:00:00 GMT, but that's okay,
I guess.
If there are any possible improvements, please point them out or make them. :)
Thanks.
--
Premshree Pillai
http://www.livejournal.com/~premshree
[2/16] from: greggirwin:mindspring at: 30-Dec-2004 17:12
Hi Premshree,
PP> Anyway, I wrote a small script that generates valid RSS 2.0 feeds from
PP> the REBOL data for Carl's blog. The code is available here:
PP> http://premshree.seacrow.com/code/rebol/carl-rss.r/download
Very cool!
PP> If there are any possible improvements, please point them out or make them. :)
I did a quick re-hack of things, just to show you some different
approaches. Others may chime in with more thoughts as well. Some of
the things I thought about when evaluating your code:
* Use path notation instead of SELECT
* Don't use temporary vars if you don't need to
* Tags are a valid datatype, you don't need to make them strings
* JOIN (and REJOIN) build new strings, meaning lots of allocation
and work for the GC. INSERT, APPEND, and their ilk just add on
to an existing series; also saves reassigning the value.
* See what refinements date! values offer to see if you need to
convert them to strings to do what you want.
* REPEAT is faster than FOR if you're going from 1 and
incrementing by 1. FOREACH makes things even easier most times.
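A minimal sketch of a few of the points above, assuming REBOL 2. The names `blk` and `out` are made up for illustration and are not part of the original script:

```rebol
REBOL []

blk: [title "Example" link http://www.rebol.net/]

; Path notation and SELECT fetch the same value
print equal? blk/title select blk 'title

; APPEND modifies the existing string in place, so no reassignment is needed
out: copy ""
append out <title>          ; tags can be appended directly
append out blk/title
append out </title>
print out

; REPEAT counts from 1 in steps of 1; FOREACH walks the values directly
repeat i 3 [print i]
foreach [key value] blk [print [key value]]
```

The point is only to show the idioms side by side; the rewritten script below uses the same ideas.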
I did a couple more advanced things in there (note the tick on the
MAKE-ENTRY 'key parameter) and did some others to give you some things
to take apart and think about. Holler away if you have questions.
BTW, this was a very quick hack, so I don't know if, for example,
TO-IDATE gives a valid format as far as RSS is concerned.
-- Gregg
REBOL [
Title: "RSS Generator for Carl's Blog"
Date: 31-Dec-2004
File: %carl-rss.r
Home: http://www.livejournal.com/~premshree
Author: ["Premshree Pillai" "Gregg Irwin"]
Version: 0.0.2
Purpose: {
Generates valid RSS 2.0 feeds for Carl's blogs
}
Comment: {
Massive code changes for instructional purposes. --Gregg
}
]
;; channel data
channel: [
title "Carl's REBOL Blog - Vive la REBOLution"
link http://www.rebol.net/
description "describes this blog channel"
language "en" ;"English"
copyright "2005 Carl Sassenrath"
generator "REBOL Messaging Language"
]
;; blog items go here
items: [
[
title "Blog item title...."
link http://www.rebol.net/cgi-bin/blog.r?view=0080
author "Carl Sassenrath"
pubdate 30-Dec-2004/0:0:0
content {the blog goes here}
]
[
title "Blog item title 2...."
link http://www.rebol.net/cgi-bin/blog.r?view=0081
author "Carl Sassenrath"
pubdate 31-Dec-2004/0:0:0
content {the blog 2 goes here}
]
]
;-- No Changes needed below this point -------------------------
make-entry: func [series 'key] [
rejoin [tab to tag! :key series/:key to tag! join #"/" :key]
]
channel-entry: func ['key] [make-entry channel :key]
channel-entries: func [keys [block!] /local result] [
result: copy ""
foreach key keys [append result join channel-entry :key newline]
result
]
output: copy ""
repend output [
<?xml version='1.0' encoding='utf-8' ?> <rss version='2.0'> newline
<channel> newline
channel-entries [title link description language copyright generator]
]
foreach item items [
repend output [
tab <item> newline
tab tab <guid isPermaLink='true'> item/link </guid> newline
tab tab <pubDate> to-idate item/pubdate </pubDate> newline
tab tab <title> item/title </title> newline
tab tab <link> item/link </link> newline
tab tab <description> item/content </description> newline
tab </item> newline
]
]
repend output [</channel> newline </rss>]
;print output
;halt
write %carl-rss2.xml output
[3/16] from: carl:cybercraft at: 31-Dec-2004 18:19
On Thursday, 30-December-2004 at 17:12:12 Gregg wrote,
>BTW, this was a very quick hack, so I don't know if, for example,
>TO-IDATE gives a valid format as far as RSS is concerned.
There's an RSS validator here...
http://feedvalidator.org/
which should tell you when you give it the feed. I think I found TO-IDATE wanting with
some Net stuff in the past - may have been RSS.
-- Carl Read.
[4/16] from: volker::nitsch::gmail::com at: 31-Dec-2004 6:54
On Thu, 30 Dec 2004 17:12:12 -0700, Gregg Irwin
<[greggirwin--mindspring--com]> wrote:
> Hi Premshree,
>
> PP> Anyway, I wrote a small script that generates valid RSS 2.0 feeds from
> PP> the REBOL data for Carl's blog. The code is available here:
> PP> http://premshree.seacrow.com/code/rebol/carl-rss.r/download
>
> Very cool!
>
> PP> If there are any possible improvements, please point them out or make them. :)
>
Good scripts :)
Here is another version that is less advanced but keeps the original
structure, as an intermediate step, together with some comments
(single ";").
REBOL [
Title: "RSS Generator for Carl's Blog"
Date: 31-Dec-2004
Version: 0.0.1
File: %carl-rss.r
Home: http://www.livejournal.com/~premshree
Author: "Premshree Pillai"
Version: 0.0.1
Purpose: {
Generates valid RSS 2.0 feeds for Carl's blogs
}
]
;; channel data
channel: [
title "Carl's REBOL Blog - Vive la REBOLution"
link http://www.rebol.net/
description "describes this blog channel"
language "English"
copyright "2005 Carl Sassenrath"
generator "REBOL Messaging Language"
]
;; blog items go here
items: [
[
title "Blog item title...."
link http://www.rebol.net/cgi-bin/blog.r?view=0080
author "Carl Sassenrath"
pubdate 30-Dec-2004
content {the blog goes here}
]
[
title "Blog item title 2...."
link http://www.rebol.net/cgi-bin/blog.r?view=0081
author "Carl Sassenrath"
pubdate 31-Dec-2004
content {the blog 2 goes here}
]
]
;; no edits required below this point
; instead of "channel-title"
; we can use "channel/title"
; directly, so we drop this:
; channel-link: select channel 'link
; ...
; instead of
; output: rejoin [output new-stuff]
; we can use, for performance,
; append output rejoin [new-stuff]
; which is the same as
; repend output [new-stuff]
; (repend is a shortcut because such things occur so often)
; and then we shrink some more with a function "emit".
; It is usually copy-pasted and patched everywhere.
; It's not built in because each script needs a slightly
; different version.
output: copy ""
emit: func [block] [repend output block append output newline]
; tags are built in, so the {"} around them is not needed.
; This sometimes looks cleaner.
; I also added some line breaks for email.
emit [
<?xml version='1.0' encoding='utf-8' ?>
<rss version='2.0'><channel>
<title> channel/title </title>
]
emit [
<link> channel/link </link>
<description> channel/description </description>
]
emit [
<language> channel/language </language>
<copyright> channel/copyright </copyright>
]
emit [<generator> channel/generator </generator>]
; instead of "for" and indexing:
foreach item items[
title: item/title
link: item/link
author: item/author
; dropped the date-making, to-idate works similar
pubdate: to-idate item/pubdate
content: item/content
emit [
<item>
<guid isPermaLink='true'> link </guid>
<pubDate> pubdate </pubDate>
]
emit [<title> title </title><link> link </link>]
emit [<description> content </description></item>]
]
emit[</channel></rss>]
write %carl-rss2.xml output
[5/16] from: greggirwin:mindspring at: 31-Dec-2004 0:01
Thanks Carl,
CR> There's an RSS validator here...
CR> http://feedvalidator.org/
CR> which should tell you when you give it the feed. I think I found
CR> TO-IDATE wanting with some Net stuff in the past - may have been
CR> RSS.
Looks like it wants time values in there, then it's OK. So, here's
another real quick rewrite.
Carl wants a link to a reader too. Not my area, but he should know about
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=rss.r
in addition to others.
Not sure on the Apache question. Just an AddType for rss+xml or
something? Sunanda? Premshree?
I'll leave my contributions here and let someone take the next step(s)
for Carl if they would.
-- Gregg
REBOL [
Title: "RSS Generator for Carl's Blog"
Date: 31-Dec-2004
File: %carl-rss.r
Home: http://www.livejournal.com/~premshree
Author: ["Premshree Pillai" "Gregg Irwin"]
Version: 0.0.3
Purpose: {Generate valid RSS 2.0 feeds for Carl's blogs}
Comment: {
0.0.2 Massive code changes for instructional purposes. --Gregg
0.0.3 More changes, knowing Carl actually wants to use it. :) --Gregg
}
]
make-rss-ctx: context [
make-entry: func [series key] [
rejoin [tab to tag! :key series/:key to tag! join #"/" :key newline]
]
channel-entries: func [channel keys [block!] /local result] [
; CHANNEL is passed in explicitly so this no longer depends on a global word
result: copy ""
foreach key keys [append result make-entry channel :key]
result
]
set 'make-rss func [channel items /local output] [
output: copy ""
repend output [
<?xml version='1.0' encoding='utf-8' ?> <rss version='2.0'> newline
<channel> newline
channel-entries channel [title link description language copyright generator]
newline
]
foreach item items [
repend output [
tab <item> newline
tab make-entry item 'title
tab make-entry item 'link
tab make-entry item 'description
tab tab <guid isPermaLink='true'> item/link </guid> newline
tab tab <pubDate> to-idate item/pubdate </pubDate> newline
tab </item> newline newline
]
]
repend output [</channel> newline </rss>]
]
]
;; Test Code below ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
comment {
;; channel data
channel: [
title "Carl's REBOL Blog - Vive la REBOLution"
link http://www.rebol.net/
description "describes this blog channel"
language "en" ;"English"
copyright "2005 Carl Sassenrath"
generator "REBOL"
]
;; blog items go here
items: [
[
title "Blog item title...."
link http://www.rebol.net/cgi-bin/blog.r?view=0080
description "synopsis of the blog goes here"
author "Carl Sassenrath"
pubdate 30-Dec-2004/23:24:32-7:00
]
[
title "Blog item title 2...."
link http://www.rebol.net/cgi-bin/blog.r?view=0081
description "synopsis of the blog goes here"
author "Carl Sassenrath"
pubdate 31-Dec-2004/23:24:32-7:00
]
]
print make-rss channel items
halt
;write %carl-rss2.xml output
}
[6/16] from: premshree:pillai:gm:ail at: 31-Dec-2004 14:24
Hi Gregg,
This looks nice. Thanks! :)
Some rework may be required for the pubDate, though. The to-idate
returns something like "Thu, 30 Dec 2004 0:00 +0000". For it to be
validated, it must be "Thu, 30 Dec 2004 00:00:00 +0000". The change is
required in the time, that is. It's a minor problem, though.
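To illustrate the padding that is needed, here is a minimal sketch, assuming REBOL 2. The helper name `pad2` is hypothetical; it is not a built-in and not part of to-idate:

```rebol
REBOL []

; pad2 is a hypothetical helper: it forms an integer and
; prefixes a "0" when the result is a single digit
pad2: func [n [integer!] /local s] [
    s: form n
    if 1 = length? s [insert s #"0"]
    s
]

t: 0:05:09
; TO INTEGER! guards against a decimal seconds value
print rejoin [pad2 t/hour ":" pad2 t/minute ":" pad2 to integer! t/second]
```

This should print the time with two digits per segment, which is the shape the validator expects in the time part of pubDate.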
If anybody's interested in further hacking, these URLs could be useful:
* RSS 2.0 specification: http://blogs.law.harvard.edu/tech/rss
* Valid date types for RSS 2.0:
http://feedvalidator.org/docs/error/InvalidRFC2822Date.html
On Thu, 30 Dec 2004 17:12:12 -0700, Gregg Irwin
<[greggirwin--mindspring--com]> wrote:
> Hi Premshree,
> PP> Anyway, I wrote a small script that generates valid RSS 2.0 feeds from
<<quoted lines omitted: 95>>
> To unsubscribe from the list, just send an email to rebol-request
> at rebol.com with unsubscribe as the subject.
--
Premshree Pillai
http://www.livejournal.com/~premshree
[7/16] from: premshree:pillai:gm:ail at: 31-Dec-2004 14:27
On Fri, 31 Dec 2004 14:24:46 +0530, Premshree Pillai
<[premshree--pillai--gmail--com]> wrote:
> Hi Gregg,
>
> This looks nice. Thanks! :)
>
> Some rework may be required for the pubDate, though. The to-idate
> returns something like "Thu, 30 Dec 2004 0:00 +0000". For it to be
> validated, it must be "Thu, 30 Dec 2004 00:00:00 +0000". The change is
> required in the time, that is. It's a minor problem, though.
Umm, looks like to-idate generates dates of the type "Tue, 9 Mar 2004
1:00:25 -0800" too, which would validate against the feed validator.
However, this seems inconsistent(?). Maybe somebody who has a better
idea can hack on this.
> If anybody's interested in further hacking, these URLs could be useful:
> * RSS 2.0 specification: http://blogs.law.harvard.edu/tech/rss
<<quoted lines omitted: 132>>
> Premshree Pillai
> http://www.livejournal.com/~premshree
--
Premshree Pillai
http://www.livejournal.com/~premshree
[8/16] from: premshree::pillai::gmail::com at: 31-Dec-2004 14:56
On Fri, 31 Dec 2004 00:01:54 -0700, Gregg Irwin
<[greggirwin--mindspring--com]> wrote:
> Thanks Carl,
> CR> There's an RSS validator here...
<<quoted lines omitted: 9>>
> Not sure on the Apache question. Just an AddType for rss+xml or
> something? Sunanda? Premshree?
Yes, just an addtype application/rss+xml .rss
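As a sketch, the directive would look like this in httpd.conf or a .htaccess file. The .rss extension is an assumption here; the script in this thread writes carl-rss2.xml, so the extension would have to match whatever the feed file is actually called:

```apache
AddType application/rss+xml .rss
```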
> I'll leave my contributions here and let someone take the next step(s)
> for Carl if they would.
<<quoted lines omitted: 78>>
--
Premshree Pillai
http://www.livejournal.com/~premshree
[9/16] from: greggirwin::mindspring::com at: 31-Dec-2004 9:51
Hi Premshree,
PP> Some rework may be required for the pubDate, though. The to-idate
PP> returns something like "Thu, 30 Dec 2004 0:00 +0000". For it to be
PP> validated, it must be "Thu, 30 Dec 2004 00:00:00 +0000". The change is
PP> required in the time, that is. It's a minor problem, though.
TO-IDATE should probably be fixed then, if RFC822 is the spec it's
targeting. How about this for a quick patch?
to-itime: func [
"Returns a standard internet time string (two digits for each segment)"
time [time!]
/local form-num
][
form-num: func [num] [either 1 = length? num: form num [join #"0" num] [num]]
rejoin [form-num time/hour ":" form-num time/minute ":" form-num round time/second]
]
to-idate: func [
"Returns a standard Internet date string."
date [date!]
/local str
][
str: form date/zone
remove find str ":"
if (first str) <> #"-" [insert str #"+"]
if (length? str) <= 4 [insert next str #"0"]
head insert str reform [
pick ["Mon," "Tue," "Wed," "Thu," "Fri," "Sat," "Sun,"] date/weekday
date/day
pick ["Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"] date/month
date/year to-itime date/time ""
]
]
-- Gregg
[10/16] from: SunandaDH:aol at: 31-Dec-2004 4:18
Premshree:
> Umm, looks like to-idate generates dates of the type "Tue, 9 Mar 2004
> 1:00:25 -0800" too, which would validate against the feed validator.
> However, this seems inconsistent(?). Maybe somebody who has a better
> idea can hack on this.
You could fix or rewrite to-idate yourself, if you wanted -- it's a mezzanine:
source to-idate
It would be good to fix it.
I know the dates and times it produces are acceptable to many Internet
applications (I've used the output of to-idate many times)
But it clearly doesn't meet the precise spec in RFC 822, which asks for
2DIGIT for time and date elements.
Sunanda.
[11/16] from: premshree::pillai::gmail::com at: 1-Jan-2005 0:43
On Fri, 31 Dec 2004 09:51:16 -0700, Gregg Irwin
<greggirwin-mindspring.com> wrote:
> Hi Premshree,
> PP> Some rework may be required for the pubDate, though. The to-idate
<<quoted lines omitted: 3>>
> TO-IDATE should probably be fixed then, if RFC822 is the spec it's
> targeting. How about this for a quick patch?
Looks good! I haven't checked the RFC 822 spec completely, but yes,
if any of the numbers has a single digit, it should be prefixed
with a "0" (zero).
One minor correction needed, though. The date/day should also go
through the form-num function. So maybe the form-num function could be
made global.
I have reproduced Gregg's script along with the minor changes:
form-num: func [num] [either 1 = length? num: form num [join #"0" num] [num]]
to-itime: func [
"Returns a standard internet time string (two digits for each segment)"
time [time!]
][
rejoin [form-num time/hour ":" form-num time/minute ":" form-num
time/second]
]
to-idate: func [
"Returns a standard Internet date string."
date [date!]
/local str
][
str: form date/zone
remove find str ":"
if (first str) <> #"-" [insert str #"+"]
if (length? str) <= 4 [insert next str #"0"]
head insert str reform [
pick ["Mon," "Tue," "Wed," "Thu," "Fri," "Sat," "Sun,"] date/weekday
form-num date/day
pick ["Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
"Oct" "Nov" "Dec"] date/month
date/year to-itime date/time ""
]
]
> to-itime: func [
> "Returns a standard internet time string (two digits for each segment)"
<<quoted lines omitted: 24>>
--
Premshree Pillai
http://www.livejournal.com/~premshree
[12/16] from: hallvard:ystad:oops-as:no at: 10-Jan-2005 14:11
Hi
I only received the message underneath today (!), and
admit that I haven't followed the discussion very closely,
so this might already have been answered/solved to
satisfaction, but here it is anyway, a patch that I use
for idates:
unprotect 'to-date
to-date: func [value /idate] [
either idate [
replace/all value: find/tail value ", " " " "/"
load value
] [
to date! :value ; original to-date
]
]
HY
Dixit [SunandaDH--aol--com] (Fri, 31 Dec 2004 04:18:08 EST):
>Premshree:
>> Umm, looks like to-idate generates dates of the type
<<quoted lines omitted: 20>>
Prætera censeo Carthaginem esse delendam
Write here:
[13/16] from: premshree:pillai:gma:il at: 10-Jan-2005 19:20
On Fri, 31 Dec 2004 04:18:08 EST, [SunandaDH--aol--com] <[SunandaDH--aol--com]> wrote:
> Premshree:
>
> > Umm, looks like to-idate generates dates of the type "Tue, 9 Mar 2004
> > 1:00:25 -0800" too, which would validate against the feed validator.
> > However, this seems inconsistent(?). Maybe somebody who has a better
> > idea can hack on this.
>
> You could fix or rewrite to-idate yourself, if you wanted -- it's a mezzanine:
Yep, I realised that. IAC, Gregg had submitted a solution. See
http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-message.r?m=rmlYYDC
Whole thread: http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlPFDC
> source to-idate
>
> It would be good to fix it.
>
> I know the dates and times it produces are acceptable to many Internet
> applications (I've used the output of to-idate many times)
The values returned didn't validate for RSS feeds.
> But it clearly doesn't meet the precise spec in RFC 822, which asks for
> 2DIGIT for time and date elements.
>
> Sunanda.
--
Premshree Pillai
http://www.livejournal.com/~premshree
[14/16] from: SunandaDH:aol at: 10-Jan-2005 9:15
Premshree:
> Yep, I realised that. IAC, Gregg had submitted a solution.
This is a case of the ML gremlins... My post predated Gregg's solution by
many hours.
But it didn't get delivered until the whole subject was ancient history.
Makes me look a little slow on the uptake.
Glad we have a fully compliant idate format now.
Sunanda.
[15/16] from: hallvard:ystad:oops-as:no at: 10-Jan-2005 15:20
Oops (my email address indeed!), it seems I supplied a
script that goes the other way: from idates to date!.
Sorry.
HY
Dixit "Hallvard Ystad" <[hallvard--ystad--oops-as--no]> (Mon,
10 Jan 2005 14:11:41 +0100):
>Hi
>I only received the message underneath today (!), and
<<quoted lines omitted: 53>>
Prætera censeo Carthaginem esse delendam
Write here:
[16/16] from: premshree:pillai:gma:il at: 31-Dec-2004 2:46
Hello,
So I wrote a small script that generates valid RSS 2.0 feeds from the
REBOL data for Carl's blog. I don't know where else to post the code
(because it won't be of use generally), so I'm posting it here:
=== BEGIN REBOL CODE ===
REBOL [
Title: "RSS Generator for Carl's Blog"
Date: 31-Dec-2004
Version: 0.0.1
File: %carl-rss.r
Home: http://www.livejournal.com/~premshree
Author: "Premshree Pillai"
Version: 0.0.1
Purpose: {
Generates valid RSS 2.0 feeds for Carl's blogs
}
]
;; channel data
channel: [
title "Carl's REBOL Blog - Vive la REBOLution"
link http://www.rebol.net/
description "describes this blog channel"
language "English"
copyright "2005 Carl Sassenrath"
generator "REBOL Messaging Language"
]
;; blog items go here
items: [
[
title "Blog item title...."
link http://www.rebol.net/cgi-bin/blog.r?view=0080
author "Carl Sassenrath"
pubdate 30-Dec-2004
content {the blog goes here}
]
[
title "Blog item title 2...."
link http://www.rebol.net/cgi-bin/blog.r?view=0081
author "Carl Sassenrath"
pubdate 31-Dec-2004
content {the blog 2 goes here}
]
]
;; no edits required below this point
channel-title: select channel 'title
channel-link: select channel 'link
channel-description: select channel 'description
channel-language: "en"
channel-copyright: select channel 'copyright
channel-generator: select channel 'generator
output: rejoin ["<?xml version='1.0' encoding='utf-8' ?><rss version='2.0'><channel><title>" channel-title "</title>"]
output: rejoin [output "<link>" channel-link "</link>" "<description>"
channel-description "</description>"]
output: rejoin [output "<language>" channel-language "</language>"
"<copyright>" channel-copyright "</copyright>"]
output: rejoin [output "<generator>" channel-generator "</generator>"]
for count 1 length? items 1 [
title: select items/:count 'title
link: select items/:count 'link
author: select items/:count 'author
pubdate: parse to-string select items/:count 'pubdate "-"
pubdate: rejoin ["Mon, " pubdate/1 " " pubdate/2 " " pubdate/3 " 00:00:00 GMT"]
content: select items/:count 'content
output: rejoin [output "<item><guid isPermaLink='true'>" link
"</guid><pubDate>" pubdate "</pubDate>"]
output: rejoin [output "<title>" title "</title><link>" link "</link>"]
output: rejoin [output "<description>" content "</description></item>"]
]
output: rejoin[output "</channel></rss>"]
write %carl-rss2.xml output
=== END REBOL CODE ===
The REBOL data in Carl's blog (see
http://www.rebol.net/cgi-bin/blog.r?view=0080) doesn't provide the
time of posting, so it just takes it as 00:00:00 GMT, but that's okay,
I guess.
If there are any possible improvements, please point them out or make them. :)
Thanks.
--
Premshree Pillai
http://www.livejournal.com/~premshree
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted