Mailing List Archive: 49091 messages

REBOL Script to convert REBOL format to RSS XML

 [1/16] from: premshree:pillai:gmai:l at: 31-Dec-2004 4:38


Umm, looks like my last post didn't make it to the list.

Anyway, I wrote a small script that generates valid RSS 2.0 feeds from the REBOL data for Carl's blog. The code is available here:
http://premshree.seacrow.com/code/rebol/carl-rss.r/download

The REBOL data in Carl's blog (see http://www.rebol.net/cgi-bin/blog.r?view=0080) doesn't provide the time of posting, so it just takes it as 00:00:00 GMT, but that's okay, I guess.

If there are improvements possible, please point so/do so. :)

Thanks.

-- Premshree Pillai http://www.livejournal.com/~premshree

 [2/16] from: greggirwin:mindspring at: 30-Dec-2004 17:12


Hi Premshree,

PP> Anyway, I wrote a small script that generates valid RSS 2.0 feeds from
PP> the REBOL data for Carl's blog. The code is available here:
PP> http://premshree.seacrow.com/code/rebol/carl-rss.r/download

Very cool!

PP> If there are improvements possible, please point so/do so. :)

I did a quick re-hack of things, just to show you some different approaches. Others may chime in with more thoughts as well.

Some of the things I thought about when evaluating your code:

* Use path notation instead of SELECT
* Don't use temporary vars if you don't need to
* Tags are a valid datatype, you don't need to make them strings
* JOIN (and REJOIN) build new strings, meaning lots of allocation and work for the GC. INSERT, APPEND, and their ilk just add on to an existing series; also saves reassigning the value.
* See what refinements date! values offer to see if you need to convert them to strings to do what you want.
* REPEAT is faster than FOR if you're going from 1 and incrementing by 1. FOREACH makes things even easier most times.

I did a couple more advanced things in there (note the tick on the MAKE-ENTRY 'key parameter) and did some others to give you some things to take apart and think about. Holler away if you have questions.

BTW, this was a very quick hack, so I don't know if, for example, TO-IDATE gives a valid format as far as RSS is concerned.

-- Gregg

REBOL [
    Title: "RSS Generator for Carl's Blog"
    Date: 31-Dec-2004
    File: %carl-rss.r
    Home: http://www.livejournal.com/~premshree
    Author: ["Premshree Pillai" "Gregg Irwin"]
    Version: 0.0.2
    Purpose: {
        Generates valid RSS 2.0 feeds for Carl's blogs
    }
    Comment: {
        Massive code changes for instructional purposes.
        --Gregg
    }
]

;; channel data
channel: [
    title "Carl's REBOL Blog - Vive la REBOLution"
    link http://www.rebol.net/
    description "describes this blog channel"
    language "en" ;"English"
    copyright "2005 Carl Sassenrath"
    generator "REBOL Messaging Language"
]

;; blog items go here
items: [
    [
        title "Blog item title...."
        link http://www.rebol.net/cgi-bin/blog.r?view=0080
        author "Carl Sassenrath"
        pubdate 30-Dec-2004/0:0:0
        content {the blog goes here}
    ]
    [
        title "Blog item title 2...."
        link http://www.rebol.net/cgi-bin/blog.r?view=0081
        author "Carl Sassenrath"
        pubdate 31-Dec-2004/0:0:0
        content {the blog 2 goes here}
    ]
]

;-- No Changes needed below this point -------------------------

make-entry: func [series 'key] [
    rejoin [tab to tag! :key series/:key to tag! join #"/" :key]
]

channel-entry: func ['key] [make-entry channel :key]

channel-entries: func [keys [block!] /local result] [
    result: copy ""
    foreach key keys [append result join channel-entry :key newline]
    result
]

output: copy ""
repend output [
    <?xml version='1.0' encoding='utf-8' ?>
    <rss version='2.0'> newline
    <channel> newline
    channel-entries [title link description language copyright generator]
]

foreach item items [
    repend output [
        tab <item> newline
        tab tab <guid isPermaLink='true'> item/link </guid> newline
        tab tab <pubDate> to-idate item/pubdate </pubDate> newline
        tab tab <title> item/title </title> newline
        tab tab <link> item/link </link> newline
        tab tab <description> item/content </description> newline
        tab </item> newline
    ]
]

repend output [</channel> newline </rss>]

;print output
;halt

write %carl-rss2.xml output

 [3/16] from: carl:cybercraft at: 31-Dec-2004 18:19


On Thursday, 30-December-2004 at 17:12:12 Gregg wrote,
>BTW, this was a very quick hack, so I don't know if, for example, >TO-IDATE gives a valid format as far as RSS is concerned.
There's an RSS validator here... http://feedvalidator.org/ which should tell you when you give it the feed. I think I found TO-IDATE wanting with some Net stuff in the past - may have been RSS. -- Carl Read.

 [4/16] from: volker::nitsch::gmail::com at: 31-Dec-2004 6:54


On Thu, 30 Dec 2004 17:12:12 -0700, Gregg Irwin <[greggirwin--mindspring--com]> wrote:
> Hi Premshree,
>
> PP> Anyway, I wrote a small script that generates valid RSS 2.0 feeds from
> PP> the REBOL data for Carl's blog. The code is available here:
> PP> http://premshree.seacrow.com/code/rebol/carl-rss.r/download
>
> Very cool!
>
> PP> If there are improvements possible, please point so/do so. :)
>

Good scripts :)

I add another version which is not that advanced but keeps the original structure, as an intermediate step, together with some comments (single ";").

REBOL [
    Title: "RSS Generator for Carl's Blog"
    Date: 31-Dec-2004
    Version: 0.0.1
    File: %carl-rss.r
    Home: http://www.livejournal.com/~premshree
    Author: "Premshree Pillai"
    Version: 0.0.1
    Purpose: {
        Generates valid RSS 2.0 feeds for Carl's blogs
    }
]

;; channel data
channel: [
    title "Carl's REBOL Blog - Vive la REBOLution"
    link http://www.rebol.net/
    description "describes this blog channel"
    language "English"
    copyright "2005 Carl Sassenrath"
    generator "REBOL Messaging Language"
]

;; blog items go here
items: [
    [
        title "Blog item title...."
        link http://www.rebol.net/cgi-bin/blog.r?view=0080
        author "Carl Sassenrath"
        pubdate 30-Dec-2004
        content {the blog goes here}
    ]
    [
        title "Blog item title 2...."
        link http://www.rebol.net/cgi-bin/blog.r?view=0081
        author "Carl Sassenrath"
        pubdate 31-Dec-2004
        content {the blog 2 goes here}
    ]
]

;; no edits required below this point

; instead of "channel-title"
; we can use "channel/title"
; directly. so we drop this:
;   channel-link: select channel 'link
;   ...

; instead of
;   output: rejoin[ output new-stuff]
; we can use for performance
;   append output rejoin[ new-stuff ]
; which is the same as
;   repend output [new-stuff]
; (repend is a shortcut because such things occur so often)

; and then we shrink some more by a function "emit".
; It is usually copy-pasted and patched everywhere.
; It's not built in because each script needs a slightly
; different version.
output: copy ""
emit: func[block][
    repend output block
    append output newline
]

; tags are built in and {"} around them are not needed.
; looks sometimes cleaner.
; and i did some linebreaks for email
emit [
    <?xml version='1.0' encoding='utf-8' ?>
    <rss version='2.0'><channel>
    <title> channel/title </title>
]
emit [
    <link> channel/link </link>
    <description> channel/description </description>
]
emit [
    <language> channel/language </language>
    <copyright> channel/copyright </copyright>
]
emit [<generator> channel/generator </generator>]

; instead of "for" and indexing:
foreach item items[
    title: item/title
    link: item/link
    author: item/author
    ; dropped the date-making, to-idate works similar
    pubdate: to-idate item/pubdate
    content: item/content
    emit [
        <item>
        <guid isPermaLink='true'> link </guid>
        <pubDate> pubdate </pubDate>
    ]
    emit [<title> title </title><link> link </link>]
    emit [<description> content </description></item>]
]
emit[</channel></rss>]

write %carl-rss2.xml output

 [5/16] from: greggirwin:mindspring at: 31-Dec-2004 0:01


Thanks Carl,

CR> There's an RSS validator here...
CR> http://feedvalidator.org/
CR> which should tell you when you give it the feed. I think I found
CR> TO-IDATE wanting with some Net stuff in the past - may have been
CR> RSS.

Looks like it wants time values in there, then it's OK. So, here's another real quick rewrite.

Carl wants a link to a reader too. Not my area, but he should know about http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=rss.r in addition to others.

Not sure on the Apache question. Just an AddType for rss+xml or something? Sunanda? Premshree?

I'll leave my contributions here and let someone take the next step(s) for Carl if they would.

-- Gregg

REBOL [
    Title: "RSS Generator for Carl's Blog"
    Date: 31-Dec-2004
    File: %carl-rss.r
    Home: http://www.livejournal.com/~premshree
    Author: ["Premshree Pillai" "Gregg Irwin"]
    Version: 0.0.3
    Purpose: {Generate valid RSS 2.0 feeds for Carl's blogs}
    Comment: {
        0.0.2 Massive code changes for instructional purposes. --Gregg
        0.0.3 More changes, knowing Carl actually wants to use it. :) --Gregg
    }
]

make-rss-ctx: context [
    make-entry: func [series key] [
        rejoin [tab to tag! :key series/:key to tag! join #"/" :key newline]
    ]

    channel-entries: func [keys [block!] /local result] [
        result: copy ""
        foreach key keys [append result make-entry channel :key]
        result
    ]

    set 'make-rss func [channel items /local output] [
        output: copy ""
        repend output [
            <?xml version='1.0' encoding='utf-8' ?>
            <rss version='2.0'> newline
            <channel> newline
            channel-entries [title link description language copyright generator]
            newline
        ]
        foreach item items [
            repend output [
                tab <item> newline
                tab make-entry item 'title
                tab make-entry item 'link
                tab make-entry item 'description
                tab tab <guid isPermaLink='true'> item/link </guid> newline
                tab tab <pubDate> to-idate item/pubdate </pubDate> newline
                tab </item> newline newline
            ]
        ]
        repend output [</channel> newline </rss>]
    ]
]

;; Test Code below ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

comment {
    ;; channel data
    channel: [
        title "Carl's REBOL Blog - Vive la REBOLution"
        link http://www.rebol.net/
        description "describes this blog channel"
        language "en" ;"English"
        copyright "2005 Carl Sassenrath"
        generator "REBOL"
    ]

    ;; blog items go here
    items: [
        [
            title "Blog item title...."
            link http://www.rebol.net/cgi-bin/blog.r?view=0080
            description "synopsis of the blog goes here"
            author "Carl Sassenrath"
            pubdate 30-Dec-2004/23:24:32-7:00
        ]
        [
            title "Blog item title 2...."
            link http://www.rebol.net/cgi-bin/blog.r?view=0081
            description "synopsis of the blog goes here"
            author "Carl Sassenrath"
            pubdate 31-Dec-2004/23:24:32-7:00
        ]
    ]

    print make-rss channel items
    halt
    ;write %carl-rss2.xml output
}

 [6/16] from: premshree:pillai::gmail at: 31-Dec-2004 14:24


Hi Gregg,

This looks nice. Thanks! :)

Some rework may be required for the pubDate, though. The to-idate returns something like "Thu, 30 Dec 2004 0:00 +0000". For it to be validated, it must be "Thu, 30 Dec 2004 00:00:00 +0000". The change is required in the time, that is. It's a minor problem, though.

If anybody's interested in further hacking, these URLs could be useful:

* RSS 2.0 specification: http://blogs.law.harvard.edu/tech/rss
* Valid date types for RSS 2.0: http://feedvalidator.org/docs/error/InvalidRFC2822Date.html

On Thu, 30 Dec 2004 17:12:12 -0700, Gregg Irwin <[greggirwin--mindspring--com]> wrote:
> Hi Premshree, > PP> Anyway, I wrote a small script that generates valid RSS 2.0 feeds from
<<quoted lines omitted: 95>>
> To unsubscribe from the list, just send an email to rebol-request > at rebol.com with unsubscribe as the subject.
-- Premshree Pillai http://www.livejournal.com/~premshree

 [7/16] from: premshree:pillai:g:mail at: 31-Dec-2004 14:27


On Fri, 31 Dec 2004 14:24:46 +0530, Premshree Pillai <[premshree--pillai--gmail--com]> wrote:
> Hi Gregg, > > This looks nice. Thanks! :) > > Some rework may be required for the pubDate, though. The to-idate > returns something like "Thu, 30 Dec 2004 0:00 +0000". For it to be > validated, it must be "Thu, 30 Dec 2004 00:00:00 +0000". The change is > required in the time, that is. It's a minor problem, though.
Umm, looks like to-idate generates dates of the type "Tue, 9 Mar 2004 1:00:25 -0800" too, which would validate against the feed validator. However, this seems inconsistent(?). Maybe somebody who has a better idea can hack on this.
> If anybody's interested in further hacking, these URLs could be useful: > * RSS 2.0 specification: http://blogs.law.harvard.edu/tech/rss
<<quoted lines omitted: 132>>
> Premshree Pillai > http://www.livejournal.com/~premshree
-- Premshree Pillai http://www.livejournal.com/~premshree
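
The inconsistency noted in this message most likely comes from how REBOL forms time! values rather than from to-idate itself: the seconds field is dropped when it is zero, and the hour is never zero-padded. A minimal sketch, assuming REBOL 2 console behaviour (the two sample times are taken from the strings quoted above):

```rebol
REBOL [Title: "Why to-idate output varies (sketch)"]

; Zero seconds are omitted when a time! value is formed,
; so midnight loses its ":00" seconds field.
print form 0:00:00    ; 0:00

; Non-zero seconds are kept, but the hour is still a single digit.
print form 1:00:25    ; 1:00:25
```

This is why a date with a midnight time fails validation while one with non-zero seconds can pass.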

 [8/16] from: premshree::pillai::gmail::com at: 31-Dec-2004 14:56


On Fri, 31 Dec 2004 00:01:54 -0700, Gregg Irwin <[greggirwin--mindspring--com]> wrote:
> Thanks Carl, > CR> There's an RSS validator here...
<<quoted lines omitted: 9>>
> Not sure on the Apache question. Just an AddType for rss+xml or > something? Sunanda? Premshree?
Yes, just an AddType:

AddType application/rss+xml .rss
> I'll leave my contributions here and let someone take the next step(s) > for Carl if they would.
<<quoted lines omitted: 78>>
> To unsubscribe from the list, just send an email to rebol-request > at rebol.com with unsubscribe as the subject.
-- Premshree Pillai http://www.livejournal.com/~premshree

 [9/16] from: greggirwin::mindspring::com at: 31-Dec-2004 9:51


Hi Premshree,

PP> Some rework may be required for the pubDate, though. The to-idate
PP> returns something like "Thu, 30 Dec 2004 0:00 +0000". For it to be
PP> validated, it must be "Thu, 30 Dec 2004 00:00:00 +0000". The change is
PP> required in the time, that is. It's a minor problem, though.

TO-IDATE should probably be fixed then, if RFC822 is the spec it's targeting. How about this for a quick patch?

to-itime: func [
    "Returns a standard internet time string (two digits for each segment)"
    time [time!]
    /local form-num
][
    form-num: func [num] [either 1 = length? num: form num [join #"0" num] [num]]
    rejoin [form-num time/hour ":" form-num time/minute ":" form-num round time/second]
]

to-idate: func [
    "Returns a standard Internet date string."
    date [date!]
    /local str
][
    str: form date/zone
    remove find str ":"
    if (first str) <> #"-" [insert str #"+"]
    if (length? str) <= 4 [insert next str #"0"]
    head insert str reform [
        pick ["Mon," "Tue," "Wed," "Thu," "Fri," "Sat," "Sun,"] date/weekday
        date/day
        pick ["Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"] date/month
        date/year
        to-itime date/time
        ""
    ]
]

-- Gregg
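
The time-zone handling at the top of to-idate can be tried on its own. The sketch below lifts those four lines into a helper; the name zone-str is hypothetical and not part of REBOL or of the patch, and the sample zones are chosen only for illustration. It assumes REBOL 2.

```rebol
REBOL [Title: "Zone string sketch"]

; zone-str is a hypothetical helper that repeats the zone steps of to-idate.
zone-str: func [zone [time!] /local str] [
    str: form zone                                ; e.g. "-7:00"
    remove find str ":"                           ; "-700"
    if (first str) <> #"-" [insert str #"+"]      ; add a sign to positive zones
    if (length? str) <= 4 [insert next str #"0"]  ; pad a single-digit hour
    str
]

foreach z [0:00 -7:00 5:30 10:00] [print zone-str z]
; +0000
; -0700
; +0530
; +1000
```

The length test works because a signed zone with a one-digit hour is four characters long, while one with a two-digit hour is five.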

 [10/16] from: SunandaDH:aol at: 31-Dec-2004 4:18


Premshree:
> Umm, looks like to-idate generates dates of the type "Tue, 9 Mar 2004
> 1:00:25 -0800" too, which would validate against the feed validator.
> However, this seems inconsistent(?). Maybe somebody who has a better
> idea can hack on this.

You could fix or rewrite to-idate yourself, if you wanted -- it's a mezzanine:

    source to-idate

It would be good to fix it.

I know the dates and times it produces are acceptable to many Internet applications (I've used the output of to-idate many times), but it clearly doesn't meet the precise spec in RFC 822, which asks for 2DIGIT for time and date elements.

Sunanda.

 [11/16] from: premshree::pillai::gmail::com at: 1-Jan-2005 0:43


On Fri, 31 Dec 2004 09:51:16 -0700, Gregg Irwin <greggirwin-mindspring.com> wrote:
> Hi Premshree, > PP> Some rework may be required for the pubDate, though. The to-idate
<<quoted lines omitted: 3>>
> TO-IDATE should probably be fixed then, if RFC822 is the spec it's > targeting. How about this for a quick patch?
Looks good! I haven't checked the RFC 822 spec completely, but yes, if any of the numbers has a single digit, it should be prefixed with a "0" (zero).

One minor correction is needed, though: date/day should also go through the form-num function. So maybe the form-num function could be made global. I have reproduced Gregg's script along with the minor changes:

form-num: func [num] [either 1 = length? num: form num [join #"0" num] [num]]

to-itime: func [
    "Returns a standard internet time string (two digits for each segment)"
    time [time!]
][
    rejoin [form-num time/hour ":" form-num time/minute ":" form-num time/second]
]

to-idate: func [
    "Returns a standard Internet date string."
    date [date!]
    /local str
][
    str: form date/zone
    remove find str ":"
    if (first str) <> #"-" [insert str #"+"]
    if (length? str) <= 4 [insert next str #"0"]
    head insert str reform [
        pick ["Mon," "Tue," "Wed," "Thu," "Fri," "Sat," "Sun,"] date/weekday
        form-num date/day
        pick ["Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"] date/month
        date/year
        to-itime date/time
        ""
    ]
]
> to-itime: func [ > "Returns a standard internet time string (two digits for each segment)"
<<quoted lines omitted: 24>>
> To unsubscribe from the list, just send an email to rebol-request > at rebol.com with unsubscribe as the subject.
-- Premshree Pillai http://www.livejournal.com/~premshree
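
For anyone who wants to check the padding on its own, here is a small sketch that reuses form-num exactly as posted in this message; the sample numbers are arbitrary and it assumes REBOL 2.

```rebol
REBOL [Title: "Two-digit padding check (sketch)"]

; form-num as posted above: pad single-digit values with a leading zero.
form-num: func [num] [either 1 = length? num: form num [join #"0" num] [num]]

foreach n [0 9 10 31] [print form-num n]
; 00
; 09
; 10
; 31
```

Because the helper forms its argument first, the same function serves hours, minutes, seconds and the day of the month.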

 [12/16] from: hallvard:ystad:oops-as:no at: 10-Jan-2005 14:11


Hi

I only received the message underneath today (!), and admit that I haven't followed the discussion very closely, so this might already have been answered/solved to satisfaction, but here it is anyway, a patch that I use for idates:

unprotect 'to-date
to-date: func [value /idate] [
    either idate [
        replace/all value: find/tail value ", " " " "/"
        load value
    ] [
        to date! :value ; original to-date
    ]
]

HY

Dixit [SunandaDH--aol--com] (Fri, 31 Dec 2004 04:18:08 EST):
>Premshree: >> Umm, looks like to-idate generates dates of the type
<<quoted lines omitted: 20>>
>rebol-request >at rebol.com with unsubscribe as the subject.
Prætera censeo Carthaginem esse delendam Write here:

 [13/16] from: premshree:pillai:gma:il at: 10-Jan-2005 19:20


On Fri, 31 Dec 2004 04:18:08 EST, [SunandaDH--aol--com] <[SunandaDH--aol--com]> wrote:
> Premshree:
>
> > Umm, looks like to-idate generates dates of the type "Tue, 9 Mar 2004
> > 1:00:25 -0800" too, which would validate against the feed validator.
> > However, this seems inconsistent(?). Maybe somebody who has a better
> > idea can hack on this.
>
> You could fix or rewrite to-idate yourself, if you wanted -- it's a mezzanine:
Yep, I realised that. IAC, Gregg had submitted a solution. See http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-message.r?m=rmlYYDC Whole thread: http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlPFDC
> source to-idate > > It would be good to fix it. > > I know the dates and times it produces are acceptable to many Internet > applications (I've used the output of to-idate many times)
The values returned didn't validate for RSS feeds.
> But if it clearly doesn't meet the precise spec in RFC 822 -- which asks for > 2DIGIT for time and date elements. > > Sunanda. > -- > To unsubscribe from the list, just send an email to rebol-request > at rebol.com with unsubscribe as the subject. >
-- Premshree Pillai http://www.livejournal.com/~premshree

 [14/16] from: SunandaDH:aol at: 10-Jan-2005 9:15


Premshree:
> Yep, I realised that. IAC, Gregg had submitted a solution.
This is a case of the ML gremlins... My post predated Gregg's solution by many hours, but it didn't get delivered until the whole subject was ancient history. Makes me look a little slow on the uptake.

Glad we have a fully compliant idate format now.

Sunanda.

 [15/16] from: hallvard:ystad:oops-as:no at: 10-Jan-2005 15:20


Oops (my email address indeed!), it seems I supplied a script that goes the other way: from idates to date!. Sorry. HY Dixit "Hallvard Ystad" <[hallvard--ystad--oops-as--no]> (Mon, 10 Jan 2005 14:11:41 +0100):
>Hi >I only received the message underneath today (!), and
<<quoted lines omitted: 53>>
>rebol-request >at rebol.com with unsubscribe as the subject.
Prætera censeo Carthaginem esse delendam Write here:

 [16/16] from: premshree:pillai::gmail at: 31-Dec-2004 2:46


Hello,

So I wrote a small script that generates valid RSS 2.0 feeds from the REBOL data for Carl's blog. I don't know where else to post the code (because it won't be of use generally), so I'm posting it here:

=== BEGIN REBOL CODE ===

REBOL [
    Title: "RSS Generator for Carl's Blog"
    Date: 31-Dec-2004
    Version: 0.0.1
    File: %carl-rss.r
    Home: http://www.livejournal.com/~premshree
    Author: "Premshree Pillai"
    Version: 0.0.1
    Purpose: {
        Generates valid RSS 2.0 feeds for Carl's blogs
    }
]

;; channel data
channel: [
    title "Carl's REBOL Blog - Vive la REBOLution"
    link http://www.rebol.net/
    description "describes this blog channel"
    language "English"
    copyright "2005 Carl Sassenrath"
    generator "REBOL Messaging Language"
]

;; blog items go here
items: [
    [
        title "Blog item title...."
        link http://www.rebol.net/cgi-bin/blog.r?view=0080
        author "Carl Sassenrath"
        pubdate 30-Dec-2004
        content {the blog goes here}
    ]
    [
        title "Blog item title 2...."
        link http://www.rebol.net/cgi-bin/blog.r?view=0081
        author "Carl Sassenrath"
        pubdate 31-Dec-2004
        content {the blog 2 goes here}
    ]
]

;; no edits required below this point

channel-title: select channel 'title
channel-link: select channel 'link
channel-description: select channel 'description
channel-language: "en"
channel-copyright: select channel 'copyright
channel-generator: select channel 'generator

output: rejoin ["<?xml version='1.0' encoding='utf-8' ?><rss version='2.0'><channel><title>" channel-title "</title>"]
output: rejoin [output "<link>" channel-link "</link>" "<description>" channel-description "</description>"]
output: rejoin [output "<language>" channel-language "</language>" <copyright> channel-copyright "</copyright>"]
output: rejoin [output "<generator>" channel-generator "</generator>"]

for count 1 length? items 1 [
    title: select items/:count 'title
    link: select items/:count 'link
    author: select items/:count 'author
    pubdate: parse to-string select items/:count 'pubdate "-"
    pubdate: rejoin ["Mon, " pubdate/1 " " pubdate/2 " " pubdate/3 " 00:00:00 GMT"]
    content: select items/:count 'content
    output: rejoin [output "<item><guid isPermaLink='true'>" link </guid><pubDate> pubdate "</pubDate>"]
    output: rejoin [output "<title>" title "</title><link>" link "</link>"]
    output: rejoin [output "<description>" content "</description></item>"]
]

output: rejoin[output "</channel></rss>"]

write %carl-rss2.xml output

=== END REBOL CODE ===

The REBOL data in Carl's blog (see http://www.rebol.net/cgi-bin/blog.r?view=0080) doesn't provide the time of posting, so it just takes it as 00:00:00 GMT, but that's okay, I guess.

If there are improvements possible, please point so/do so. :)

Thanks.

-- Premshree Pillai http://www.livejournal.com/~premshree

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted