Mailing List Archive: 49091 messages

[REBOL] Re: building a dynamic path to elements in block

From: null_dev:yah:oo at: 4-Nov-2000 14:21

Joel,

Here's the URL for the ancient Greek texts:

http://www.perseus.tufts.edu/cgi-bin/perscoll?collection=Greco-Roman&type=text&lang=greek

Take your pick. If you're fond of Rome, go to the texts and translations page and you'll find a lot of Latin texts as well. The texts will initially display in transliterated Greek, with a hypertext link from every word to the Liddell-Scott lexicon. If you go to the Display Configuration Menu you can switch the text to UTF-8 and drop the morphology links to get something a little easier to handle. If you're downloading a few pages, you'll probably want to cut and paste the cookie you get back from the config menu.

It's a very impressive site - though probably a little too cluttered for my aesthetic - and closer to the universal library some of us were hoping for out of the internet ( until the world wide web turned up and turned it into a zillion gigabytes of shallow advertising :-} )

The main parsing problems I've had are fragments of non-HTML in the pages, poorly nested tags, and occasional missing elements. Because HTML is style-based rather than structure-based, I've had to guess at the structure. So far I'm close to parsing about 90% of pages correctly - even automating the construction of a reasonably correct TEI header. But you're right, it was a little too ambitious. My code's a mess and I've backed myself into some ugly corners. But I think of it as a draft - get through it messily once and then create something a little more elegant ( probably wishful thinking ). I'll send you an example XML page when I've got something reasonable.

Have you any idea of the best way to set up a good guess under REBOL? I've also started a set of Gutenberg text-to-XML scripts, and was curious how you would do something like:

- if you find the word "Contents" on a short line, followed closely by a series of short lines, guess a contents list and tag accordingly.
- if you find a match between elements in one of these lines and possible chapter header candidates, make a link.

Some of REBOL's great parsing abilities make me think something like this is possible - but I don't quite know how you would put it together.

Thanks
Gary
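
For illustration only, here is one way the two guesses in the message could be put together. It is a minimal sketch in Python rather than REBOL, and nothing in it comes from the original scripts: the function names (guess_contents, link_entries, normalize) and the thresholds (max_len, min_entries, max_gap) are all assumptions chosen for the example. The same logic should carry over to a line-by-line loop or a parse rule in REBOL.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace for loose matching."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def guess_contents(lines, max_len=40, min_entries=3, max_gap=2):
    """Guess a contents list.

    Looks for a short line containing the word "contents", then collects the
    short lines that follow it. Collection stops at the first long line or
    after more than max_gap consecutive blank lines. Returns a tuple
    (heading_index, [(line_index, text), ...]) or None if no guess is made.
    All thresholds are assumed values, not taken from any real script.
    """
    for i, line in enumerate(lines):
        text = line.strip()
        if len(text) > max_len or not re.search(r"\bcontents\b", text, re.I):
            continue
        entries = []
        gap = 0
        for j in range(i + 1, len(lines)):
            candidate = lines[j].strip()
            if not candidate:
                gap += 1
                if gap > max_gap:
                    break
                continue
            if len(candidate) > max_len:
                break
            gap = 0
            entries.append((j, candidate))
        if len(entries) >= min_entries:
            return i, entries
    return None


def link_entries(lines, entries, max_len=40):
    """Map each contents entry to the first later short line with the same
    normalized text. Entries with no matching header are left out."""
    last = entries[-1][0]
    links = {}
    for entry_index, entry in entries:
        key = normalize(entry)
        if not key:
            continue
        for k in range(last + 1, len(lines)):
            candidate = lines[k].strip()
            if candidate and len(candidate) <= max_len and normalize(candidate) == key:
                links[entry_index] = k
                break
    return links
```

A caller would run guess_contents over the text split into lines, then pass the returned entries to link_entries and emit tags for each linked pair. Exact matching after normalization is deliberately strict; a real Gutenberg text would probably need a looser comparison (for example, matching only the chapter number), and a contents list separated from the first chapter by fewer than three blank lines would be over-collected by this version.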