[REBOL] Re: How to extract content of HTML table?
From: anton::wilddsl::net::au at: 29-Aug-2006 14:14
Just a small tip: quick and dirty is better.
Your code will be smaller and easier to maintain.
I made some HTML extractors for weather, train timetables and TV,
and I found that every ~8 months or so the sites change the damned
HTML layout, breaking my code in a hard-to-predict way.
Parsing the whole html document properly, while interesting,
does not make your code less susceptible to this problem,
because they make changes like:
- nesting the table with the main content inside another table
- breaking the content into separate pages
- adding cells just for layout spacing
- adding markup such as <b> to the text
- changing the titles of key fields which you are looking for
You need artificial intelligence to reliably handle all that!
So it's really not worth it to start parsing at "<html>..."
You'll just make a huge parse rule which will be difficult to maintain.
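
To give an idea of what "quick and dirty" can look like, here is a
rough sketch (REBOL 2). The page fragment and the "Temperature"
landmark are made up for illustration; a real script would read the
page from its URL and use whatever label sits next to the value you
want. The rule never looks at "<html>" or the outer tables, it just
jumps to the landmark and grabs the next cell.

```rebol
REBOL [Title: "Quick and dirty HTML extraction (sketch)"]

; Hypothetical page fragment, standing in for: html: read some-url
html: {<html><body><table><tr><td>junk</td></tr>
<tr><td><b>Temperature</b></td><td> 21.5 C </td></tr>
</table></body></html>}

temp: none

; Jump to the landmark text, skip to the start of the next cell,
; then copy everything up to the end of that cell.
parse/all html [
    thru "Temperature"
    thru "<td" thru ">"
    copy temp to "</td>"
    to end
]

; temp stays none when the landmark or the cell was not found.
if temp [temp: trim temp]
print either temp [temp] ["landmark not found - layout probably changed"]
```

With the sample text above this should print 21.5 C. When the layout
changes there are only a couple of "thru" steps to adjust, and the
none check tells you straight away that something broke.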
I use parse, not load/markup, by the way.
I think I tried load/markup and found that it's too "correct", i.e.
it can't handle messy html very well. (But it's been a long time,
maybe I don't remember well.)
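
For comparison, this is roughly what load/markup gives you on clean
input: a block of tag! and string! values that you then have to walk
through yourself. The sample row is made up, and this is only a
sketch of the idea, not a test of how it copes with messy pages.

```rebol
REBOL [Title: "load/markup on a clean fragment (sketch)"]

; Hypothetical, well-formed table row.
html: {<tr><td><b>Temperature</b></td><td>21.5 C</td></tr>}

; load/markup splits the text into tag! and string! values.
; Print only the text pieces and skip the tags.
foreach item load/markup html [
    if string? item [print item]
]
```

You still have to work out which string belongs to which field, so
it doesn't save you from layout changes either.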
If you have any troubles with parse, let us know.