A novice question
[1/2] from: vkmodgil::yahoo::com at: 19-Aug-2000 15:27
I was trying to modify the web parser code from the
User's Guide. The original code is like this:
tag-parser: make object! [
    tags: make block! 100
    text: make string! 8000
    html-code: [
        copy tag ["<" thru ">"] (append tags tag) |
        copy txt to "<" (append text txt)
    ]
    parse-tags: func [site [url!]] [
        clear tags clear text
        parse read site [to "<" some html-code]
        print text
    ]
]
My aim is to pick up job listings from a web site, taking
the text that begins with "keyword_one" and ends with
"keyword_two", while still getting rid of the tags.
So I tried this:
html-code: [
    copy tag ["<" thru "keyword1"] (append tags tag) |
    copy txt to "keyword2" (append text txt)
]
etc.. and then use tag-parser/parse-tags modified-url.
But this now hangs.
Any help welcomed by this novice.
-Vik
[2/2] from: bhandley:zip:au at: 20-Aug-2000 11:59
Hi Vik,
Did you retry your program in a fresh session of Rebol? It may have been
that during your writing/testing of your program you got to a point that
triggered the Rebol GC bug (which I understand is being looked at by RT).
Regarding the keywords, are they tags or text? This might change the
approach. If, say, your keywords are part of the text, sit immediately before
and after your job posting information, and are unique enough, then you
could ignore the tags completely and parse based on your keywords.
Something like this, maybe:
parse-rules: [
    some [
        thru keyword-one-text
        copy text
        to keyword-two-text
        (print text)
    ]
]
Also, it may not be relevant, but note that the parse function as used in
script examples ignores spaces by default (use parse/all if you want parse
to process spaces).
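To make the keyword idea concrete, here is a minimal sketch that runs against a literal string rather than a live page. The sample text, the two keyword values, and the jobs block are all made up for illustration. It uses any instead of some so that a page with no match still parses, and it steps over the closing keyword explicitly so that every pass through the loop moves forward in the input.

```rebol
REBOL [Title: "Keyword extraction sketch"]

; Hypothetical sample data and keywords, for illustration only.
page: {<p>intro</p> START <b>Job A</b> details STOP <hr> START Job B STOP}
keyword-one-text: "START"
keyword-two-text: "STOP"

jobs: make block! 10

; parse/all so that spaces are treated as ordinary characters
parse/all page [
    any [
        thru keyword-one-text           ; skip up to and past the opening keyword
        copy text to keyword-two-text   ; grab everything before the closing keyword
        keyword-two-text                ; step over the closing keyword
        (append jobs text)
    ]
    to end
]

foreach job jobs [print trim job]
```

The captured strings still contain any tags that sat between the two keywords, so a tag-stripping pass would still be needed afterwards.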
On a different track, Rebol version 2.3 has the ability to load markup. Like
this,
>> loaded-page: load/markup http://www.abc.net.au/news
loaded-page is now a block that contains values of type tag! and type
string!.
foreach item loaded-page [ if not tag? item [ print item ] ]
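The same loop can be tried without a network connection. The sketch below assumes that load/markup also accepts a string (worth checking with help load in your version), and the sample markup is invented for illustration. It collects the non-tag items into one string instead of printing them one by one.

```rebol
REBOL [Title: "load/markup sketch"]

; Hypothetical snippet standing in for a downloaded page.
sample: {<html><b>Job A</b> needs <i>REBOL</i> skills</html>}

; Assumption: load/markup accepts a string as well as a file or URL.
plain: make string! length? sample
foreach item load/markup sample [
    if not tag? item [append plain item]    ; keep text, drop tags
]
print plain
```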
Alternatively, use parse with block-mode rules:
abc-news-headlines: [
    thru <!-- start insert of main story copy -->
    some [ thru <b> copy text to </b> (print text)]
    <!--end insert of copy for top stories-->
    to end
]
>> parse loaded-page abc-news-headlines
Supply ship approaches rescue site as hopes fade
Muslim extremists collapse hostage release talks
Monsoon bus tragedy in central India
US bushfires not letting up
Gore pulls ahead in US presidential poll
Man falls overboard in crocodile-infested waters
Fighting couple force jumbo jet to land
Sport news
This is good if you know exactly what the values of some items in the block
are, but not so good if you need to do pattern matching. For example,
finding the title text is easy because we know a tag <title> exists in the
block.
>> copy/part find/tail loaded-page <title> 1
== ["ABC Online News - Latest Bulletin"]
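When only part of a string is known, one possible workaround is to walk the block and test each string item with find. This is only a sketch: the block below is a hand-written stand-in for the result of load/markup, and the search word is arbitrary.

```rebol
REBOL [Title: "Substring search sketch"]

; Hypothetical block shaped like the result of load/markup.
loaded-sample: [
    <title> "Job listings" </title>
    <b> "REBOL developer wanted" </b>
    <b> "Gardener wanted" </b>
]

; Print every string item that contains the search word.
foreach item loaded-sample [
    if all [string? item  find item "REBOL"] [print item]
]
```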
Brett.