[REBOL] Re: Help with parsing
From: brett:codeconscious at: 6-May-2001 11:37
Hi Stephen,
The issue with the function below is that it relies on a future
feature of Rebol parse :)
In particular, your use of "TO pattern" seems reasonable, but at the moment
the pattern supplied to TO and THRU has to be a single value, not a
composite parse rule. I believe there was mention that composite patterns
will be available in a future release of Rebol. It should have worked with
COPY though.
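As a minimal sketch of the usual workaround, assuming Rebol 2 parse
semantics: an ANY loop that tries the composite rule first and otherwise
SKIPs one character behaves much like "TO pattern". The words pattern and
found are hypothetical names chosen only for this illustration.

```rebol
digit: charset [#"0" - #"9"]
pattern: [2 digit "-" 2 digit]   ; hypothetical composite rule

; Try the rule at the current position; if it fails, advance one
; character and try again. After a match, jump to the end of the input.
parse/all "abc 12-34 xyz" [
    any [copy found pattern to end | skip]
]
; found is now "12-34"
```

The /all refinement is used here so that spaces are treated as ordinary
characters rather than being skipped.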
I think you also have a small problem with your date format in your parse
test because it doesn't appear to handle single digit days (5-jun-00).
Otherwise in your sample, the dates all follow a simple rule.
Another thing to think about is the context of your dates. Glancing at your
sample data it appears dates only relate
to the text item found in the same <br>....<br> section.
So how about the code below? Some points to note: it finishes the parse
early if it finds a "<br>", and it tries to match a date, but if that
fails it steps forward one character and tries again.
result-dates: function [
    "Returns the dates after a specified event name"
    data
    event-name
] [parse-rule result date-rule digit mon-abbrev] [
    digit: charset [#"0" - #"9"]
    mon-abbrev: [
        "jan" | "feb" | "mar" | "apr" | "may" | "jun" |
        "jul" | "aug" | "sep" | "oct" | "nov" | "dec"
    ]
    date-rule: [1 2 digit #"-" mon-abbrev #"-" 2 digit]
    parse-rule: [
        thru event-name
        any [
            "<br>" to end |
            copy current-date date-rule (append result to-date current-date) |
            skip ; ignore one character
        ]
        end
    ]
    result: copy []
    either parse data parse-rule [RETURN result] [RETURN none]
]
Here are some results based on your example.
>> result-dates s "prelim results"
== [29-Nov-2001]
>> result-dates s "<br>prelim results"
== [29-Nov-2000 27-Dec-2000]
>> result-dates s "int xd"
== [1-Jun-1999 26-May-2001]
No doubt these are not quite the right dates so you will need to look at how
you find the correct section of data.
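Since the original task was to return the newest of the dates, here is one
possible way to reduce the block that result-dates returns. It is only a
sketch: newest-date is a hypothetical helper name, and it assumes the block
contains nothing but date! values.

```rebol
newest-date: func [
    "Returns the newest date in a block of dates, or none if the block is empty"
    dates [block!]
    /local newest
] [
    if empty? dates [return none]
    newest: first dates
    ; max compares two dates and returns the later one
    foreach d dates [newest: max newest d]
    newest
]
```

For example, newest-date [29-Nov-2000 27-Dec-2000] should give 27-Dec-2000.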
Hopefully it helps you though.
Brett.
----- Original Message -----
From: "Stephen Clarke" <[wild--orca--btinternet--com]>
To: <[rebol-list--rebol--com]>
Sent: Sunday, May 06, 2001 7:55 AM
Subject: [REBOL] Help with parsing
I am new to Rebol and have been experimenting with the parse function.
I have tried to find the latest date from a line in a website where the date
follows a given piece of text but is followed by either a tag <br> or an
unknown piece of text. The line also contains dates to be ignored, as they
follow a different piece of text.
The following function works if the date is always followed by a known piece
of text. I have tried various things to get it to work as I want, such as
using | to have different text to "copy to". I have also tried pattern
matching using
MON1: charset "JFMASOND"
MON: charset [#"a" - #"z"]
YY: charset [#"0" - #"9"]
MON2: charset "aepuco"
date: "2-Apr-01"
parse date [ 1 YY 0 1 YY "-" 1 MON1 1 MON2 1 MON "-" 1 YY 0 1 YY ]
which works on its own, but I cannot get it to work with to, thru and copy.
Sample line
<PRE>next AR year end 30-Sep-01<br><B>Previous</B>
<B>Forecast</B> <br>_____________________________
_____________________________<BR>int xd (8.00p) 1-Jun-99 int
results 26-May-01<br><br>fin xd (14.6p) 13-Dec-99
annual report 24-Nov-01<br><br>int results 26-May-00
prelim results 29-Nov-01<br><br>int xd (8.50p) 5-Jun-00
agm 26-Jan-02<br><br>year end 30-Sep-00<br>annual
report 24-Nov-00<br>prelim results 29-Nov-00fin xd (15.5p)
27-Dec-00<br>agm 26-Jan-01<br>_____________________________
_____________________________<br></pre>
BB: ""
Task
To find all the dates that follow "prelim results" (in this line two dates
do this), compare them and return the newest of the dates.
Function already written
result_dates: func [
    "Function to find latest date found between the two given values"
    data before after
] [
    dates: ""
    parse data [
        any [
            thru before copy date to after
            (dates: join dates [(trim date) " "])
        ]
        to end
    ]
    maxdate: to-date 01-01-1990
    dates-s: parse dates " "
    foreach result_date dates-s [
        either error? try [to-date result_date] [] [
            maxdate: max (to-date result_date) maxdate
        ]
    ]
    either maxdate = (to-date 01-01-1990) [maxdate: "no date"] [maxdate]
]
Thanks
Stephen