Mailing List Archive: 49091 messages

Help with parsing

 [1/3] from: wild:orca:btinternet at: 5-May-2001 22:55


I am new to Rebol and have been experimenting with the parse function. I have tried to find the latest date from a line in a website where the date follows a given piece of text but is followed by either a <br> tag or an unknown piece of text. The line also contains dates to be ignored, as they follow a different piece of text.

The following function works if the date is always followed by a known piece of text. I have tried various things to get it to work as I want, such as using | to have different text to "copy to". I have also tried pattern matching using

    MON1: charset "JFMASOND"
    MON: charset [#"a" - #"z"]
    YY: charset [#"0" - #"9"]
    MON2: charset "aepuco"
    date: "2-Apr-01"
    parse date [ 1 YY 0 1 YY "-" 1 MON1 1 MON2 1 MON "-" 1 YY 0 1 YY ]

which works on its own, but I cannot get it to work with to, thru and copy.

Sample line

    <PRE>next AR year end 30-Sep-01<br><B>Previous</B> <B>Forecast</B> <br>_____________________________ _____________________________<BR>int xd (8.00p) 1-Jun-99 int results 26-May-01<br><br>fin xd (14.6p) 13-Dec-99 annual report 24-Nov-01<br><br>int results 26-May-00 prelim results 29-Nov-01<br><br>int xd (8.50p) 5-Jun-00 agm 26-Jan-02<br><br>year end 30-Sep-00<br>annual report 24-Nov-00<br>prelim results 29-Nov-00fin xd (15.5p) 27-Dec-00<br>agm 26-Jan-01<br>_____________________________ _____________________________<br></pre>

    BB: ""

Task

To find all the dates that follow "prelim results" (in this line two dates do this), compare them and return the newest of the dates.

Function already written

    result_dates: func [
        "Function to find latest date found between the two given values"
        data before after
    ] [
        dates: ""
        parse data [
            any [
                thru before copy date to after
                (dates: join dates [(trim date) " "])
            ]
            to end
        ]
        maxdate: to-date 01-01-1990
        dates-s: parse dates " "
        foreach result_date dates-s [
            either error? try [to-date result_date] [] [
                maxdate: max (to-date result_date) maxdate
            ]
        ]
        either maxdate = (to-date 01-01-1990) [maxdate: "no date"] [maxdate]
    ]

Thanks
Stephen

 [2/3] from: brett:codeconscious at: 6-May-2001 11:37


Hi Stephen,

The issue you have in the function below is that you are using a future feature of Rebol parse :) In particular, your use of "TO pattern" seems reasonable, but at the moment the pattern supplied to TO and THRU has to be a single value, not a composite parse rule. I believe there was mention that composite patterns will be available in a future release of Rebol. It should have worked with COPY though.

I think you also have a small problem with your date format in your parse test, because it doesn't appear to handle single-digit days (5-jun-00). Otherwise, in your sample, the dates all follow a simple rule.

Another thing to think about is the context of your dates. Glancing at your sample data, it appears dates only relate to the text item found in the same <br>....<br> section.

So how about the code below? Some points to note: it finishes the parse early if it finds a "<br>", and it tries to match a date, but if that fails it steps forward one character and tries again.

    result-dates: function [
        "Returns the dates after a specified event name"
        data event-name
    ] [parse-rule result date-rule digit mon-abbrev] [
        digit: charset [#"0" - #"9"]
        mon-abbrev: [
            "jan" | "feb" | "mar" | "apr" | "may" | "jun" |
            "jul" | "aug" | "sep" | "oct" | "nov" | "dec"
        ]
        date-rule: [1 2 digit #"-" mon-abbrev #"-" 2 digit]
        parse-rule: [
            thru event-name
            any [
                "<br>" to end
                | copy current-date date-rule (append result to-date current-date)
                | skip ; ignore one character
            ]
            end
        ]
        result: copy []
        either parse data parse-rule [RETURN result] [RETURN none]
    ]

Here are some results based on your example.
>> result-dates s "prelim results"
== [29-Nov-2001]
>> result-dates s "<br>prelim results"
== [29-Nov-2000 27-Dec-2000]
>> result-dates s "int xd"
== [1-Jun-1999 26-May-2001]

No doubt these are not quite the right dates, so you will need to look at how you find the correct section of data. Hopefully it helps you though.

Brett.
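
The "try the rule, otherwise skip one character" idea from the reply above can be tried on its own. The following is only a minimal sketch, assuming REBOL 2 parse behaviour; the names letter, found and text, and the short input string, are made up for illustration and are not from the thread.

```rebol
REBOL [Title: "Scanning for a composite pattern without TO/THRU (sketch)"]

digit:  charset [#"0" - #"9"]
letter: charset [#"a" - #"z" #"A" - #"Z"]

; A composite rule: one or two digits, "-", three letters, "-", two digits.
date-rule: [1 2 digit "-" 3 letter "-" 2 digit]

found: copy []

; Instead of handing the composite rule to TO or THRU, try it at every
; position and move forward one character whenever it does not match.
parse "agm 26-Jan-02<br>year end 30-Sep-00" [
    any [
        copy text date-rule (append found text)
        | skip
    ]
]

probe found   ; ["26-Jan-02" "30-Sep-00"]
```

The loop stops by itself at the end of the input, because neither the date rule nor skip can match there.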

 [3/3] from: wild::orca::btinternet::com at: 7-May-2001 19:35


Brett

Thanks for taking the time to help me understand Rebol parsing. I have made some minor modifications to the function that you created for me and now it does exactly what I need it to. Even better, I now have a much greater understanding of parsing.

The function now looks like this

    result-dates: function [
        "Returns the dates after a specified event name"
        data event-name
    ] [parse-rule result date-rule digit mon-abbrev found] [
        digit: charset [#"0" - #"9"]
        mon-abbrev: [
            "jan" | "feb" | "mar" | "apr" | "may" | "jun" |
            "jul" | "aug" | "sep" | "oct" | "nov" | "dec"
        ]
        date-rule: [1 2 digit #"-" mon-abbrev #"-" 2 digit]
        parse-rule: [
            any [
                copy current-date date-rule
                (if found = event-name [append result to-date current-date])
                | copy found event-name
                | skip ; ignore one character
            ]
            end
        ]
        result: copy []
        either parse data parse-rule [RETURN result] [RETURN none]
    ]

The function returns each date in the row that immediately follows the event-name.

Thanks
Stephen
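
The original task also asked for the newest of the collected dates. The sketch below is one possible way to finish it, assuming REBOL 2. The names dates-after, newest-date and sample are hypothetical and not from the thread. One caveat: reading the function as posted, found does not seem to be cleared after event-name has matched once, so later dates may be collected as well; this variant clears the flag after every date, which is an assumption about the intended behaviour.

```rebol
REBOL [Title: "Newest date after an event name (sketch)"]

digit: charset [#"0" - #"9"]
mon-abbrev: [
    "jan" | "feb" | "mar" | "apr" | "may" | "jun" |
    "jul" | "aug" | "sep" | "oct" | "nov" | "dec"
]
date-rule: [1 2 digit "-" mon-abbrev "-" 2 digit]

; Hypothetical variant of result-dates: the FOUND flag is cleared after
; every date, so only the first date after each event-name is kept.
dates-after: func [
    "Returns the dates that directly follow event-name"
    data [string!] event-name [string!]
    /local result found current-date
][
    result: copy []
    found: none
    parse data [
        any [
            copy current-date date-rule (
                if found [append result to-date current-date]
                found: none
            )
            | event-name (found: true)
            | skip ; ignore one character
        ]
    ]
    result
]

; Hypothetical helper: latest date in a block, or none for an empty block.
newest-date: func [dates [block!] /local newest] [
    newest: none
    foreach d dates [
        if any [none? newest  d > newest] [newest: d]
    ]
    newest
]

; Shortened excerpt of the sample line from the first message.
sample: "prelim results 29-Nov-01<br>int xd (8.50p) 5-Jun-00 agm 26-Jan-02<br>prelim results 29-Nov-00fin xd (15.5p) 27-Dec-00"

probe dates-after sample "prelim results"               ; [29-Nov-2001 29-Nov-2000]
probe newest-date dates-after sample "prelim results"   ; 29-Nov-2001
```

Returning none for an empty block leaves it to the caller to decide what "no date" should look like.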