Extracting a date..

[1/10] from: peoyli:algonet:se at: 12-May-2001 13:53

Hi,

I have tried to extract a date in a format (currently) not supported by REBOL, without any great success..

The part of the file I want to read from looks like:

fdata: {
----------------------------------------------------------------------------
Some other text here
May 11 2001
Some other text here
----------------------------------------------------------------------------
}

which should be very easy to parse..

Setting up a rule for the date format (taken from a similar answer I found on the list):

digit: charset [#"0" - #"9"]
mon-abbrev: ["jan" | "feb" | "mar" | "apr" | "may" | "jun" | "jul" | "aug" | "sep" | "oct" | "nov" | "dec"]
date-rule: [mon-abbrev some #" " 1 2 digit some #" " 4 digit]

then.. the problem begins..

>> date: copy "" parse fdata [to date-rule copy date to "^/" to end] print date

** Script Error: Invalid argument: mon-abbrev some 1 2 digit some 4 digit
** Near: parse fdata [to date-rule copy date to "^/" to end]

>> date: copy "" parse fdata [to "may" copy date to "^/" to end] print date

May 11 2001

>> date: copy "" parse fdata [[to "may" | "jun"] copy date to "^/" to end] print date

May 11 2001

>> date: copy "" parse fdata [[to "apr" | "may" | "jun"] copy date to "^/" to end] print date

(nothing)

...

What should I change to make this work? The first try, which returned the script error, seems to be the most logical one...

/PeO

[2/10] from: gchiu:compkarori at: 13-May-2001 1:19

On 12 May 2001 13:53:39 +0200 "P-O Yliniemi" <[peoyli--algonet--se]> wrote:

> The part of the file I want to read from looks like:
> fdata: {

<<quoted lines omitted: 4>>

> ----------------------------------------------------------------------------
> }

This works I think:

data: load fdata
rule: [
    set month [ 'May | 'Jun ]
    set day integer!
    set year integer!
    ( print rejoin [month "/" day "/" year] )
]
parse data [ skip some rule ]

-- Graham Chiu

[3/10] from: joel:neely:fedex at: 12-May-2001 9:00

Hi, PeO,

Graham has already given you a suggestion for a replacement script. Let me suggest the reason why your efforts below didn't succeed.

P-O Yliniemi wrote:

...

> The part of the file I want to read from looks like:
> fdata: {

<<quoted lines omitted: 4>>

> ----------------------------------------------------------------------------
> }

...

> Setting up a rule for the date format:
> digit: charset [#"0" - #"9"]

<<quoted lines omitted: 5>>

> ** Script Error: Invalid argument: mon-abbrev some 1 2 digit some 4 digit
> ** Near: parse fdata [to date-rule copy date to "^/" to end]

Contrary to your (quite understandable) expectation, TO does not like to be applied to a sub-rule. TO wants to be told *exactly* what to look for.

> >> date: copy "" parse fdata [to "may" copy date to "^/" to end] print date
> May 11 2001

This succeeds because TO now has a literal string for which to search. (PARSE ignores case unless you use /CASE, so "may" finds the "May" in the text.)

> >> date: copy "" parse fdata [[to "may" | "jun"] copy date to "^/" to end] print date
> May 11 2001

This succeeds, but only accidentally. Think of TO as having higher precedence than | in your first sub-rule. Parse actually matches that sub-rule only if:

    it can scan forward and find the literal (sub-)string "may", or
    it can *immediately* (at the present location) find the literal (sub-)string "jun".
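To make the precedence point concrete, here is a small Python model of such a sub-rule. It is an illustration only, not REBOL: the function name `sub_rule` and its ("to", ...) / ("lit", ...) encoding are invented for this sketch. Each alternative either scans forward (like TO) or must match at the current position (like a bare literal string). Alternatives are tried in order, and matching ignores case, as PARSE does without /CASE. The same model also covers the three-way sub-rule tried next in PeO's message.

```python
def sub_rule(text, pos, alternatives):
    """Hypothetical model of a PARSE sub-rule such as [to "may" | "jun"].

    alternatives: list of (kind, literal) pairs, where kind is
      "to"  -> scan forward for literal and stop just before it (like TO)
      "lit" -> literal must match right at pos (like a bare string)
    Alternatives are tried in order; the first that succeeds wins.
    Returns the new position, or None if every alternative fails.
    """
    low = text.lower()                  # case-insensitive, like PARSE without /case
    for kind, literal in alternatives:
        literal = literal.lower()
        if kind == "to":
            i = low.find(literal, pos)
            if i != -1:
                return i
        elif low.startswith(literal, pos):
            return pos + len(literal)
    return None


text = "Some other text here\nMay 11 2001\nSome other text here\n"

# [to "may" | "jun"]: the TO alternative scans ahead and finds "May".
print(sub_rule(text, 0, [("to", "may"), ("lit", "jun")]))                  # 21

# [to "apr" | "may" | "jun"]: "apr" is never found, and "may"/"jun" are
# only tried at position 0, so the whole sub-rule fails.
print(sub_rule(text, 0, [("to", "apr"), ("lit", "may"), ("lit", "jun")]))  # None
```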

> >> date: copy "" parse fdata [[to "apr" | "may" | "jun"] copy date to "^/" to end] print date
> (nothing)

This illustrates the preceding point further, because the initial sub-rule is now looking for:

    "apr" *somewhere* forward from this point (including here), or
    "may" *right here*, or
    "jun" *right here*.

None of those succeeds, so the whole parse fails and DATE is left as the empty string.

The same restrictions apply to THRU, as you can see from the simplified example below. It tries to find the segment of a string that is bounded by either parentheses or angle brackets:

>> test-string: {some words go (inside here) and later}

== "some words go (inside here) and later"

>> open-delim: ["<" | "("]

== ["<" | "("]

>> close-delim: [">" | ")"]

== [">" | ")"]

>> result: ""

== ""

>> big-rule: [to open-delim copy result thru close-delim to end]

== [to open-delim copy result thru close-delim to end]

The above BIG-RULE is similar to what you had, in that it tries to apply both TO and THRU to sub-rules which contain alternatives...

>> parse/all test-string big-rule

** Script Error: Invalid argument: < | (
** Where: halt-view
** Near: parse/all test-string big-rule

... with no better luck than you had. So, let's try explicitly describing the "scan forward for..." alternatives.

>> to-open-delim: [to "<" | to "("]

== [to "<" | to "("]

>> next-rule: [to-open-delim copy result thru close-delim to end]

== [to-open-delim copy result thru close-delim to end]

>> parse/all test-string next-rule

** Script Error: Invalid argument: > | )
** Where: halt-view
** Near: parse/all test-string next-rule

Aha! We got past the TO open delimiter, but then ran into the same kind of limitation with the THRU closing delimiter. So, we fix that.

>> thru-close-delim: [thru ">" | thru ")"]

== [thru ">" | thru ")"]

>> last-rule: [to-open-delim copy result thru-close-delim to end]

== [to-open-delim copy result thru-close-delim to end]

>> parse/all test-string last-rule

== true

>> result

== "(inside here)"
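For comparison, the "distribute TO/THRU across the options" idea can be sketched in Python. This is a hypothetical illustration: `to_any` and `thru_any` are names invented here and are not REBOL functions. Each helper tries its targets in the order given and uses the first one that is found anywhere ahead, in the same spirit as trying the | alternatives above in order.

```python
def to_any(text, pos, targets):
    """Like [to "<" | to "("]: the first target (in list order) found ahead wins.
    Returns the index of that target, or None."""
    for target in targets:
        i = text.find(target, pos)
        if i != -1:
            return i
    return None


def thru_any(text, pos, targets):
    """Like [thru ">" | thru ")"]: same search, but ends just past the target."""
    for target in targets:
        i = text.find(target, pos)
        if i != -1:
            return i + len(target)
    return None


test_string = "some words go (inside here) and later"
start = to_any(test_string, 0, ["<", "("])
end = thru_any(test_string, start, [">", ")"])
result = test_string[start:end]
print(result)   # (inside here)
```

Because the alternatives are tried in order, the first listed target wins in this sketch even when another one occurs earlier. For example, `to_any("a (b) <c>", 0, ["<", "("])` returns 6 (the "<"), not 2. List the alternatives in the order you want them preferred.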

> What should I change to make this work?

Never use TO or THRU on compound sub-patterns. Either "distribute" the TO or THRU across the options (as done in the dinky example above), or explicitly say "skip forward until you can match this sub-rule" as Graham's rewrite did.
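The second option, "skip forward until you can match this sub-rule", is a loop. It tries the rule at the current position and, if that fails, advances by one and tries again. Graham's later `some [ ... | skip ]` rule has this shape. Below is a minimal Python sketch, assuming a regular expression as a stand-in for the PARSE sub-rule. `find_date` and `DATE_RULE` are illustrative names only.

```python
import re

# Stand-in for: mon-abbrev some #" " 1 2 digit some #" " 4 digit
DATE_RULE = re.compile(
    r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec) +[0-9]{1,2} +[0-9]{4}",
    re.IGNORECASE,
)


def find_date(text):
    """Try the rule at each position; if it fails there, skip one character."""
    pos = 0
    while pos < len(text):
        m = DATE_RULE.match(text, pos)   # anchored at pos, like applying a sub-rule
        if m:
            return m.group(0)
        pos += 1                          # the "| skip" alternative
    return None


fdata = "\nSome other text here\nMay 11 2001\nSome other text here\n"
print(find_date(fdata))   # May 11 2001
```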

> The first try which returned the script error seems to be the most
> logical one...

Ah, but as we've been discussing lately (under the topic of scope), things are not always what they "seem to be" in the REBOL universe. ;-)

Hope this helps!

-jn-

[4/10] from: peoyli:algonet:se at: 12-May-2001 16:49

> On 12 May 2001 13:53:39 +0200
> "P-O Yliniemi" <[peoyli--algonet--se]> wrote:

<<quoted lines omitted: 15>>

> ) ]
> parse data [ skip some rule ]

returned 'false' without printing the result...

/PeO

[5/10] from: gjones05:mail:orion at: 12-May-2001 10:14

From: "P-O Yliniemi"

> Hi,
>
> I have tried to extract a date in a format (currently) not supported by REBOL,
> without any great success..
>
> The part of the file I want to read from looks like:
>
> fdata: {
> ----------------------------------------------------------------------------
> Some other text here
> May 11 2001
> Some other text here
> ----------------------------------------------------------------------------
> }
>
> which should be very easy to parse.. (from some other alike answer I found in
> the list):
> Setting up a rule for the date format:

<<quoted lines omitted: 4>>

> then.. the problem begins..
> >> date: copy "" parse fdata [to date-rule copy date to "^/" to end] print date
> ** Script Error: Invalid argument: mon-abbrev some 1 2 digit some 4 digit
> ** Near: parse fdata [to date-rule copy date to "^/" to end]
>
> >> date: copy "" parse fdata [to "may" copy date to "^/" to end] print date
> May 11 2001
>
> >> date: copy "" parse fdata [[to "may" | "jun"] copy date to "^/" to end] print date
> May 11 2001
>
> >> date: copy "" parse fdata [[to "apr" | "may" | "jun"] copy date to "^/" to end] print date
> (nothing)
>
> ...
>
> What should I change to make this work ? The first try which returned the script
> error seems to be the most logical one...
>
> /PeO

Hi, PeO,

Looks simple; surprisingly tough. Here is what I came up with. It allows for dates with single spaces or a comma-space combination. I had trouble working in additional component separators, and I don't understand why. But here it is:

fdata: {
----------------------------------------------------------------------------
Some other text here
May 11 2001
Some other text here
----------------------------------------------------------------------------
}
mon-abbrev-start: [
    to "jan" | to "feb" | to "mar" | to "apr" | to "may" | to "jun" |
    to "jul" | to "aug" | to "sep" | to "oct" | to "nov" | to "dec"
]
skip-spacers: [thru " " | thru ", "]
digit: charset [#"0" - #"9"]
s: e: none
parse/all fdata [mon-abbrev-start s: skip-spacers 1 2 digit skip-spacers 2 4 digit e: to end]
copy/part s e

--Scott Jones
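Scott's approach marks the start (s:) and end (e:) of the date and then copies that span. A rough Python analogue, using a regular expression instead of PARSE, might look like the sketch below. The separator pattern (an optional comma followed by one or more spaces) is an assumption of the sketch. It is stricter than `thru " "`, which skips any text up to the next space. `extract`, `SEP` and `DATE_PATTERN` are illustrative names.

```python
import re

SEP = r",? +"   # assumed separator: optional comma, then one or more spaces
DATE_PATTERN = re.compile(
    r"\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)"
    + SEP + r"[0-9]{1,2}" + SEP + r"[0-9]{2,4}",
    re.IGNORECASE,
)


def extract(text):
    """Return the first date-like span, or None (like copy/part s e)."""
    m = DATE_PATTERN.search(text)
    if m is None:
        return None
    s, e = m.start(), m.end()   # the positions PARSE would mark with s: and e:
    return text[s:e]


print(extract("Some other text here\nMay 11 2001\nSome other text here\n"))  # May 11 2001
print(extract("Issued on May 11, 2001."))                                    # May 11, 2001
```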

[6/10] from: ingo:2b1 at: 12-May-2001 18:39

Hi PeO,

yes, having 'to and 'thru work on subrules would make some things so much easier ...

Well, here is my suggestion, with the added benefit that it works even if one of the month abbrevs is part of the text before the date.

[REBOL[]
fdata: {
----------------------------------------------------------------------------
Some other text here
June and May walked the talk
May 11 2001
Some other text here
----------------------------------------------------------------------------
}
space: [ thru " " ]
digit: charset [#"0" - #"9"]
month: [ "jan" | "feb" | "mar" | "apr" | "may" | "jun" | "jul" | "aug" | "sep" | "oct" | "nov" | "dec" ]
day: [ 1 2 digit ]
year: [ 2 4 digit ]
the_rule: [ copy m month space copy d day space copy y year ]
parse/all fdata [ any [ the_rule to end | thru " " ]]
date: to-date rejoin [y "-" m "-" d]
]

kind regards,

Ingo
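Ingo's version captures the month, day and year separately and then converts them with TO-DATE. Below is a Python sketch of the same capture-then-convert idea. It assumes a regular expression in place of PARSE and Python's `datetime.date` in place of TO-DATE. `first_date`, `MONTHS` and `THE_RULE` are illustrative names. Unlike Ingo's rule, this sketch requires a four-digit year so the result can be a real date. The search tries every character position, and the `\b` anchors plus the required space keep "June" and "May walked" in the sample text from matching.

```python
import re
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Assumed pattern: month abbreviation, spaces, 1-2 digit day, spaces, 4-digit year.
THE_RULE = re.compile(
    r"\b(?P<m>" + "|".join(MONTHS) + r") +(?P<d>[0-9]{1,2}) +(?P<y>[0-9]{4})\b",
    re.IGNORECASE,
)


def first_date(text):
    """Return the first date in text as a datetime.date, or None."""
    m = THE_RULE.search(text)
    if m is None:
        return None
    month = MONTHS.index(m.group("m").lower()) + 1
    # date() raises ValueError for impossible dates such as May 32.
    return date(int(m.group("y")), month, int(m.group("d")))


fdata = """
Some other text here
June and May walked the talk
May 11 2001
Some other text here
"""
print(first_date(fdata))   # 2001-05-11
```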

[7/10] from: peoyli:algonet:se at: 12-May-2001 22:35

Hi,

Thanks for the solutions to this problem. I finally decided to use and extend Ingo's solution (although I already had a working version based on Scott's example), because it didn't accept 'May 1a 20ab' as the date, and because I already needed a block with the abbreviated month names.

Thanks to Joel for the explanation of why my initial attempts failed, and to Graham for the first attempt to solve the problem (which should have succeeded with a little more experimenting).

Changes: made into a function which returns a numerical date format of yy(yy)-mm-dd, or 'none' if there is no date in the file.

---
extract-date: func [
    {Extract the first occurrence of a date [mmm-d(d)-yy(yy)] in a string}
    text [string!] {The string to extract the date from}
    /local space digit month day year date-rule m d y
][
    space: [ thru " " ]
    digit: charset [#"0" - #"9"]
    month: [ "jan" | "feb" | "mar" | "apr" | "may" | "jun" | "jul" | "aug" | "sep" | "oct" | "nov" | "dec" ]
    day: [ 1 2 digit ]
    year: [ 2 4 digit ]
    date-rule: [ copy m month space copy d day space copy y year ]
    either parse/all fdata [ any [ date-rule to end | thru " " ]] [
        m: to-string (index? find month m) + 1 / 2
        if 2 > length? m [insert m "0"]
        if 2 > length? d [insert d "0"]
        rejoin [y "-" m "-" d]
        ; date: to-date rejoin [y "-" m "-" d]
    ][
        none
    ]
]
---

/PeO
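The line `m: to-string (index? find month m) + 1 / 2` relies on the layout of the MONTH block. The | words occupy every other slot, so "jan" is at position 1, "feb" at 3 and "may" at 9. REBOL evaluates infix operators left to right, so the expression is (position + 1) / 2, which turns the position back into the month number. Below is a Python sketch of the same arithmetic and of the zero-padded yy(yy)-mm-dd result. `MONTH_BLOCK`, `month_number` and `extract_date` are hypothetical names, and the regular expression stands in for DATE-RULE.

```python
import re

# Mirrors the REBOL block ["jan" | "feb" | ... | "dec"], where every other
# slot holds the | word.
MONTH_BLOCK = ["jan", "|", "feb", "|", "mar", "|", "apr", "|", "may", "|", "jun", "|",
               "jul", "|", "aug", "|", "sep", "|", "oct", "|", "nov", "|", "dec"]
MONTHS = MONTH_BLOCK[::2]   # the twelve abbreviations without the | slots

DATE_RULE = re.compile(
    r"\b(" + "|".join(MONTHS) + r") +([0-9]{1,2}) +([0-9]{2,4})\b",
    re.IGNORECASE,
)


def month_number(abbrev):
    """(index? find month m) + 1 / 2, using a 1-based index like INDEX?."""
    position = MONTH_BLOCK.index(abbrev.lower()) + 1   # 1, 3, 5, ..., 23
    return (position + 1) // 2                         # 1, 2, 3, ..., 12


def extract_date(text):
    """Return the first date as yy(yy)-mm-dd, or None if there is none."""
    m = DATE_RULE.search(text)
    if m is None:
        return None
    mon, day, year = m.groups()
    mm = str(month_number(mon)).zfill(2)   # like: if 2 > length? m [insert m "0"]
    dd = day.zfill(2)                      # like: if 2 > length? d [insert d "0"]
    return f"{year}-{mm}-{dd}"


print(extract_date("Some other text here\nMay 11 2001\n"))   # 2001-05-11
print(extract_date("Jan 5 01"))                              # 01-01-05
```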

[8/10] from: gjones05:mail:orion at: 12-May-2001 16:10

Hi, again, all,

Ingo did create the most robust solution. When you encased it into a function, you appear to have forgotten to change 'fdata to 'text.

From: "P-O Yliniemi"

> Hi,
>
> Thanks for the solutions to this problem. I finally decided to use and
> extend Ingo's solution (but I already had a version that works based on
> the Scott's example) because it didn't accept 'May 1a 20ab' as the date,
> and because I already had need for a block with the abbreviated month
> names.
>
> Thanks to Joel for the explanation why my initial attempts failed, and
> to Graham for the first attempt to solve the problem (which should have
> succeeded with a little more experimenting).
>
> Changes: made into a function which returns a numerical date format of
> yy(yy)-mm-dd, or 'none' if no date in file.
>
> ---
> extract-date: func [
> {Extract the first occurrance of a date [mmm-d(d)-yy(yy)] in a string}
> text [string!] {The string to extract the date from}
> /local space digit month day year date-rule m d y
> ][
> space: [ thru " " ]
> digit: charset [#"0" - #"9"]
> month: [ "jan" | "feb" | "mar" | "apr" | "may" | "jun" | "jul" |
> "aug" |

[9/10] from: gchiu:compkarori at: 13-May-2001 10:17

On 12 May 2001 16:49:12 +0200 "P-O Yliniemi" <[peoyli--algonet--se]> wrote:

> returned 'false' without printing the result...

It was late at night :-). This is tested and works, but of course it will also pick up things like: May 234242 3523523523

rebol []
fdata: {
> ----------------------------------------------------------------------------
> Some other text here
> May 11 2001
> Some other text here
> ----------------------------------------------------------------------------
> }

data: load fdata
rule: [
    some [
        set month ['Apr | 'May | 'Jun ]
        set day integer!
        set year integer!
        (print rejoin [ month "/" day "/" year ] )
        | skip
    ]
]
parse data rule

-- Graham Chiu
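Graham's version works on loaded values rather than characters. LOAD turns the text into words and integers, and the rule looks for a month word followed by two integers, skipping one value otherwise. Below is a rough Python analogue. It assumes a plain whitespace split as a crude stand-in for LOAD and `str.isdigit` as a stand-in for the integer! check; `find_dates` is a hypothetical name. Like the rule above, it accepts any listed month word followed by two integers, however implausible the numbers.

```python
def find_dates(text, months=("apr", "may", "jun")):
    """Scan whitespace-separated tokens for <month> <integer> <integer>."""
    tokens = text.split()                          # crude stand-in for LOAD
    found = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if (len(window) == 3
                and window[0].lower() in months    # like set month ['Apr | 'May | 'Jun]
                and window[1].isdigit()            # like set day integer!
                and window[2].isdigit()):          # like set year integer!
            found.append("/".join(window))         # like rejoin [month "/" day "/" year]
            i += 3
        else:
            i += 1                                 # the "| skip" alternative
    return found


fdata = """
----------------------------------------------------------------------------
Some other text here
May 11 2001
Some other text here
----------------------------------------------------------------------------
"""
print(find_dates(fdata))                     # ['May/11/2001']
print(find_dates("May 234242 3523523523"))   # ['May/234242/3523523523']
```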

[10/10] from: peoyli:algonet:se at: 13-May-2001 0:50

> Hi, again, all,
>
> Ingo did create the most robust solution. When you encased it into a
> function, you appear to have forgotten to change 'fdata to 'text.

I guess I can borrow Graham's phrase from the mail that arrived after yours: 'It was late at night' :)

Actually, the reason I forgot to change 'fdata was that I had just pasted the old function header onto the new code (and the reason it still worked was that my test script read the file into 'fdata)..

/PeO

Notes

Quoted lines have been omitted from some messages.