Mailing List Archive: 49091 messages

Rebol parsing 101

 [1/5] from: mpcweber:comcast at: 1-Oct-2003 21:20


I'm brand new to Rebol and not getting the hang of string parsing. For example, assume I have the string

fcontents
== {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41 09/29/03 ATM/POS ACTIVITY $21.71 ...

I would like to convert this string into a set of blocks where each block has 4 elements of the types [date string money string] (the 4th element is optional):

[09/29/03 "ATM/POS ACTIVITY" $28.68 (pending)]
[09/29/03 "ATM/POS ACTIVITY" $11.41 ]
[09/29/03 "ATM/POS ACTIVITY" $21.71 ]

I've been plodding around trying to get anything to work, like

find/any fcontents ["09/*/03"]

but I'm not understanding how to isolate the elements. I suspect I should be able to create something akin to a regular expression that I can use as a pattern to apply to the string.
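Since Mike is thinking in terms of regular expressions, here is roughly what that pattern could look like as one. This is a Python sketch for comparison, not REBOL; the replies in this thread use REBOL's parse instead. It assumes every amount has exactly two decimal places and that the optional fourth item is wrapped in parentheses like (pending). `TRANSACTION` and `parse_transactions` are hypothetical names.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical pattern for one transaction (assumptions noted per line).
TRANSACTION = re.compile(
    r"(\d{2}/\d{2}/\d{2})\s+"   # date, month first (MM/DD/YY)
    r"(.+?)\s+"                 # description, non-greedy so it stops before the "$"
    r"\$\s*([\d,]+\.\d{2})"     # amount; allows a space after "$" and thousands commas
    r"(?:\s+(\([^)]*\)))?"      # optional parenthesised item such as "(pending)"
)


def parse_transactions(text):
    """Return [date, description, amount, status-or-None] for each transaction found."""
    rows = []
    for m in TRANSACTION.finditer(text):
        when = datetime.strptime(m.group(1), "%m/%d/%y").date()
        amount = Decimal(m.group(3).replace(",", ""))
        rows.append([when, m.group(2), amount, m.group(4)])
    return rows
```

The non-greedy `(.+?)` is what keeps the description from swallowing the following transactions; a missing fourth item simply leaves that group as `None`.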

 [2/5] from: rebol:techscribe at: 1-Oct-2003 22:47


Hi Mike,

Which values, besides (pending), are possible for the fourth, optional item?

Elan

 [3/5] from: maarten:vrijheid at: 2-Oct-2003 8:01


OK, I'll give you a simple solution for a subset of your problem; from there you can probably expand it easily.

    ff: parse fcontents " "
    foreach [date s1 s2 money status] ff [
        ; first we'll have to convert the date
        month: copy/part date 2
        day: copy/part at date 4 2
        year: copy/part at date 7 2
        new-date: to-date rejoin [day "-" month "-" year]
        ; now we can construct a block
        probe reduce [new-date rejoin [s1 " " s2] to-money money status]
    ]

However, this assumes that the optional string (pending) is always there: foreach steps through the tokens five at a time, so a transaction without it shifts every field after it. Using a BNF rule with parse/all will get you further; this will take some experimenting based on the exact format. Another strategy could be to first replace all dates with valid date formats (valid for REBOL, that is: swap day and month), then convert your string to a block and parse based on REBOL datatypes, which makes matching much easier. You may want to take a look at the parse section of the User Guide.

--Maarten
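Maarten's foreach walks the tokens in fixed groups of five, which is why a missing (pending) throws it off. Below is a rough Python sketch of one way around that, in the spirit of his suggestion to match on types: the token after the amount is treated as the optional fourth item only when it is not itself a date. The names `DATE_TOKEN`, `to_date` and `group_tokens` are hypothetical. The sketch also assumes a two-word description, no space after the $, years in the 2000s, and a status that is a single token.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical helper: an MM/DD/YY token, month first as in the original data.
DATE_TOKEN = re.compile(r"(\d{2})/(\d{2})/(\d{2})")


def to_date(token):
    """Convert an MM/DD/YY token to a date, assuming the years are 20xx."""
    month, day, year = (int(part) for part in DATE_TOKEN.fullmatch(token).groups())
    return date(2000 + year, month, day)


def group_tokens(text):
    """Split on whitespace, then group tokens into [date, description, amount, status]."""
    tokens = text.split()
    rows, i = [], 0
    while i < len(tokens):
        if i + 4 > len(tokens) or not DATE_TOKEN.fullmatch(tokens[i]):
            raise ValueError(f"no complete transaction at token {i}: {tokens[i]!r}")
        row = [to_date(tokens[i]),
               tokens[i + 1] + " " + tokens[i + 2],   # two-word description, as in the foreach
               Decimal(tokens[i + 3].lstrip("$"))]    # assumes no space after "$"
        i += 4
        # The optional item is present only if the next token is not a date.
        if i < len(tokens) and not DATE_TOKEN.fullmatch(tokens[i]):
            row.append(tokens[i])
            i += 1
        else:
            row.append(None)
        rows.append(row)
    return rows
```

Deciding by the type of the next token, rather than by a fixed position, is what lets the record length vary.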

 [4/5] from: brett:codeconscious at: 2-Oct-2003 16:58


> for example: assume I have string
> fcontents
> == {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41 09/29/03 ATM/POS ACTIVITY $21.71 ...

Before looking at REBOL, you need to identify the structure of your string. I can see that you have multiple transactions, but I'm not sure how you identify each transaction. For example, do they all have "ATM/POS ACTIVITY" as a constant, or will that change? Do they all start with a US-format date? Will the amounts always begin with a $ sign, and will they have commas for thousands? Is the (pending) significant for your application or not?

Knowing the answers to these questions, and others like them, gives you a starting point for deciding how to use parse. You could use its string break-apart mode or its block mode. You need to decide whether you want it to handle whitespace for you or not (the /all refinement). Then you can create some rules. For example (*this is not complete*, but I think it gives the idea):

    atm-input: [some atm-trx]
    atm-trx: [
        copy trx-date short-us-date
        atm-constant
        copy trx-amt dollar-amount
        opt [pending?]
    ]
    atm-constant: ["ATM/POS" "ACTIVITY"]
    short-us-date: [2 digit #"/" 2 digit #"/" 2 digit]
    dollar-amount: [#"$" to #" "]
    parse fcontents atm-input

> but I'm not understanding how to isolate the elements
>
> I suspect I should be able to create something akin to a regular expression
> that I can use as a pattern to apply to the string

There's some info here: http://www.codeconscious.com/rebol/parse-tutorial.html

If these ideas don't help, just post another email.

Cheers,
Brett.
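Brett's rules can be sketched as a small recursive-descent matcher. This is a Python approximation, not REBOL's parse. Each rule becomes a function or a compiled regex, whitespace is skipped between items as parse does without /all, and a transaction that fails halfway rewinds the cursor. The names mirror Brett's rules but are hypothetical. `PENDING` stands in for his unfinished `pending?` rule, and, like `[#"$" to #" "]`, the amount rule assumes no space after the $.

```python
import re

# Hypothetical Python counterparts of Brett's terminal rules.
SHORT_US_DATE = re.compile(r"\d{2}/\d{2}/\d{2}")
ATM_CONSTANT = re.compile(r"ATM/POS\s+ACTIVITY")   # "ATM/POS" "ACTIVITY"
DOLLAR_AMOUNT = re.compile(r"\$\S+")               # "$" up to the next whitespace
PENDING = re.compile(r"\(pending\)")                # placeholder for pending?
SPACE = re.compile(r"\s*")


class Cursor:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def match(self, rule):
        """Skip whitespace, then try rule; advance and return the text, or return None."""
        start = SPACE.match(self.text, self.pos).end()
        m = rule.match(self.text, start)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()


def atm_trx(cur):
    """copy trx-date short-us-date atm-constant copy trx-amt dollar-amount opt [pending?]"""
    saved = cur.pos
    trx_date = cur.match(SHORT_US_DATE)
    if trx_date and cur.match(ATM_CONSTANT):
        trx_amt = cur.match(DOLLAR_AMOUNT)
        if trx_amt:
            pending = cur.match(PENDING) is not None   # opt: absence is not a failure
            return (trx_date, trx_amt, pending)
    cur.pos = saved   # rewind so the caller sees no partial match
    return None


def atm_input(text):
    """some atm-trx; returns None unless the whole input is consumed."""
    cur, rows = Cursor(text), []
    while True:
        trx = atm_trx(cur)
        if trx is None:
            break
        rows.append(trx)
    if not rows or SPACE.match(cur.text, cur.pos).end() != len(cur.text):
        return None
    return rows
```

Returning `None` when input is left over plays the role of parse returning false.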

 [5/5] from: ingo::2b1::de at: 2-Oct-2003 10:56


Hi Mike,

Mike Weber wrote:
> I'm brand new to Rebol and not getting the hang of string parsing
> for example: assume I have string
<<quoted lines omitted: 5>>
> [09/29/03 "ATM/POS ACTIVITY" $11.41 ]
> [09/29/03 "ATM/POS ACTIVITY" $21.71 ]
The following works with your example string, but it may choke on embedded newlines (at the >>any " "<< rule).

;----------- start ------------
s: {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41 09/29/03 ATM/POS ACTIVITY $2.11}

; create a charset matching non-numbers
non-number: complement charset "0123456789"

; create an empty block for the result
b: copy []

; parse/all, so that parse doesn't eat spaces ...
parse/all s [
    ; we want the following more than once
    some [
        ; get the date (REBOL reads dates day-first, so it does not understand
        ; this month-first format; we have to get the date parts separately)
        copy dm 2 skip "/" copy dd 2 skip "/" copy dy 2 skip
        skip
        ; the string ends at the start of the money
        copy s1 to "$"
        ; money either ends with a space, or it may be the last element
        ; in the string
        copy m [thru " " | to end]
        ; there may be some or no space now (maybe newlines, if your string
        ; gets dynamically created ...)
        any " "
        ; now append what we found so far to the result block, rebuilding the
        ; date so that REBOL understands it (trim drops the trailing space
        ; that was copied along with s1)
        (append/only b compose [(to-date rejoin [dd "/" dm "/" dy]) (trim s1) (load m)])
        ; there MAY be a string now; it has to start with a non-number
        ; character, otherwise you're in trouble here ... anyway, if the
        ; optional string is there, append it to the last block in the result
        ; block. If there's no string, we may even have reached the end and
        ; be done
        opt [copy s2 any non-number (if not none? s2 [append last b trim s2]) | end]
    ]
]
;------------ end ------------

I hope that gets you going,
Ingo
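For readers more at home in a general-purpose language, here is a rough Python translation of Ingo's rule. It keeps one index into the string and advances it step by step, as parse/all does. It is a sketch that assumes well-formed input and raises `ValueError` where parse would simply fail; `scan` is a hypothetical name. It shares Ingo's caveat: optional text that starts with a digit will be read as the next date.

```python
from datetime import date
from decimal import Decimal

DIGITS = "0123456789"


def scan(s):
    """Hypothetical rendering of Ingo's parse/all rule: one index, advanced step by step."""
    rows, i = [], 0
    while i < len(s):                            # some [...]
        # copy dm 2 skip "/" copy dd 2 skip "/" copy dy 2 skip
        if s[i + 2:i + 3] != "/" or s[i + 5:i + 6] != "/":
            raise ValueError(f"expected MM/DD/YY at index {i}")
        dm, dd, dy = s[i:i + 2], s[i + 3:i + 5], s[i + 6:i + 8]
        i += 9                                   # the date plus the following skip
        # copy s1 to "$"
        dollar = s.index("$", i)                 # raises ValueError if there is no "$"
        s1, i = s[i:dollar].strip(), dollar
        # copy m [thru " " | to end]
        space = s.find(" ", i)
        end = len(s) if space == -1 else space + 1
        m, i = s[i:end], end
        # any " "
        while i < len(s) and s[i] == " ":
            i += 1
        row = [date(2000 + int(dy), int(dm), int(dd)), s1, Decimal(m.strip().lstrip("$"))]
        # opt [copy s2 any non-number (...)]
        start = i
        while i < len(s) and s[i] not in DIGITS:
            i += 1
        if i > start:
            row.append(s[start:i].strip())
        rows.append(row)
    return rows
```

As in Ingo's version, rows without the optional text have three elements rather than a placeholder fourth.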

Notes
  • Quoted lines have been omitted from some messages.