Rebol parsing 101
[1/5] from: mpcweber:comcast at: 1-Oct-2003 21:20
I'm brand new to Rebol and not getting the hang of string parsing.
For example, assume I have the string
fcontents
== {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41 09/29/03
ATM/POS ACTIVITY $
21.71 ...
I would like to convert this string into a set of blocks where each block has 4 elements
of the types [date string money string] (the 4th element is optional):
[09/29/03 "ATM/POS ACTIVITY" $28.68 (pending)]
[09/29/03 "ATM/POS ACTIVITY" $11.41 ]
[09/29/03 "ATM/POS ACTIVITY" $21.71 ]
I've been plodding around trying to get anything to work, like
find/any fcontents ["09/*/03"]
but I'm not understanding how to isolate the elements.
I suspect I should be able to create something akin to a regular expression that I can
use as a pattern to apply to the string.
[2/5] from: rebol:techscribe at: 1-Oct-2003 22:47
Hi Mike.
Which values - besides (pending) - are possible for the fourth,
optional, item?
Elan
[3/5] from: maarten:vrijheid at: 2-Oct-2003 8:01
OK, I'll give you a simple solution for a subset of your problem; from
there you can probably expand easily.
ff: parse fcontents " "
foreach [date s1 s2 money status] ff [
    ; first we'll have to convert the date
    month: copy/part date 2
    day: copy/part at date 4 2
    year: copy/part at date 7 2
    new-date: to-date rejoin [day "-" month "-" year]
    ; now we can construct a block
    probe reduce [new-date rejoin [s1 " " s2] to-money money status]
]
However, this assumes that the optional string (pending) is always
there.
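One possible way to lift that assumption, offered as a sketch rather than as part of the original reply: walk the token list by hand and consume a fifth token only when it starts with "(". The names tokens, raw-date, record and result are illustrative, the sample string is shortened from the original post, and the loop assumes every record has at least its four leading tokens.

```rebol
; Sample data, shortened from the original post.
fcontents: {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41}

tokens: parse fcontents " "     ; split on whitespace
result: copy []

while [not tail? tokens] [
    ; the first four tokens of a record are always present (assumption)
    set [raw-date s1 s2 money] tokens
    tokens: skip tokens 4

    ; rebuild the US-style date as day-month-year so to-date accepts it
    new-date: to-date rejoin [
        copy/part at raw-date 4 2 "-" copy/part raw-date 2 "-" copy/part at raw-date 7 2
    ]
    record: reduce [new-date rejoin [s1 " " s2] to-money money]

    ; take the optional status only if the next token looks like "(...)"
    if all [not tail? tokens  #"(" = first first tokens] [
        append record first tokens
        tokens: next tokens
    ]
    append/only result record
]

probe result
```

Each record ends up with three or four elements, depending on whether a parenthesised status followed the amount.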
Using a bnf rule with parse/all will get you further. This will take
some experimenting based on the exact format.
Another strategy could be to first replace all dates with valid date
formats (for REBOL, that means swapping day and month), then convert
your data to a block, and then parse based on REBOL types, which makes
matching much easier.
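A rough sketch of that second strategy, under some loud assumptions: every token in the text must be loadable as a REBOL value (so amounts like "$1,234.56" or a detached "$" would break it), the description is always a path followed by a word, and the names sample, text, values and out are made up for illustration.

```rebol
sample: {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41}
digit: charset "0123456789"

; Step 1: rewrite every mm/dd/yy as dd/mm/yy, in place on a copy.
; The replacement has the same length, so parse positions stay valid.
text: copy sample
parse/all text [
    any [
        start: 2 digit "/" 2 digit "/" 2 digit (
            mm: copy/part start 2
            dd: copy/part at start 4 2
            change start rejoin [dd "/" mm]
        )
        | skip
    ]
]

; Step 2: let REBOL turn the text into typed values.
values: to-block text

; Step 3: match on datatypes instead of characters.
out: copy []
parse values [
    some [
        set d date! set p path! set w word! set m money!
        (status: none)
        opt [set status paren!]
        (
            record: reduce [:d rejoin [mold :p " " :w] :m]
            if paren? :status [append record mold :status]
            append/only out record
        )
    ]
]

probe out
```

The block-level rule is short because the datatypes do the recognising; the cost is that the input has to survive to-block first.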
You may want to take a look at the parse section of the user guide.
--Maarten
[4/5] from: brett:codeconscious at: 2-Oct-2003 16:58
> For example, assume I have the string
> fcontents
> == {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY
> $11.41 09/29/03 ATM/POS ACTIVITY $
> 21.71 ...
Before looking at REBOL you need to identify what the structure of your
string is. From my point of view I can see that you have multiple
transactions, but I'm not sure how you identify each transaction. For
example do they all have "ATM/POS ACTIVITY" as a constant or will that
change? Do they all start with a US format date? Will the amounts always
begin with a $ sign and will they have commas for thousands? Is the
(pending) significant for your application or not?
Knowing the answers to these questions and others like them gives you a
starting point for deciding how to use Parse. You could use its string
break-apart mode, or its block mode. You need to decide whether you want it
to handle whitespace or not (/all refinement). Then you can create some
rules.
For example, *this is not complete*, but gives an idea I think:
; digit is not built in, so define it as a charset
digit: charset "0123456789"
atm-input: [some atm-trx]
atm-trx: [
    copy trx-date short-us-date
    atm-constant
    copy trx-amt dollar-amount
    opt [pending?]    ; the pending? rule is still to be defined
]
atm-constant: ["ATM/POS" "ACTIVITY"]
short-us-date: [2 digit #"/" 2 digit #"/" 2 digit]
dollar-amount: [#"$" to #" "]
parse input atm-input
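For readers who want something to run, here is one way the outline above might be filled in. It is a sketch, not the rules from the message: it uses parse/all with explicit whitespace, and the ws, amount-char and pending? definitions, the records block and the sample string are all assumptions added for illustration.

```rebol
sample: {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41}

digit: charset "0123456789"
ws: charset " ^/^-"                  ; space, newline, tab
amount-char: charset "0123456789.,"  ; allows commas for thousands

short-us-date: [2 digit "/" 2 digit "/" 2 digit]
atm-constant: ["ATM/POS" some ws "ACTIVITY"]
dollar-amount: ["$" any ws some amount-char]
pending?: ["(pending)"]              ; assumed to be the only status

records: copy []

atm-trx: [
    copy trx-date short-us-date some ws
    atm-constant some ws
    copy trx-amt dollar-amount
    (status: none)
    opt [some ws copy status pending?]
    any ws
    (append/only records reduce [trx-date trx-amt status])
]
atm-input: [any ws some atm-trx]

ok: parse/all sample atm-input       ; true only if the whole string matched
probe ok
probe records
```

The values are collected as strings here; converting them to date! and money! is a separate step that depends on the answers to the questions above.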
> but I'm not understanding how to isolate the elements.
>
> I suspect I should be able to create something akin to a regular expression
> that I can use as a pattern to apply to the string.
Some info here http://www.codeconscious.com/rebol/parse-tutorial.html
If these ideas don't help, just post another email.
Cheers,
Brett.
[5/5] from: ingo::2b1::de at: 2-Oct-2003 10:56
Hi Mike,
Mike Weber wrote:
> I'm brand new to Rebol and not getting the hang of string parsing.
> For example, assume I have the string
<<quoted lines omitted: 5>>
> [09/29/03 "ATM/POS ACTIVITY" $11.41 ]
> [09/29/03 "ATM/POS ACTIVITY" $21.71 ]
The following works with your example string, but it may choke on embedded
newlines (at the >>any " "<<).
;----------- start ------------
s: {09/29/03 ATM/POS ACTIVITY $28.68 (pending) 09/29/03 ATM/POS ACTIVITY $11.41 09/29/03 ATM/POS ACTIVITY $2.11}
; create a charset matching non-numbers
non-number: complement charset "0123456789"
; create an empty block for the result
b: copy []
; parse/all so that parse doesn't eat spaces ...
parse/all s [
    ; we want the following more than once
    some [
        ; get the date (for some reason, rebol does not understand this
        ; date format, so we have to get the individual date parts separately)
        copy dm 2 skip "/"
        copy dd 2 skip "/"
        copy dy 2 skip skip
        ; the string ends at the start of the money
        copy s1 to "$"
        ; money either ends with a space, or it may be the last element in the
        ; string
        copy m [thru " " | to end]
        ; there may be some or no space now (maybe newlines, if your string gets
        ; dynamically created ...)
        any " "
        ; now append what we found so far to the result block, rebuilding the
        ; date so that rebol understands it
        (append/only b compose [(to-date rejoin [dd "/" dm "/" dy]) (s1) (load m)])
        ; there MAY be a string now; it can start with any character except a
        ; number, otherwise you're in trouble here ... anyway, if there's the
        ; optional string, append it to the last block in your result block.
        ; if there's no string to be found here, we may even have reached the
        ; end, and be done
        opt [copy s2 any non-number (if not none? s2 [append last b s2]) | end]
    ]
]
;------------ end ------------
I hope that gets you going,
Ingo