Mailing List Archive: Re: REBOL Newbie tries to convert C source to REBOL (long posting)

From: AJMartin:orcon at: 24-Dec-2003 22:49


Hi, Mike!

Mike wrote:
> P.S.: Is there any kind of REBOL-specific FORUM available on the web ? If
> not: why not ? I would be willing to create one - if there's demand.

Not quite a forum, but a chat program, AltME, is available from:
http://www.altme.com/
And look for the "Rebol" world.

> 1) open a data file (approx. 1 MB data)

> filename: "test.dat"
> data: read filename

> 2) parse the file for sections - every section is indicated by square
> brackets and is valid to the next (new) section

> like this: (example data)
--------------------------------------------
[sec1]
dataline1 111.11 N012.11.029 E034.31.110
dataline2 131.11 N012.11.099 E034.31.110
dataline3 111.11 N015.11.099 E034.31.110

[sec2]
datalinex HFD 111.11 N012.11.099 E034.31.114
dataliney LKA 131.41 N011.11.049 E031.31.116
datalinez JIH 111.11 N012.11.019 E032.31.114
--------------------------------------------

I looked at the data above and noticed that it's not directly 'load-able
with Rebol, as values like "N012.11.029" would get turned into Rebol words.
So the next plan is to use 'parse. The basic form of 'parse, for when
whitespace is significant, is:

        parse/all data rules
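
To illustrate both points, here is a minimal sketch, assuming REBOL 2 (REBOL/Core 2.5 or similar). It shows that a value such as N012.11.029 loads as a word, and that the /all refinement stops 'parse from skipping spaces on its own. The two-letter test string is made up purely for illustration.

```rebol
REBOL [Title: "load and parse/all sketch"]

; A coordinate-like value loads as a word, a plain number as a decimal.
print type? load "N012.11.029"          ; word
print type? load "111.11"               ; decimal

; Without /all, string parsing skips whitespace between rule items.
print parse "a b" ["a" "b"]             ; true

; With /all, every space has to be matched explicitly.
print parse/all "a b" ["a" "b"]         ; false
print parse/all "a b" ["a" #" " "b"]    ; true
```

Since the data lines use spaces and newlines as separators, the rules need to see them, which is why /all is used here.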

Now to work out the rules that are needed. I can see that there are several
sections in the data, which seem to be terminated by newlines. So:

rules: [
    some section_rule
    end
    ]

I can see that each section starts with an opening square bracket, then
there's a section name (which seems important) followed by a closing square
bracket, (perhaps optional whitespace?) and a newline. After that, there's
any number of data lines, which seem to form a table of values of various
types, with perhaps a trailing empty line?

section_rule: [
    #"[" copy Section_Name to #"]" skip any #" " newline
    ; The 'skip steps over the "unconsumed" "]".
    any [
        data_line_rule
        ]
    ; 'any, because the trailing empty line may be missing (e.g. at the end
    ; of the file).
    any newline
    ]

data_line_rule: [
    ; 'some, so that an empty line is not mistaken for a data line.
    some value_item_rule any #" " newline
    ]

> I need to convert the data for each section into a different format.

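item_char: complement charset " ^/"  ; any character except space and newline

value_item_rule: [
    ; An item ends at a space, or at the newline for the last item on a line.
    copy Item some item_char any #" "
    ]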

The above rule picks out each item between space characters. So you just
need to parse each item on each line in each section and convert it to the
appropriate Rebol datatype. My %Patterns.r script file could be useful at
this point. I'd use it something like this:

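item_char: complement charset " ^/"  ; any character except space and newline

value_item_rule: [
    copy Item some item_char any #" " (
        ; Try each pattern in turn. An Item that matches none of them
        ; is left as the original string.
        parse/all Item [
            time^ end (Item: to time! Item)
            | money^ end (Item: to money! Item)
            | integer^ end (Item: to integer! Item)
            ; and so on for the expected data types.
            ]
        insert tail Data_Line Item
        )
    ]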
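
Putting the rules together, here is a self-contained sketch that can be pasted into a REBOL 2 console. It does not use %Patterns.r. As a stand-in, it assumes that only decimals need converting and uses 'attempt (which needs a reasonably recent REBOL 2) to try the conversion. The names 'sections, 'item-char and 'Section_Lines are made up for this sketch, and the data is a shortened copy of the sample above.

```rebol
REBOL [Title: "Section parser sketch"]

; Shortened copy of the sample data from the post.
data: {[sec1]
dataline1 111.11 N012.11.029 E034.31.110
dataline2 131.11 N012.11.099 E034.31.110

[sec2]
datalinex HFD 111.11 N012.11.099 E034.31.114
dataliney LKA 131.41 N011.11.049 E031.31.116
}

sections: copy []                     ; collects: name lines name lines ...
item-char: complement charset " ^/"   ; anything but space and newline

value_item_rule: [
    copy Item some item-char any #" " (
        ; Assumption: only decimals are converted, the rest stay strings.
        append Data_Line any [attempt [to decimal! Item] Item]
    )
]

data_line_rule: [
    (Data_Line: copy [])
    some value_item_rule newline
    (append/only Section_Lines Data_Line)
]

section_rule: [
    #"[" copy Section_Name to #"]" skip any #" " newline
    (Section_Lines: copy [])
    any data_line_rule
    any newline                       ; empty lines end the section
    (append sections Section_Name  append/only sections Section_Lines)
]

rules: [some section_rule end]

print parse/all data rules
foreach [name lines] sections [print [name length? lines]]
probe first second sections
```

This should print true, then "sec1 2" and "sec2 2", then the first data line of sec1 as a block. Note that the sketch relies on an empty line between sections, as in the sample; without one, a "[sec2]" header would be read as a data line.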

> 6) Final conversion of the original data will be to CSV-format - including
> most data that was read, but occasionally not all data is needed or an
> abbreviated form is sufficient.

This part is a bit trickier; some of my scripts in my %Values.r can be very
helpful here. I'd be putting each section into its own block, with each
line as a separate block within the surrounding block. That way it can be
more easily torn apart, twisted around and put back together. (I do a lot of
this at the school I work at.)

Here's the first block, just loaded straight into Rebol:
>> sec1: [
[    [dataline1 111.11 N012.11.029 E034.31.110]
[    [dataline2 131.11 N012.11.099 E034.31.110]
[    [dataline3 111.11 N015.11.099 E034.31.110]
[    ]
== [
    [dataline1 111.11 N012.11.029 E034.31.110]
    [dataline2 131.11 N012.11.099 E034.31.110]
    [dataline3 111.11 N015.11....

Then using my 'transpose to give the array a 90 degree twist:

>> probe sec1_columns: transpose sec1
[[dataline1 dataline2 dataline3] [111.11 131.11 111.11] [N012.11.029
N012.11.099 N015.11.099] [E034.31.110 E034.31.110 E034.31.110]]
== [[dataline1 dataline2 dataline3] [111.11 131.11 111.11] [N012.11.029
N012.11.099 N015.11.099] [E034.31.110 E034.31.110 E034.31.1...
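
For anyone without %Values.r to hand, here is a rough idea of what such a function does. This is only a sketch under the assumption that every row has as many items as the first one; 'transpose-sketch is a made-up name and not the 'transpose used above.

```rebol
REBOL [Title: "Transpose sketch"]

; Hypothetical stand-in for 'transpose: turns rows into columns.
; Assumes every row is as long as the first row.
transpose-sketch: func [rows [block!] /local columns column] [
    columns: copy []
    if empty? rows [return columns]
    repeat index length? first rows [
        column: copy []
        foreach row rows [append/only column pick row index]
        append/only columns column
    ]
    columns
]

sec1: [
    [dataline1 111.11 N012.11.029 E034.31.110]
    [dataline2 131.11 N012.11.099 E034.31.110]
    [dataline3 111.11 N015.11.099 E034.31.110]
]

probe transpose-sketch sec1
```

For this sec1 it should give the same four column blocks as shown above.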

Then swapping columns around, so the last column is first and the first
column is last:
>> probe t: transpose reduce [last sec1_columns second sec1_columns third
sec1_columns first sec1_columns]
[[E034.31.110 111.11 N012.11.029 dataline1] [E034.31.110 131.11 N012.11.099
dataline2] [E034.31.110 111.11 N015.11.099 dataline3]]
== [[E034.31.110 111.11 N012.11.029 dataline1] [E034.31.110 131.11
N012.11.099 dataline2] [E034.31.110 111.11 N015.11.099 dataline3...
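
The same reordering can also be sketched without the two transposes, by picking the wanted column numbers out of each row. 'reorder-sketch is a made-up helper for illustration, not something from %Values.r; leaving a number out of the order block drops that column, which may suit the "not all data is needed" case.

```rebol
REBOL [Title: "Column reorder sketch"]

; Hypothetical helper: builds new rows by picking columns in the given order.
reorder-sketch: func [rows [block!] order [block!] /local result new-row] [
    result: copy []
    foreach row rows [
        new-row: copy []
        foreach index order [append/only new-row pick row index]
        append/only result new-row
    ]
    result
]

sec1: [
    [dataline1 111.11 N012.11.029 E034.31.110]
    [dataline2 131.11 N012.11.099 E034.31.110]
    [dataline3 111.11 N015.11.099 E034.31.110]
]

; Last column first, first column last, as in the console session above.
probe reorder-sketch sec1 [4 2 3 1]
```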

Then converting to CSV format (using my %CSV.r script):

>> probe mold-csv t
{E034.31.110,111.11,N012.11.029,dataline1
E034.31.110,131.11,N012.11.099,dataline2
E034.31.110,111.11,N015.11.099,dataline3
}
== {E034.31.110,111.11,N012.11.029,dataline1
E034.31.110,131.11,N012.11.099,dataline2
E034.31.110,111.11,N015.11.099,dataline3
}
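
If %CSV.r isn't available, the basic idea can be sketched in a few lines. 'csv-sketch is a made-up name and only a rough stand-in for 'mold-csv: it joins the formed items with commas and makes no attempt at quoting items that contain commas, quotes or newlines.

```rebol
REBOL [Title: "CSV sketch"]

; Hypothetical stand-in for 'mold-csv: one text line per row, items
; joined with commas. No quoting or escaping is attempted.
csv-sketch: func [rows [block!] /local out line] [
    out: copy ""
    foreach row rows [
        line: copy ""
        foreach item row [
            append line ","
            append line form item
        ]
        remove line                   ; drop the leading comma
        append out line
        append out newline
    ]
    out
]

t: [
    [E034.31.110 111.11 N012.11.029 dataline1]
    [E034.31.110 131.11 N012.11.099 dataline2]
]

print csv-sketch t
```

The result is a plain string, so it can be saved with something like write %out.csv csv-sketch t (the file name is just an example).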

I hope that helps!

Andrew J Martin
Speaking in tongues and performing miracles.
ICQ: 26227169
http://www.rebol.it/Valley/
http://valley.orcon.net.nz/
http://Valley.150m.com/