[REBOL] Re: Not-too-smart Table Parser (was: how to handle tables?)
From: joel:neely:fedex at: 23-Sep-2001 23:13
Hi, Gregg,
Gregg Irwin wrote:
> Hi Joel,
>
> OK, it's a quick hack and fatally flawed in at least one respect, but
> please take a look at it and let me know what you think.
>
...
> Emp# First Name Last Name   Nickname Pager Nr Phone Number
> ==== ===== ==== ==== ====== ======== ===== == ===== ======
> 12   Johannes   Doe         Jake     888-1001 555-1212
> 3456 Ferdinando Quattlebaum Ferdy             800-555-1214
> 234  Betty Sue  Doaks                         555-1213
> 4567 Sue Ellen  Van Der Lin          888-1002 888-555-1215
>
> Assume no leading space or trim all leading space
>
> Iterate over the first row
>     if you hit a space
>         drop down through that column
>             if you hit a non-space
>                 not a column delimiter
>             if you get to the bottom, and it's all spaces
>                 it's *probably* a column delimiter
>                 mark column
>
> You could do the same kind of thing for proportional fonts
> using pixel offsets in place of character offsets.
>
I'd like to play with the code after getting some sleep; it's
been a long weekend! ;-)

However, I can think of one possible gotcha (going by the verbal
version of the algorithm). Consider the following modified sample data:
Emp# First Name Last Name   Nickname Pager Nr Phone Number
==== ===== ==== ==== ====== ======== ===== == ===== ======
12   John       Doe         Jake     888-1001 555-1212
3456 Phil       Quattlebaum Ferdy             800-555-1214
234  Betty Sue  Doaks                         555-1213
4567 Billy Bob  Van Der Lin          888-1002 888-555-1215
In this case, "First" and "Name" would be taken as the headings
of two distinct columns, because every row has a space at the
character offset between them.
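
To make the gotcha concrete, here is a minimal Python sketch of the scan as described verbally above, run on the modified sample. It is not Gregg's code: the names `render`, `blank_columns` and `split_at` are hypothetical, and the column widths used to lay the rows out are an assumption, since the exact fixed-width padding of the sample is not known. The underline row is left out for brevity.

```python
# Hypothetical fixed-width layout: these widths are an assumption, chosen so
# that every field fits; the original padding may have differed.
WIDTHS = [4, 10, 11, 8, 8, 12]

RECORDS = [
    ("Emp#", "First Name", "Last Name", "Nickname", "Pager Nr", "Phone Number"),
    ("12", "John", "Doe", "Jake", "888-1001", "555-1212"),
    ("3456", "Phil", "Quattlebaum", "Ferdy", "", "800-555-1214"),
    ("234", "Betty Sue", "Doaks", "", "", "555-1213"),
    ("4567", "Billy Bob", "Van Der Lin", "", "888-1002", "888-555-1215"),
]


def render(record):
    """Lay one record out as a left-aligned, fixed-width text row."""
    return " ".join(f.ljust(w) for f, w in zip(record, WIDTHS)).rstrip()


def blank_columns(rows):
    """Return the character offsets at which every row holds a space."""
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]  # short rows count as blank
    return [i for i in range(width) if all(row[i] == " " for row in padded)]


def split_at(row, cuts):
    """Split a row into trimmed fields at the given delimiter offsets."""
    bounds = [-1] + list(cuts) + [len(row)]
    return [row[a + 1:b].strip() for a, b in zip(bounds, bounds[1:])]


rows = [render(r) for r in RECORDS]
cuts = blank_columns(rows)           # offsets the scan would mark as delimiters
headings = split_at(rows[0], cuts)   # header row split at those offsets
print(cuts)
print(headings)
```

Under these assumptions the scan marks offset 10 as a delimiter, so the header splits into "First" and "Name". Putting "Johannes" back in place of "John" puts a letter at that offset and the false break disappears.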
I know this example looks a bit artificial, but consider the
possibility of a column whose content has some consistent pattern
involving whitespace, such as

1" Nozzle       Copper  3.49
1" Pipe         PVC     8.33
1" Supply Line  Copper  5.77

or

2359 N Abernathy  St Louis  MO  33333
1498 N Abernathy  St Louis  MO  33334
1100 N Abernathy  St Louis  MO  33334
1215 S Abernathy  St Louis  MO  33301
1492 W Columbus   St Louis  MO  33324

In my previous experiments, the likelihood of a space representing
a column break was also influenced by whether it participated in
a horizontal run of whitespace. E.g. the space before "St" or "MO"
or "333.." was more likely to be a column break than the space
before that, etc.
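
One way to fold that observation into the scan is to group adjacent all-blank offsets into runs and trust wider runs more. The Python sketch below is only one possible reading of the idea, not the code from those experiments. The two-space gaps between fields, the field widths, and the names `blank_runs`, `likely_breaks` and `min_width` are all assumptions made for illustration.

```python
# Assumed layout: fields padded to these widths and joined by two spaces.
WIDTHS = [16, 8, 2, 5]

RECORDS = [
    ("2359 N Abernathy", "St Louis", "MO", "33333"),
    ("1498 N Abernathy", "St Louis", "MO", "33334"),
    ("1100 N Abernathy", "St Louis", "MO", "33334"),
    ("1215 S Abernathy", "St Louis", "MO", "33301"),
    ("1492 W Columbus", "St Louis", "MO", "33324"),
]

rows = ["  ".join(f.ljust(w) for f, w in zip(rec, WIDTHS)) for rec in RECORDS]


def blank_columns(rows):
    """Return the character offsets at which every row holds a space."""
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]  # short rows count as blank
    return [i for i in range(width) if all(row[i] == " " for row in padded)]


def blank_runs(rows):
    """Group all-blank offsets into (start, width) runs of adjacent columns."""
    runs = []
    for col in blank_columns(rows):
        if runs and runs[-1][0] + runs[-1][1] == col:
            start, width = runs[-1]
            runs[-1] = (start, width + 1)  # extend the current run
        else:
            runs.append((col, 1))          # start a new run
    return runs


def likely_breaks(rows, min_width=2):
    """Keep only runs at least min_width wide (a hypothetical cutoff)."""
    return [(s, w) for s, w in blank_runs(rows) if w >= min_width]


runs = blank_runs(rows)
breaks = likely_breaks(rows)
print(runs)
print(breaks)
```

With this layout the single blank columns at offsets 4, 6 and 20, which fall inside the street and city fields, are demoted, while the two-column runs before "St", "MO" and the ZIP code survive. A real scorer would presumably weight run width rather than apply a hard cutoff.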
-jn-
--
; Joel Neely [joel--neely--fedex--com] 901-263-4460 38017/HKA/9677
REBOL [] foreach [order string] sort/skip reduce [ true "!"
false head reverse "rekcah" none "REBOL " prin "Just " "another "
] 2 [prin string] print ""