[REBOL] Not-too-smart Table Parser (was: how to handle tables?)
From: greggirwin::starband::net at: 23-Sep-2001 13:05
Hi Joel,
OK, it's a quick hack and fatally flawed in at least one respect, but
please take a look at it and let me know what you think.
--Gregg
REBOL [
notes: {
Joel Neely:
OTOH, if you really want to be consistent with the philosophy
of letting the human type for human consumption and requiring
the formatting program to figure out what was meant, I'd love
to see some code that can handle the following (exactly as it
appears below, of course). Back in the day, I spent quite a
bit of time trying to come up with a bit of AI that could take
a flat file or printout image such as the following and infer
where the columns were intended, whether each column was to be
left-justified, right-justified, or centered, and what type of
data should appear in each. It's non-trivial IMHO, but YMMV.
Emp# First Name Last Name Nickname Pager Nr Phone Number
==== ===== ==== ==== ====== ======== ===== == ===== ======
12 Johannes Doe Jake 888-1001 555-1212
3456 Ferdinando Quattlebaum Ferdy 800-555-1214
234 Betty Sue Doaks 555-1213
4567 Sue Ellen Van Der Lin 888-1002 888-555-1215
Assume no leading space or trim all leading space
Iterate over the first row
    if you hit a space
        drop down through that column
        if you hit a non-space
            not a column delimiter
        if you get to the bottom, and it's all spaces
            it's *probably* a column delimiter
            mark column
You could do the same kind of thing for proportional fonts
using pixel offsets in place of character offsets.
}
]
; 5 16 28 37 43 46 52
data: [
{Emp# First Name Last Name Nickname Pager Nr Phone Number}
{==== ===== ==== ==== ====== ======== ===== == ===== ======}
{ 12 Johannes Doe Jake 888-1001 555-1212}
{3456 Ferdinando Quattlebaum Ferdy 800-555-1214}
{ 234 Betty Sue Doaks 555-1213}
{4567 Sue Ellen Van Der Lin 888-1002 888-555-1215}
]
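; Copy LEN characters of S, starting at 1-based position START.
mid: func [s start len][return copy/part at s start len]
; Return the length of the longest item in a block of series.
longest?: func [items /local item result] [
    result: 0
    foreach item items [
        result: max result length? item
    ]
    return result
]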
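; Return the character offsets that are blank (space or tab) in every row.
find-columns: func [tbl-data /local i ch found-col? result tmp-cols] [
    result: make block! 10
    tmp-cols: make block! length? tbl-data/1
    ; Candidate delimiters: the blank positions in the first row.
    for i 1 length? tbl-data/1 1 [
        ch: pick tbl-data/1 i
        if any [(ch = #" ") (ch = #"^-")] [append tmp-cols i]
    ]
    ; Keep a candidate only if it is blank in every other row as well.
    foreach i tmp-cols [
        found-col?: true
        foreach row next tbl-data [
            ch: pick row i
            ; PICK returns none past the end of a short row; count that as blank.
            if all [ch (ch <> #" ") (ch <> #"^-")] [
                found-col?: false
                break
            ]
        ]
        if found-col? [
            append result i
        ]
    ]
    return result
]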
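; Illustration only, not part of the original post: a quick check of
; find-columns on a made-up two-row table (DEMO is a hypothetical name).
; Positions 3 and 4 are blank in both rows, so both should be reported.
demo: [
    "ab  cd"
    "a   c "
]
;print mold find-columns demo    ; expected: [3 4]
; A gap two spaces wide therefore yields two adjacent offsets, and the
; splitters below would emit an empty column between them. One possible
; way around that, as a sketch: keep only the last offset of each run of
; consecutive offsets. COLLAPSE-RUNS is a hypothetical helper and is not
; called anywhere in this script.
collapse-runs: func [offsets /local result] [
    result: make block! length? offsets
    forall offsets [
        ; Keep this offset if it is the last one, or the next is not adjacent.
        if any [
            tail? next offsets
            (second offsets) <> (1 + first offsets)
        ] [append result first offsets]
    ]
    return result
]
;print mold collapse-runs [3 4 9]    ; expected: [4 9]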
build-col: func [tbl-data start end /local row result] [
    result: make block! length? tbl-data
    foreach row tbl-data [
        append result trim mid row start (end - start + 1)
    ]
    return result
]
split-to-cols: func [tbl-data /local col-offsets result] [
    col-offsets: find-columns tbl-data
    insert head col-offsets 0
    append col-offsets longest? tbl-data
    result: make block! (length? col-offsets) - 1
    for i 1 ((length? col-offsets) - 1) 1 [
        append/only result build-col tbl-data ((pick col-offsets i) + 1)
            pick col-offsets (i + 1)
    ]
    return result
]
build-row: func [row-data col-offsets /local start end result] [
    result: make block! length? col-offsets
    for i 1 ((length? col-offsets) - 1) 1 [
        start: (pick col-offsets i) + 1
        end: pick col-offsets (i + 1)
        append result trim mid row-data start (end - start + 1)
    ]
    return result
]
split-to-rows: func [tbl-data /local col-offsets result] [
    col-offsets: find-columns tbl-data
    insert head col-offsets 0
    append col-offsets longest? tbl-data
    ; One result entry per row, so size the block by the row count.
    result: make block! length? tbl-data
    foreach row tbl-data [
        append/only result build-row to-string row col-offsets
    ]
    return result
]
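; Illustration only, not part of the original post: a made-up, evenly
; aligned two-column table (SAMPLE is a hypothetical name) showing the
; shape of what split-to-rows returns. The only all-blank position is 3,
; so each row should split into two trimmed fields.
sample: [
    "id name"
    "== ===="
    " 1 Ann "
    "22 Bob "
]
;print mold split-to-rows sample
; expected: [["id" "name"] ["==" "===="] ["1" "Ann"] ["22" "Bob"]]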
;print ["Column Offsets:" find-columns data]
;print ["Longest Item:" longest? data]
;print mold build-col data 1 5
;print mold build-col data 6 16
;print mold split-to-cols data
;print mold build-row to-string data/1 find-columns data
;print mold split-to-rows data
foreach row split-to-rows data [print mold row]
halt