[REBOL] Re: Not-too-smart Table Parser (was: how to handle tables?)

From: joel:neely:fedex at: 23-Sep-2001 23:13

Hi, Gregg,

Gregg Irwin wrote:
> Hi Joel,
>
> OK, it's a quick hack and fatally flawed in at least one respect, but
> please take a look at it and let me know what you think.
>
...
> Emp#  First Name  Last Name    Nickname  Pager Nr  Phone Number
> ====  ===== ====  ==== ======  ========  ===== ==  =====
> 12    Johannes    Doe          Jake      888-1001  555-1212
> 3456  Ferdinando  Quattlebaum  Ferdy               800-555-1214
> 234   Betty Sue   Doaks                            555-1213
> 4567  Sue Ellen   Van Der Lin            888-1002  888-555-1215
>
> Assume no leading space or trim all leading space
>
> Iterate over the first row
>     if you hit a space
>         drop down through that column
>         if you hit a non-space
>             not a column delimiter
>         if you get to the bottom, and it's all spaces
>             it's *probably* a column delimiter
>             mark column
>
> You could do the same kind of thing for proportional fonts
> using pixel offsets in place of character offsets.
>
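
The quoted steps can be sketched roughly as follows. This is a minimal illustration in Python rather than REBOL, not the code Gregg posted. The column widths, the two-space gaps, and the names `WIDTHS`, `make_row`, `TABLE`, and `delimiter_offsets` are all assumptions made for the example, since the archived copy of the table lost its original alignment.

```python
# Assumed layout: the archive lost the original alignment, so these
# column widths and the two-space gap between columns are guesses.
WIDTHS = (4, 10, 11, 8, 8, 12)


def make_row(*fields):
    """Lay fields out in fixed-width columns separated by two spaces."""
    return "  ".join(f.ljust(w) for f, w in zip(fields, WIDTHS)).rstrip()


TABLE = [
    make_row("Emp#", "First Name", "Last Name", "Nickname", "Pager Nr", "Phone Number"),
    make_row("====", "===== ====", "==== ======", "========", "===== ==", "====="),
    make_row("12", "Johannes", "Doe", "Jake", "888-1001", "555-1212"),
    make_row("3456", "Ferdinando", "Quattlebaum", "Ferdy", "", "800-555-1214"),
    make_row("234", "Betty Sue", "Doaks", "", "", "555-1213"),
    make_row("4567", "Sue Ellen", "Van Der Lin", "", "888-1002", "888-555-1215"),
]


def delimiter_offsets(rows):
    """Return offsets where the first row and every row below it hold a space."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]  # short rows count as trailing spaces
    first = padded[0]
    return [
        i
        for i, ch in enumerate(first)
        if ch == " " and all(r[i] == " " for r in padded[1:])
    ]


print(delimiter_offsets(TABLE))
# [4, 5, 16, 17, 29, 30, 39, 40, 49, 50]
```

With this layout the space inside "First Name" (offset 11) is not marked, because "Johannes" and "Ferdinando" both have a letter at that offset.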
I'd like to play with the code after getting some sleep; it's been
a long weekend!  ;-)

However, one possible gotcha I can think of (from the verbal version
of the algorithm...)  Consider the following modified sample data:

    Emp#  First Name  Last Name    Nickname  Pager Nr  Phone Number
    ====  ===== ====  ==== ======  ========  ===== ==  ===== ======
    12    John        Doe          Jake      888-1001  555-1212
    3456  Phil        Quattlebaum  Ferdy               800-555-1214
    234   Betty Sue   Doaks                            555-1213
    4567  Billy Bob   Van Der Lin            888-1002  888-555-1215

In this case, "First" and "Name" would be taken as the headings of
two distinct columns.

I know this example looks a bit artificial, but consider the
possibility of a column whose content has some consistent pattern
involving whitespace, such as

    1" Nozzle       Copper  3.49
    1" Pipe         PVC     8.33
    1" Supply Line  Copper  5.77

or

    2359 N Abernathy  St Louis  MO  33333
    1498 N Abernathy  St Louis  MO  33334
    1100 N Abernathy  St Louis  MO  33334
    1215 S Abernathy  St Louis  MO  33301
    1492 W Columbus   St Louis  MO  33324

In my previous experiments, the likelihood of a space representing
a column break was also influenced by whether it participated in
a horizontal run of whitespace.  E.g. the space before "St" or MO
or "333.." was more likely to be a column break than the space
before that, etc.

-jn-

-- 
; Joel Neely [joel--neely--fedex--com]  901-263-4460  38017/HKA/9677
REBOL []  foreach [order string] sort/skip reduce [
    true "!" false head reverse "rekcah" none "REBOL "
    prin "Just " "another "
] 2 [prin string] print ""
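
One hypothetical way to act on the run-of-whitespace observation above is to group the all-blank character offsets into horizontal runs and cut only at runs of some minimum width. The sketch below is in Python rather than REBOL and is not the code from the earlier experiments. The two-space gaps in the sample rows, the `min_run=2` threshold, and the names `blank_runs` and `split_row` are assumptions made for the example.

```python
# Joel's address sample; the field widths and two-space gaps are assumed,
# because the archived copy collapsed the original spacing.
ADDRESSES = [
    ("2359 N Abernathy", "St Louis", "MO", "33333"),
    ("1498 N Abernathy", "St Louis", "MO", "33334"),
    ("1100 N Abernathy", "St Louis", "MO", "33334"),
    ("1215 S Abernathy", "St Louis", "MO", "33301"),
    ("1492 W Columbus", "St Louis", "MO", "33324"),
]
WIDTHS = (16, 8, 2, 5)
ROWS = ["  ".join(f.ljust(w) for f, w in zip(rec, WIDTHS)) for rec in ADDRESSES]


def blank_runs(rows):
    """Return (start, length) for each run of offsets blank in every row."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    runs, start = [], None
    for i in range(width + 1):  # one extra step closes a run at the right edge
        blank = i < width and all(r[i] == " " for r in padded)
        if blank and start is None:
            start = i
        elif not blank and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


def split_row(row, runs, min_run=2):
    """Cut a row at every blank run that is at least min_run wide."""
    cells, pos = [], 0
    for start, length in runs:
        if length >= min_run:
            cells.append(row[pos:start].strip())
            pos = start + length
    cells.append(row[pos:].strip())
    return cells


runs = blank_runs(ROWS)
print(runs)
# [(4, 1), (6, 1), (16, 2), (20, 1), (26, 2), (30, 2)]
print(split_row(ROWS[4], runs))
# ['1492 W Columbus', 'St Louis', 'MO', '33324']
print(split_row(ROWS[4], runs, min_run=1))
# ['1492', 'W', 'Columbus', 'St', 'Louis', 'MO', '33324']
```

A fixed threshold is the crudest form of the idea; a weighting by run width would be closer to "likelihood", but needs real data to tune.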