Mailing List Archive: 49091 messages

[REBOL] Re: make-doc-pro: how to handle tables?

From: joel:neely:fedex at: 23-Sep-2001 22:53

Hi, Gregg,

Gregg Irwin wrote:
> <<
> Employee ID First Name Last Name   Phone Nr
> -------- -- ----- ---- ---- ------ ----- --
> 1234        John       Doe         555-1212
> 2345        Jane       Doaks       555-1213
> 3456        Ferdinando Quattlebaum 555-1214
> 4567        Mary Ellen Van Der Lin 555-1215
> >>
>
> This is a great example! I would consider this to be a bad
> table layout because it *is* confusing.
>
But it isn't confusing, is it? Since you're an intelligent human being, you understand enough about the *meaning* of the data in the columns (and about interpreting the rows of the table in the context of the entire table) that you could probably see four columns instantly.
> Why would you build such a tightly spaced table (outside of
> being an example here)?
>
This was only an illustration. I might need to use tight spacing because there's lots of data to get on each line and I'm trying to stay within narrow text margins. The rest of my answer would be very similar to your next comment.
> If I use tabs, I get a wider spacing than I would like for
> this table, but it clearly delineates the columns.
>
The more spaces I have to type, the more typing effort is involved. In addition, whether I use spaces or tabs, if I have lots of data to present, I want to waste as little space as possible. So, I agree with your comment that the example below is "a wider spacing than I would like", or than will fit on the page.
> Employee ID     First Name      Last Name       Phone Nr
> -------- --     ----- ----      ---- ------     ----- --
> 1234            John            Doe             555-1212
> 2345            Jane            Doaks           555-1213
> 3456            Ferdinando      Quattlebaum     555-1214
> 4567            Mary Ellen      Van Der Lin     555-1215
>
> << OBTW, tabs are of absolutely no use in solving this problem...
>
> Right! Yes! Absolutely! So, naturally, you would type *another*
> tab to add space before the start of the next column. Type up
> a table of text in your favorite word processor. How do you do
> it?
>
If I were using a word processor (or a desktop publishing program), I would type exactly one tab between successive columns. I would then adjust the positions of the tab stops to provide adequate horizontal separation between the columns. Unfortunately, simple text processors (and most utilities that just dump plain text files to screen or printer) have fixed tab stops (e.g., every 8 characters) which usually means that some tab positions are too wide and others are too narrow (when we're dealing with typical text/data).
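As an illustration of the fixed-tab-stop problem just described, here is a minimal sketch in Python (chosen only so the example can be run; it is not part of make-doc-pro or of this thread). The helper names and the 8-character tab size are assumptions. It joins cells with exactly one tab and reports where each cell would start once the tabs are expanded.

```python
rows = [
    ["Employee ID", "First Name", "Last Name", "Phone Nr"],
    ["1234", "John", "Doe", "555-1212"],
    ["3456", "Ferdinando", "Quattlebaum", "555-1214"],
]

def render_with_tabs(row, tabsize=8):
    """Join cells with exactly one tab, then expand to fixed tab stops."""
    return "\t".join(row).expandtabs(tabsize)

def column_starts(row, tabsize=8):
    """Return the character position at which each cell begins after expansion."""
    starts, pos = [], 0
    for cell in row:
        starts.append(pos)
        pos += len(cell)
        pos += tabsize - (pos % tabsize)  # one tab advances to the next stop
    return starts

for row in rows:
    print(render_with_tabs(row))
    print(column_starts(row))
```

With one tab per column, the three rows place their cells at different positions, so the columns do not line up unless the typist adds extra tabs by hand.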
> <<
> > Whitespace *is* visual feedback.
> >
> Bah! Humbug! Horsefeathers! See the proofreading exercise
> above. Counting nonprinting characters to figure out *what*
> *kind* of delimiter (between words *in* a column or between
> columns) is an awful burden to lay on a poor human (technical
> or not!).
> >>
>
> Right again! We're not counting characters at all, we never do.
>
Distinguishing one space from multiple spaces requires counting, even if it's only counting from 1 to 2. But this begs the question of what "multiple" means!
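The point about what "multiple" means can be sketched in Python (an illustration only, not code from this thread). Both splitting rules below are hypothetical, and the row layout (the first column padded so that eight spaces follow "4567") is an assumption about the tightly spaced table. Neither rule recovers the four intended cells.

```python
import re

# Last row of the tightly spaced table; the padding after "4567" is assumed.
tight_row = "4567" + " " * 8 + "Mary Ellen Van Der Lin 555-1215"

def split_on_any_space(line):
    """Rule A (hypothetical): every run of whitespace separates cells."""
    return line.split()

def split_on_wide_gaps(line, min_gap=2):
    """Rule B (hypothetical): only runs of min_gap or more spaces separate cells."""
    pattern = " {" + str(min_gap) + ",}"
    return re.split(pattern, line.strip())

print(split_on_any_space(tight_row))  # seven pieces
print(split_on_wide_gaps(tight_row))  # two pieces
```

Rule A breaks "Mary Ellen" and "Van Der Lin" apart; Rule B glues the last three columns together. Any rule in between has to be stated precisely, which is the counting problem again.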
> What we're doing is creating "visual groupings" of data in
> columns. They must be visually distinguishable for us to make
> sense of them. If they are laid out so that they might confuse
> a human reader, then I would expect a parser to get confused
> as well. Now, a human can take a few extra seconds to try and
> make sense of the data, as I did with your first example table.
> Can a program do the same thing? Sure. As you mention, this is
> no easy task, and beyond what make-doc-pro should be expected to
> do, but a program should be able to scan the data and take a
> few extra seconds to try and determine some heuristics that make
> the most sense of things. Worst case, it could ask. "Hey, are
> there 4 columns in this table (as in the example I'm displaying
> to you now in my disruptive dialog box)?"
>
What we *intend* to do is create visual groupings. What we actually do is type characters. What we must do if we want a program to read our minds and discover our intentions is to make precise statements about the rules of inference for deciding where the "visual groupings" are to be delimited. We certainly agree that writing such telepathic code in the absence of clear specifications "is no easy task, and beyond what make-doc-pro should be expected to do". However, this thread began with the question of how to incorporate tables into make-doc-pro or similar tools. So we seem to be agreeing that writing heuristic code that can make sense of our use of whitespace is beyond the scope of a make-doc-style program.
> <<
> > Do we need delimiters, beyond whitespace, in our REBOL code?
> >
> We certainly do. Including [ ] ( ) { } and " .
> >>
>
> Sorry, my meaning wasn't clear. Here's an example of what I meant.
> Using only whitespace, we can type this:
>
>     emit: func [data] [append out reduce data]
>
> Using another delimiter (|), we get this:
>
>     emit:|func|[data]|[append|out|reduce|data]
>
> That's the simple case I wanted to make.
>
But it illustrates the fallacy I was emphasizing. In the above example, the whitespace is ONLY used for a single "level" of logical grouping. All other structure is shown by means of the other delimiters. Using your example, we can agree that

    emit: func [data] [append out reduce data]

and

    emit: func [ data ][ append out reduce data ]

mean exactly the same thing, variations in whitespace being of no significance.

As far as I can tell, a table has four levels (not counting the level of individual characters!):

1) individual tokens/words/symbols within a cell
2) a single cell, made up of such tokens/words/symbols
3) a single row, made up of cells
4) the entire table, made up of rows

Using varying amounts of whitespace (horizontal or vertical) to distinguish among these levels seems to risk the same possible errors (of typing or reading) as if we used varying whitespace in the code example

emit func   data   append out reduce data

where appearing at the left margin makes a word a set-word, a gap of multiple spaces begins or ends a block, and single spaces separate words within a block. I hope none of us would argue in favor of such a "natural, punctuation-free" version of REBOL!
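The four levels can be made explicit as a nested data structure. The following Python sketch is illustrative only; the function name `parse_table` is hypothetical, and it assumes the "~|" row marker and "|" column delimiter used for the delimited tables elsewhere in this thread.

```python
def parse_table(text, marker="~|", delimiter="|"):
    """Return a table as rows -> cells -> tokens (levels 4, 3, 2 and 1)."""
    table = []
    for line in text.splitlines():
        if not line.startswith(marker):
            continue  # not a table row under this assumed convention
        cells = line[len(marker):].split(delimiter)    # level 3: a row of cells
        table.append([cell.split() for cell in cells])  # levels 2 and 1
    return table

sample = "~|Last Name  |Phone Nr\n~|Van Der Lin|555-1215\n"
for row in parse_table(sample):
    print(row)
```

Here the explicit delimiter separates cells, the line break separates rows, and whitespace is left with only one job: separating tokens inside a cell.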
> <<
> In my book things such as
>
>     \table
>     /table
>
> or underlining with hyphens or equal signs are just as much
> "syntax" that has to be learned and understood for effective
> use as is
>
>     ~|Employee ID|First Name|Last Name |Phone Nr
>     ~|-------- --|----- ----|---- ------|----- --
>     ~|1234 |Johannes |Doe |555-1212
>     ~|2345 |Betty Sue|Doaks |555-1213
>     ~|3456 |Ferdinando|Quattlebaum|555-1214
>     ~|4567 |Sue Ellen|Van Der Lin|555-1215
> >>
>
> I disagree...somewhat. Some of this has to be couched in the
> context of the make-doc discussion. If I were just typing a
> document, I would probably do something like this:
>
>     Table 1.
>     ---------------------------------------
>     NAME        PHONE     E-MAIL
>     ---------------------------------------
>     Bob Jones   000.0000  [Bob--Jones--com]
>     Mary Smith  111.1111  [Mary--Smith--com]
>
> Can make-doc make sense of this? Let's see...blank line, then
> a line is indented and starts with the word "table", then a
> line of dashes, words in all caps (make note of what character
> column they fall on), ...
>
Now you're just making up another set of rules on the fly, with yet more opportunities for a poor human to make an error (typo, failing to remember a rule, etc.). What happens if the typist doesn't remember to put NAME, PHONE, and E-MAIL in all caps? Rules are still rules, regardless of which keys on the keyboard they require!
> I believe the standard is just to leave the cell empty or to
> put an em dash in the cell. In our case, the latter seems the
> obvious choice for ease of implementation.
>
Leaving it empty creates ambiguous runs of whitespace unless there is a column delimiter. As for the alternative, my ASCII keyboard doesn't have an "em-dash" key. Requiring something (doubled hyphens, for example) to represent an empty cell is certainly plausible, but again constitutes yet another rule whose violation might have consequences. More rules again.
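A short Python sketch of the empty-cell ambiguity (illustrative only; both helper names and both sample rows are hypothetical, and the "two or more spaces" rule is just one possible whitespace convention):

```python
import re

def cells_from_delimited(line, delimiter="|"):
    """Split on an explicit delimiter; an empty cell survives as ''."""
    return [cell.strip() for cell in line.split(delimiter)]

def cells_from_wide_gaps(line):
    """Split on runs of two or more spaces; an empty cell just widens the gap."""
    return re.split(r" {2,}", line.strip())

# Hypothetical row whose Last Name cell is empty.
delimited_row = "2345|Jane||555-1213"
spaced_row = "2345  Jane" + " " * 14 + "555-1213"

print(cells_from_delimited(delimited_row))  # four cells, the third one empty
print(cells_from_wide_gaps(spaced_row))     # three cells; the empty one is gone
```

With the delimiter, the empty cell keeps its place. With whitespace alone, the reader (or program) has to fall back on column positions to work out which cell is missing.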
> << IOW, simplicity is achieved by controlling the rate at
> which features are discussed with the user, not by making
> a limiting choice that actually causes subtle problems
> (like remembering to count spaces). >>
>
> This logic is somewhat counterfeit...
>
You're entitled to your opinion, but I've actually deployed a system using this kind of approach and have had no problems with it (nor its users) to date.
> <<
> Now, since you want to take up keystroke counting next, answer
> the same question with this version.
>
>     ~|Widget A|Widget B|Widget C
>     ~|50%|10%|3%
>     ~|500,000|100,000|3,000
>
> How many keystrokes did we save by not fooling ourselves into
> believing that we had to make the columns line up? By using
> an explicit delimiter, the typist gets to choose whether to
> do so or not, and to choose to save the maximum number of
> keystrokes if she/he wishes to do so.
> >>
>
> Another valid case. Now, my *goal* is not to save keystrokes,
> that just comes along as a happy side-effect. I originally did
> some delimited layouts to post, showing the very issue you
> mention:
>
>     a|b|c|d|e|f|g
>     12|23|34|45|56|67|78
>     acb|def|ghi|jkl|mno|pqr|stu
>     $12.00|$24.50|$37.75|$1.00|$0.50|$0.25|$0.01
>
> I didn't include them in my original post because this is so
> horrific, from a layout perspective, that I can't imagine
> anyone even considering it as an option and I didn't think it
> was fair to hit you with it. :)
>
Since we're confessing... ;-) I had originally planned on including the following related issue (but had written too much already). Those who have data in spreadsheet or database format can easily export it to delimited text, then cut and paste the data into their document. Why should they have to do all of the manual effort to replace the explicit delimiters with varying amounts of whitespace just to make the columns line up? That's what we employ computers for! They can count columns and spit out "tidy text", HTML, RTF, PDF, TeX, or whatever, and save us the tedium.
> For me, and I could be alone in this, creating delimited tables
> using ASCII chars as we are discussing, always has me playing
> the back and forth game to get my delimiters to line up, then
> that moves my data, so I have to readjust that, etc. Could just
> be that I haven't been forced to come up with a clever solution
> to my problem since I can avoid it.
>
The solution is to have the program include a "tidy text" option to deal with all of that time-wasting character shuffling. Then you can

1) Type (or cut-and-paste) with minimal effort.
2) Convert to tidy-text for those (few?) cases where text is really the right medium.
3) Convert to HTML (etc.) for the other cases.

... and in all cases, let the computer do the tedious work of shuffling characters and counting columns.
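The three steps above can be sketched as follows, in Python (an illustration under stated assumptions, not make-doc-pro's implementation). The function names `parse_rows`, `tidy_text` and `to_html` are hypothetical, and the "~|" marker and "|" delimiter are the conventions assumed from the delimited examples in this thread.

```python
import html

def parse_rows(lines, marker="~|", delimiter="|"):
    """Step 1: read minimally typed rows into lists of cell strings."""
    return [[cell.strip() for cell in line[len(marker):].split(delimiter)]
            for line in lines if line.startswith(marker)]

def tidy_text(rows):
    """Step 2: pad every column to its widest cell so the columns line up."""
    ncols = max(len(row) for row in rows)
    padded = [row + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(ncols)]
    return "\n".join(
        " ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in padded)

def to_html(rows):
    """Step 3: emit the same rows as a bare HTML table."""
    body = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows)
    return f"<table>\n{body}\n</table>"

source = [
    "~|Widget A|Widget B|Widget C",
    "~|50%|10%|3%",
    "~|500,000|100,000|3,000",
]
rows = parse_rows(source)
print(tidy_text(rows))
print(to_html(rows))
```

The typist never counts a space: the column widths are computed from the data, and the same parsed rows feed either output format.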
> <<
> Literate humans (whether "users" or not)
> are completely familiar with using punctuation as a way of
> showing the structure of text. Periods at the ends of
> sentences, commas between elements of a list, horizontal rules
> between sections of a document, etc. are all well within their
> comfort zone.
> >>
>
> Agreed. Those are conventions we are taught. Using a pipe symbol
> as a delimiter between pieces of text is not. This is where we
> diverge in our view as I consider this to be a computer/programmer's
> convention and not a normal language/grammar convention.
>
I think anyone who understands that commas separate words or phrases in a list can learn to use a column separator in a few seconds. They already understand the concept of delimiter; I'm just reusing it. That seems to be where we differ.
> Thanks for a fabulous discussion! Whatever the result, I'm always
> energized by this kind of constructive discourse (I love that word). :)
>
Thank you! I appreciate the exchange of ideas!

-jn-

--
; Joel Neely [joel--neely--fedex--com] 901-263-4460 38017/HKA/9677
REBOL [] foreach [order string] sort/skip reduce [
    true "!" false head reverse "rekcah" none "REBOL "
    prin "Just " "another "
] 2 [prin string] print ""