Mailing List Archive: 49091 messages

[REBOL] Re: make-doc-pro: how to handle tables?

From: joel:neely:fedex at: 22-Sep-2001 19:14

Hi, Gregg,

Gregg Irwin wrote:
...
> Let me know what you think.
>
...

OK. You asked for it. ;-)
> Joel, Robert, et al
>
> I have to disagree with you guys. I'll pick a couple points of
> contention and then give some thoughts and exercises.
>
> << not so experienced users don't understand the difference
> between usi[n]g space, tab etc. >>
>
> And they shouldn't have to.
>
That one wasn't mine, so I'll pass.
> Now, there is going to be the challenge on the parsing side of
> making the distinction, but even that shouldn't be too tough
> using alternation of tab or multi-space.
>
You seem to be glossing over one of the points I raised: when using whitespace, one must address the issue of how much whitespace is considered a column break (since whitespace may also appear *within* a column, e.g. between words). When you say "multi-space" you seem to imply more than one, but without actually saying how many. How about two? What does that do to the poor user who accidentally bounces the space bar at a point where a single space between words (in one column) was intended?
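To make the question concrete, here is a minimal Python sketch of one possible reading of "tab or multi-space": a tab, or a run of two or more spaces, ends a column, and a single space is data. The rule, the regular expression, and the function name are illustrative assumptions, not anything make-doc actually does. One bounced space bar inside a name is enough to change the column count:

```python
import re

# Hypothetical rule: a tab, or a run of 2+ spaces, is a column break;
# a single space is treated as part of the cell's data.
COLUMN_BREAK = re.compile(r"\t| {2,}")

def split_whitespace_row(line):
    """Split one table row on tabs or runs of two or more spaces."""
    return [cell.strip() for cell in COLUMN_BREAK.split(line.strip())]

intended = "2345  Betty Sue  Doaks  555-1213"
bounced = "2345  Betty  Sue  Doaks  555-1213"  # space bar hit twice inside a name

print(split_whitespace_row(intended))  # ['2345', 'Betty Sue', 'Doaks', '555-1213']
print(split_whitespace_row(bounced))   # ['2345', 'Betty', 'Sue', 'Doaks', '555-1213']
```

Both lines look almost identical on screen, yet the second silently grows a fifth column.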
> Do these two lines look the same to you? Do you
> care if one uses tabs and the other spaces?
>
> Tab        Spaces
> Tab        Spaces
>
Same issue. Look at these lines:

Employee ID First Name Last Name   Phone Nr
-------- -- ----- ---- ---- ------ ----- --
1234        John       Doe         555-1212
2345        Jane       Doaks       555-1213
3456        Ferdinando Quattlebaum 555-1214
4567        Mary Ellen Van Der Lin 555-1215

Being an intelligent human, you probably guessed that there were four columns, but you had to look at the entire context to figure that out. IOW, from the first two lines (or the last two), it would not be obvious MERELY ON SYNTACTICAL GROUNDS which blanks (if any) were column delimiters instead of data within a column.

Using multiple blanks as column delimiters requires humans to count spaces (to make sure that we don't have an accidental extra blank) and to type extra spaces (which take more effort) just to be sure. Too fragile! Consider this variation of the above table:

Employee ID First Name Last Name   Phone Nr
-------- -- ----- ---- ---- ------ ----- --
1234        Johannes   Doe         555-1212
2345        Betty Sue  Doaks       555-1213
3456        Ferdinando Quattlebaum 555-1214
4567        Sue Ellen  Van Der Lin 555-1215

How hard is that to proofread? Are you sure there are only four columns?

OBTW, tabs are of absolutely no use in solving this problem. If, for example, an item ends on column 14 (1-origin), then keying a tab only takes one to column 16, which is visually indistinguishable from the result of typing a single space in column 15.
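The tab arithmetic can be checked with Python's `str.expandtabs`, assuming the usual tab stops every eight columns (the stop width is an assumption; editors differ). A tab after a 14-character item fills only columns 15 and 16, so the next item starts in column 17, a single column away from where one space would put it:

```python
# Assumes tab stops every 8 columns, which is what str.expandtabs(8) applies.
item = "Quattlebaum Jr"  # 14 characters, so it ends in column 14 (1-origin)

with_tab = (item + "\t" + "X").expandtabs(8)
with_space = item + " " + "X"

print(repr(with_tab))    # 'Quattlebaum Jr  X'  (the tab supplied only 2 blanks)
print(repr(with_space))  # 'Quattlebaum Jr X'   (a single blank)

# 1-origin column where the next item starts in each case
print(with_tab.index("X") + 1, with_space.index("X") + 1)  # 17 16
```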
> << These people need a visual feedback >>
>
> Whitespace *is* visual feedback.
>
Bah! Humbug! Horsefeathers! See the proofreading exercise above. Counting nonprinting characters to figure out *what* *kind* of delimiter (between words *in* a column or between columns) is an awful burden to lay on a poor human (technical or not!). Bear in mind, as you look at the above examples, that most of us (I assume that you are included in this) read our mail via a user interface that uses a monospaced font. Try changing your default mail client font to Helvetica, Palatino, or Times Roman and see how much fun you have counting spaces! And, yes, I know of non-technical end users who type text in uSoft Word and then "Save as..." plain text, because they don't know any other way to edit text. Unless they bother to change the font to Lucida Console, Courier, etc. or know how to change their default font (even less likely), they'll be looking at the text in a proportional font, which makes counting spaces a REAL PAIN.
> Do we need delimiters, beyond whitespace, in our REBOL code?
>
We certainly do, including [ ] ( ) { } and " . Note that the reason we need them is to show structure; as long as you're just typing a long trivial expression

    a: b + c + d + e + f + g ;; ...

they aren't too critical. However, try using IF, WHILE, FUNC, etc. without any delimiters but whitespace, please! We're talking about tables here, where structure (and how to represent it in a simple, obvious, non-error-prone way) is the very heart of the issue. REBOL certainly uses non-whitespace delimiters when structure is involved! And (in my experience) the "non-error-prone" issue is, in the long run, the most important one. After the first time someone struggles to find the extra/missing blank that made his nice long table go all goofy, he/she will usually realize that inserting commas, slashes, or vertical bars provides a real ROI.
> Non-programmers are easily confused and frustrated by any
> kind of "syntax" in my experience. If the goal is to keep
> the make-doc format as human-friendly as possible, all
> extraneous syntax should be kept out of it.
>
In my book, things such as \table /table or underlining with hyphens or equal signs are just as much syntax that has to be learned and understood for effective use as is

~|Employee ID|First Name|Last Name  |Phone Nr
~|-------- --|----- ----|---- ------|----- --
~|1234       |Johannes  |Doe        |555-1212
~|2345       |Betty Sue |Doaks      |555-1213
~|3456       |Ferdinando|Quattlebaum|555-1214
~|4567       |Sue Ellen |Van Der Lin|555-1215

Now imagine that the list is not something as obviously interpretable as names and phone numbers, but is an inventory list with product codes, warehouse IDs, rack/shelf locations, and various quantities (on hand, on order, backordered, etc.), some of which may be omitted! What if some of our people have pagers and some do not? Nicknames or not?

~|Emp#|First Name|Last Name  |Nickname|Pager Nr|Phone Nr
~|----|----- ----|---- ------|--------|----- --|----- --
~|1234|Johannes  |Doe        |Jake    |888-1001|555-1212
~|2345|Betty Sue |Doaks      |        |        |555-1213
~|3456|Ferdinando|Quattlebaum|Ferdy   |        |555-1214
~|4567|Sue Ellen |Van Der Lin|        |888-1002|555-1215

That's a perfectly readable chunk of plain ASCII above, and can trivially be parsed into a perfectly legal HTML table. How does the old "whitespace is all we need" strategy deal with:

Emp# First Name Last Name   Nickname Pager Nr Phone Nr
---- ----- ---- ---- ------ -------- ----- -- ----- --
1234 Johannes   Doe         Jake     888-1001 555-1212
2345 Betty Sue  Doaks                         555-1213
3456 Ferdinando Quattlebaum Ferdy             555-1214
4567 Sue Ellen  Van Der Lin          888-1002 555-1215
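As a rough illustration of "trivially parsed", here is a minimal Python sketch. It assumes that each table row starts with `~|`, that the first row is a header, and that an all-dash row is an underline to be skipped; those conventions and the function names are my assumptions read off the examples, not a make-doc specification. Empty cells survive, so a missing pager number still lands in the Pager Nr column:

```python
import html

def parse_tilde_rows(text, delim="|"):
    """Collect lines that start with '~' + delim, split them on delim and trim
    each cell. Empty cells are kept, so a missing pager number stays in place."""
    prefix = "~" + delim
    rows = []
    for line in text.splitlines():
        if line.startswith(prefix):
            rows.append([cell.strip() for cell in line[len(prefix):].split(delim)])
    return rows

def is_underline(row):
    # A row whose cells are all dashes (and blanks) is decoration, not data.
    return all(cell and set(cell) <= set("- ") for cell in row)

def rows_to_html(rows):
    """First row becomes <th> cells; underline rows are skipped."""
    header, *body = rows
    out = ["<table>"]
    out.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        if is_underline(row):
            continue
        out.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    out.append("</table>")
    return "\n".join(out)

sample = """\
~|Emp#|First Name|Last Name  |Nickname|Pager Nr|Phone Nr
~|----|----- ----|---- ------|--------|----- --|----- --
~|1234|Johannes  |Doe        |Jake    |888-1001|555-1212
~|2345|Betty Sue |Doaks      |        |        |555-1213
"""

rows = parse_tilde_rows(sample)
print(rows[3])  # ['2345', 'Betty Sue', 'Doaks', '', '', '555-1213']
html_out = rows_to_html(rows)
print(html_out)
```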
> Variable syntax is very flexible, but it's even more
> confusing than static syntax to regular people.
>
Which is why I start off telling them only about a single delimiter (such as vertical bar). I don't make an issue of the option to use other delimiters until someone needs it (at which point he/she is usually delighted to find that their worrisome data can actually be handled easily). IOW, simplicity is achieved by controlling the rate at which features are discussed with the user, not by making a limiting choice that actually causes subtle problems (like remembering to count spaces).
> << Whitespace is also ambiguous (even for more experienced
> eyes, IMHO). >>
>
> Is this table data ambiguous to you? (providing it doesn't
> get munged in the mail :))
>
> Widget A     Widget B     Widget C
> 50%          10%          3%
> 500,000      100,000      3,000
>
> Does this make the data more accessible?
>
> Widget A|Widget B|Widget C
> 50%     |10%     |3%
> 500,000 |100,000 |3,000
>
The examples I provided earlier more fully discuss what I meant by "ambiguous". The "widget" example (as typed) doesn't address those issues. Again, do you have any trouble reading this table?

Widget A Widget B Widget C
50%      10%      3%
500,000  100,000  3,000

How many columns are in each row in THIS version? Now answer the same question with the following one.

~| Widget A | Widget B | Widget C
~| 50%      | 10%      | 3%
~| 500,000  | 100,000  | 3,000

Now, since you want to take up keystroke counting next, answer the same question with this version.

~|Widget A|Widget B|Widget C
~|50%|10%|3%
~|500,000|100,000|3,000

How many keystrokes did we save by not fooling ourselves into believing that we had to make the columns line up? By using an explicit delimiter, the typist gets to choose whether to do so or not, and to choose to save the maximum number of keystrokes if she/he wishes to do so.
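For contrast, here is about the simplest "whitespace only" inference one could write, sketched in Python: treat a character position as a column break only if it is blank in every line. This is a hypothetical heuristic for illustration, not a proposed make-doc rule. Run on the whitespace-only widget table, it finds four columns and splits "Widget C" in two, because "3%" and "3,000" are shorter than "Widget", so the blank between "Widget" and "C" sits above blanks in every data row:

```python
def gutter_columns(lines):
    """Naive column inference: a character position is a column break only if
    it is blank in every line. Returns (start, end) slices for each column."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    gutter = [all(row[i] == " " for row in padded) for i in range(width)]
    spans, start = [], None
    for i, blank in enumerate(gutter):
        if not blank and start is None:
            start = i                      # a column begins here
        elif blank and start is not None:
            spans.append((start, i))       # column ends just before the gutter
            start = None
    if start is not None:
        spans.append((start, width))       # last column runs to the end
    return spans

def split_by_gutters(lines):
    spans = gutter_columns(lines)
    width = max(len(line) for line in lines)
    return [[line.ljust(width)[a:b].strip() for a, b in spans] for line in lines]

widgets = [
    "Widget A Widget B Widget C",
    "50%      10%      3%",
    "500,000  100,000  3,000",
]
for row in split_by_gutters(widgets):
    print(row)
# ['Widget A', 'Widget B', 'Widget', 'C']
# ['50%', '10%', '3%', '']
# ['500,000', '100,000', '3,000', '']
```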
> Using the two example tables above, try this little experiment.
> Count your keystrokes as you go and time yourself if you feel
> so inclined.
>
> 1. Add a new row with the following data for each column:
>    Bob Jones
>    Ford Prefect
>    Mary Smith
>
35, if I'm in keystroke-saving mode.

~|Bob Jones|Ford Prefect|Mary Smith
12345678901234567890123456789012345
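For the record, here is the same arithmetic in Python. Note that it counts characters, not key presses: Shift for the capitals and for each `|` is not included, which is one reason people's tallies can differ:

```python
row = "~|Bob Jones|Ford Prefect|Mary Smith"
data = ["Bob Jones", "Ford Prefect", "Mary Smith"]

total = len(row)                           # every character typed on the line
data_chars = sum(len(cell) for cell in data)
delimiter_chars = total - data_chars       # the leading "~|" plus two "|"

print(total, data_chars, delimiter_chars)  # 35 31 4
```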
> 2. Change Ford Prefect's name to "Ford Prefect III"
>
Only 4 more data keystrokes are required (" III") if I choose to optimize in favor of saving keystrokes. (How many keys must be pressed -- or mice must be clicked -- to position the cursor in the right place to perform the insertion is clearly dependent on one's choice of text editor...)
> What do your tables look like now?
>
It depends on whether I (as the typist) make the choice to create tidy-looking source text or just get the data in with the fewest keystrokes.
> How many keystrokes did it take you for each one? (if you
> post your results, make sure to note whether data
> keystrokes were counted (35 data keystrokes are required))
>
See answers above.
> How long did it take you for each?
>
I can't type and look at my watch at the same time. Suffice it to say that I type somewhere in the 50-60 wpm range.
> Now for some thoughts...
>
> If you require a delimiter, it has two impacts:
>
> 1. The user has to mentally switch from "data" mode to
>    "format" mode, and back to "data" mode again every time
>    they enter a delimiter.
>
I totally disagree. Literate humans (whether "users" or not) are completely familiar with using punctuation as a way of showing the structure of text. Periods at the ends of sentences, commas between elements of a list, horizontal rules between sections of a document, etc. are all well within their comfort zone.
> 2. The delimiter is a small shifted key, usually located
>    near the larger Enter and Backspace keys, which are both
>    potentially destructive; Backspace to the data and Enter
>    to the location. If the Shift key is missed, you'll
>    get a backslash instead of a pipe, which is easily overlooked.
>
Which delimiter? Which brand of keyboard? Which key is the any key? If you can choose for yourself, this is a total red herring!

~;Emp#;First Name;Last Name  ;Nickname;Pager Nr;Phone Nr
~;----;----- ----;---- ------;--------;----- --;----- --
~;1234;Johannes  ;Doe        ;Jake    ;888-1001;555-1212
~;2345;Betty Sue ;Doaks      ;        ;        ;555-1213
~;3456;Ferdinando;Quattlebaum;Ferdy   ;        ;555-1214
~;4567;Sue Ellen ;Van Der Lin;        ;888-1002;555-1215
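A minimal Python sketch of the "choose your own delimiter" idea, assuming (as the examples suggest, though it is not specified anywhere here) that the character immediately after `~` is the delimiter for that row. The function name is hypothetical. The `|` and `;` versions of a row parse identically, and data that contains `|` simply uses another delimiter:

```python
def parse_tilde_line(line):
    """Hypothetical rule: a table row starts with '~' and the character right
    after it is that row's delimiter. Returns the trimmed cells, or None."""
    if len(line) < 2 or line[0] != "~":
        return None                        # not a table row
    delim = line[1]
    return [cell.strip() for cell in line[2:].split(delim)]

pipe_row = "~|2345|Betty Sue |Doaks      |        |        |555-1213"
semi_row = "~;2345;Betty Sue ;Doaks      ;        ;        ;555-1213"
print(parse_tilde_line(pipe_row))                                # ['2345', 'Betty Sue', 'Doaks', '', '', '555-1213']
print(parse_tilde_line(pipe_row) == parse_tilde_line(semi_row))  # True

# Data that itself contains '|' just switches to another delimiter:
print(parse_tilde_line("~;a|b;Pipe|Tee"))  # ['a|b', 'Pipe|Tee']
```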
> From my perspective, we're dealing with text tables, not
> data tables.
>
What does that mean? I really have no clue! (Especially since your next sentence talks about reading "the data"!)
> I should be able to read the data clearly whether it's been
> formatted in make-doc or whether it's just plain text.
>
~| Emp# | First Name | Last Name   | Nickname | Pager Nr | Phone Nr
~| ---- | ----- ---- | ---- ------ | -------- | ----- -- | ----- --
~| 1234 | Johannes   | Doe         | Jake     | 888-1001 | 555-1212
~| 2345 | Betty Sue  | Doaks       |          |          | 555-1213
~| 3456 | Ferdinando | Quattlebaum | Ferdy    |          | 555-1214
~| 4567 | Sue Ellen  | Van Der Lin |          | 888-1002 | 555-1215

I can read the above quite clearly.
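Padding is purely cosmetic under an explicit-delimiter rule. In this small Python sketch (assuming rows that start with `~|` and cells trimmed of surrounding blanks; the function name is hypothetical), the aligned row and the keystroke-saving row yield the same cells:

```python
def cells(line):
    """Drop the leading '~|', split on '|', and trim the padding in each cell."""
    return [cell.strip() for cell in line[2:].split("|")]

spaced = "~| 2345 | Betty Sue  | Doaks       |          |          | 555-1213"
tight = "~|2345|Betty Sue|Doaks|||555-1213"

print(cells(spaced))                  # ['2345', 'Betty Sue', 'Doaks', '', '', '555-1213']
print(cells(spaced) == cells(tight))  # True
```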
> Using a special delimiter adds an artificial element to the mix.
>
No more so than the borders in an HTML table! Most people don't bother to say ... border="0" ..., so their tables have vertical and horizontal rules all over the place. All we're talking about here is letting someone indicate, in a simple, obvious, non-error-prone way, where he/she wants those boundaries to appear.
> I certainly think we could come up with a cool table formatting
> dialect to surpass TBL and rival TeX, but that's not the goal
> right now so I think we should strive to remove all extraneous
> demands placed on the user.
>
I agree. I consider the effort of counting spaces to be a very subtle and error-prone demand, and therefore extraneous.

I also disagree, at least with the implication that anything other than plain text is "extraneous". We're in a trade-off zone where we're balancing the added power of structured formatting (such as tables) against the cognitive and typing efforts required to access such power. Having been at this game since the late '60s, I've seen more than my share of line-printer-generated documents that used such devices as

+------------+-----------------------+---------------------+
| Style      | Pros                  | Cons                |
+============+=======================+=====================+
| Whitespace | Easier for hunt-and   | Counting whitespace |
| (multiple) | -peck typists who     | is difficult and    |
|            | don't know the        | error-prone for     |
|            | keyboard?             |                     |
+------------+-----------------------+---------------------+
| Explicit   | Less ambiguous, both  | One minute required |
| delimiters | to human readers and  | to explain the rule |
|            | text processing       | to typist; no time  |
|            | software              | required for reader |
+------------+-----------------------+---------------------+
| ASCII Art, | About as well as one  | Very tedious to     |
| such as    | can do with plain     | create and maintain |
| this table | (monospaced) text     |                     |
+------------+-----------------------+---------------------+
| HTML, TeX, | Tremendous power and  | Tremendous learning |
| LaTeX, or  | sophisticated control | curve and effort    |
| SGML       | over resulting docs   | to use              |
+------------+-----------------------+---------------------+

Why did people do such things? Because there was a perceived value in enhancing the appearance of the text. One could certainly argue (and some typographers do) that the borders in such tables are extraneous and bad. However, most people of my acquaintance are accustomed to such extraneous decoration of the content and are willing to invest a small effort to achieve it.
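To be fair to the ASCII-art row: much of its tedium goes away once cells are explicitly delimited, because a program can draw the borders. Here is a minimal Python sketch with a hypothetical function name; unlike the hand-drawn table, it does not wrap long cells onto multiple lines:

```python
def ascii_grid(rows):
    """Draw rows (lists of strings) as a +---+ grid. The first row is treated
    as the header and underlined with '='. Cells are not wrapped."""
    ncols = max(len(r) for r in rows)
    rows = [r + [""] * (ncols - len(r)) for r in rows]   # pad short rows
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]

    def rule(ch):
        return "+" + "+".join(ch * (w + 2) for w in widths) + "+"

    def line(r):
        return "|" + "|".join(f" {c.ljust(w)} " for c, w in zip(r, widths)) + "|"

    out = [rule("-"), line(rows[0]), rule("=")]
    for r in rows[1:]:
        out += [line(r), rule("-")]
    return "\n".join(out)

grid = ascii_grid([["Style", "Pros"], ["Explicit", "Less ambiguous"]])
print(grid)
# +----------+----------------+
# | Style    | Pros           |
# +==========+================+
# | Explicit | Less ambiguous |
# +----------+----------------+
```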
OTOH, if you really want to be consistent with the philosophy of letting the human type for human consumption and requiring the formatting program to figure out what was meant, I'd love to see some code that can handle the following (exactly as it appears below, of course). Back in the day, I spent quite a bit of time trying to come up with a bit of AI that could take a flat file or printout image such as the following and infer where the columns were intended, whether each column was to be left-justified, right-justified, or centered, and what type of data should appear in each. It's non-trivial IMHO, but YMMV.

Emp# First Name Last Name   Nickname Pager Nr Phone Number
==== ===== ==== ==== ====== ======== ===== == ===== ======
  12 Johannes   Doe         Jake     888-1001 555-1212
3456 Ferdinando Quattlebaum Ferdy             800-555-1214
 234 Betty Sue  Doaks                         555-1213
4567 Sue Ellen  Van Der Lin          888-1002 888-555-1215

Thanks for listening/reading (if you got this far! ;-)

-jn-
--
You can love your job, but don't expect it to love you back.
                                              -- Aaron Watters
joel;dot;neely;at;FIX;PUNCTUATION;fedex;d