Mailing List Archive: 49091 messages

[REBOL] Re: make-doc-pro: how to handle tables?

From: greggirwin:starband at: 23-Sep-2001 11:18

Hi Joel, Great response! << You seem to be glossing over to one of the points I raised: When using whitespace one must address the issue of how much whitespace is considered a column break (since it also may appear *within* a column, e.g. between words). When you say "multi-space" you seem to imply more than one, but without actually saying what you mean. How about two? What does that do to the poor user who accidentally bounces the space bar at a point where a single space between words (in one column) was intended?
>>
Yes, I meant more than one. Spacing between columns should appear *visibly* different from spacing within a cell. If I type in a table for other people to read, I would naturally add extra space between columns. If I didn't, that would be a poor design choice on my part and people could very easily be confused by it.
<<
Employee ID First Name Last Name Phone Nr
-------- -- ----- ---- ---- ------ ----- --
1234 John Doe 555-1212
2345 Jane Doaks 555-1213
3456 Ferdinando Quattlebaum 555-1214
4567 Mary Ellen Van Der Lin 555-1215
>>
This is a great example! I would consider this to be a bad table layout because it *is* confusing. Why would you build such a tightly spaced table (outside of being an example here)? If I use tabs, I get a wider spacing than I would like for this table, but it clearly delineates the columns.

Employee ID    First Name    Last Name      Phone Nr
-------- --    ----- ----    ---- ------    ----- --
1234           John          Doe            555-1212
2345           Jane          Doaks          555-1213
3456           Ferdinando    Quattlebaum    555-1214
4567           Mary Ellen    Van Der Lin    555-1215

<< OBTW, tabs are of absolutely no use in solving this problem. If, for example, an item ends on column 14 (1-origin), then keying a tab only takes one to column 16, which is visually indistinguishable from the result of typing a single space in column 15.
>>
Right! Yes! Absolutely! So, naturally, you would type *another* tab to add space before the start of the next column. Type up a table of text in your favorite word processor. How do you do it?
<<
> Whitespace *is* visual feedback. >
Bah! Humbug! Horsefeathers! See the proofreading exercise above. Counting nonprinting characters to figure out *what* *kind* of delimiter (between words *in* a column or between columns) is an awful burden to lay on a poor human (technical or not!).
>>
Right again! We're not counting characters at all, we never do. What we're doing is creating "visual groupings" of data in columns. They must be visually distinguishable for us to make sense of them. If they are laid out so that they might confuse a human reader, then I would expect a parser to get confused as well. Now, a human can take a few extra seconds to try and make sense of the data, as I did with your first example table. Can a program do the same thing? Sure. As you mention, this is no easy task, and beyond what make-doc-pro should be expected to do, but a program should be able to scan the data and take a few extra seconds to try and determine some heuristics that make the most sense of things. Worst case, it could ask. "Hey, are there 4 columns in this table (as in the example I'm displaying to you now in my disruptive dialog box)?"
<< Bear in mind, as you look at the above examples, that most of us (I assume that you are included in this) read our mail via a user interface that uses a monospaced font. Try changing your default mail client font to Helvetica, Palatino, or Times Roman and see how much fun you have counting spaces!
>>
Right once more! I think people use tabs to build tables. If you try to build tables using spaces with proportional fonts you're setting yourself up for an ulcer. :)
<<
> Do we need delimiters, beyond whitespace, in our REBOL code? >
We certainly do. Including [ ] ( ) { } and " .
>>
Sorry, my meaning wasn't clear. Here's an example of what I meant. Using only whitespace, we can type this: emit: func [data] [append out reduce data] Using another delimiter (|), we get this: emit:|func|[data]|[append|out|reduce|data] That's the simple case I wanted to make. << We're talking about tables here, where structure (and how to represent it in a simple, obvious, non-error-prone way) is the very heart of the issue. REBOL certainly uses non-whitespace delimiters when structure is involved! And (in my experience) the "non-error-prone" issue in the long run is the most important one.
>>
Agreed. However, as you point out, we're talking about tables here. If REBOL only dealt with tables, how much of its already minimal syntax would go away? It's much more complex, which is why it has more delimiters and operators. I think Carl eliminated everything he possibly could while still achieving his goals.
<< In my book things such as \table /table or underlining with hyphens or equal signs are just as much syntax that has to be learned and understood for effective use as is

~|Employee ID|First Name|Last Name  |Phone Nr
~|-------- --|----- ----|---- ------|----- --
~|1234       |Johannes  |Doe        |555-1212
~|2345       |Betty Sue |Doaks      |555-1213
~|3456       |Ferdinando|Quattlebaum|555-1214
~|4567       |Sue Ellen |Van Der Lin|555-1215
>>
I disagree...somewhat. Some of this has to be couched in the context of the make-doc discussion. If I were just typing a document, I would probably do something like this:

    Table 1.
    ---------------------------------------
    NAME          PHONE       E-MAIL
    ---------------------------------------
    Bob Jones     000.0000    [Bob--Jones--com]
    Mary Smith    111.1111    [Mary--Smith--com]

Can make-doc make sense of this? Let's see...blank line, then a line is indented and starts with the word "table", then a line of dashes, words in all caps (make note of what character column they fall on), another line of dashes (if last line wasn't all caps then we'll pretend it was), more lines of stuff, and finally another blank line. That's the syntax (well, roughly anyway <g>) I want because I can send this table to *anyone*, make-doc'd or not, and they can understand it. Even if I have an extra space or two somewhere. :)
<< Now imagine that the list is not something as obviously interpretable as names and phone numbers, but is an inventory list with product codes, warehouse IDs, rack/shelf locations, and various quantities (on hand, on order, backordered, etc.) some of which may be omitted! What if some of our people have pagers and some do not? Nicknames or not?
>>
Good points all. I contend that whitespace is easily the best delimiter in this case as you may have other, non-standard, characters contained in product codes and such. Regarding omitted items, the whitespace defense team calls to the stand: The Chicago Manual of Style (well, not really <g>). I believe the standard is just to leave the cell empty or to put an em dash in the cell. In our case, the latter seems the obvious choice for ease of implementation.
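The recipe above (optional "Table" line, rule of dashes, header words whose character columns are noted, rows, em dash for omitted cells) can be sketched roughly as follows. This is only an illustration in Python, not make-doc or make-doc-pro code. The names `parse_table` and `header_columns` are hypothetical, the "Pat Lee" row is made up to show an omitted cell, and the sketch assumes a fixed-pitch layout where every cell starts at or after its header's column.

```python
import re

EM_DASH = "\u2014"                     # filler for omitted cells
RULE = re.compile(r"^\s*-{3,}\s*$")    # a line made only of dashes


def header_columns(header):
    """Return (start_offset, name) pairs for the header line.

    Words one space apart stay in the same name; a wider gap starts a
    new column (an assumption of this sketch).
    """
    return [(m.start(), m.group())
            for m in re.finditer(r"\S+(?: \S+)*", header)]


def parse_table(text):
    """Split a whitespace-aligned table into caption, names, and records."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    caption = None
    if lines and lines[0].strip().lower().startswith("table"):
        caption = lines.pop(0).strip()
    lines = [ln for ln in lines if not RULE.match(ln)]
    header, rows = lines[0], lines[1:]
    cols = header_columns(header)
    names = [name for _, name in cols]
    starts = [start for start, _ in cols]
    ends = starts[1:] + [None]
    records = []
    for row in rows:
        # Slice each row at the header offsets; blank slices become em dashes.
        cells = [row[s:e].strip() or EM_DASH for s, e in zip(starts, ends)]
        records.append(dict(zip(names, cells)))
    return caption, names, records


SAMPLE = """\
Table 1.
---------------------------------------
NAME          PHONE       E-MAIL
---------------------------------------
Bob Jones     000.0000    [Bob--Jones--com]
Mary Smith    111.1111    [Mary--Smith--com]
Pat Lee                   [Pat--Lee--com]
"""

if __name__ == "__main__":
    caption, names, records = parse_table(SAMPLE)
    print(caption, names)
    for record in records:
        print(record)
```

Slicing at the header offsets tolerates an extra space or two inside a cell, but a cell that overruns into the next header's column would be split in the wrong place, which is the kind of case where a program would have to fall back to asking.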
<<
~|Emp#|First Name|Last Name  |Nickname|Pager Nr|Phone Nr
~|----|----- ----|---- ------|--------|----- --|----- --
~|1234|Johannes  |Doe        |Jake    |888-1001|555-1212
~|2345|Betty Sue |Doaks      |        |        |555-1213
~|3456|Ferdinando|Quattlebaum|Ferdy   |        |555-1214
~|4567|Sue Ellen |Van Der Lin|        |888-1002|555-1215

That's a perfectly readable chunk of plain ASCII above, and can trivially be parsed into a perfectly legal HTML table. How does the old "whitespace is all we need" strategy deal with:

Emp# First Name Last Name Nickname Pager Nr Phone Nr
---- ----- ---- ---- ------ -------- ----- -- ----- --
1234 Johannes Doe Jake 888-1001 555-1212
2345 Betty Sue Doaks 555-1213
3456 Ferdinando Quattlebaum Ferdy 555-1214
4567 Sue Ellen Van Der Lin 888-1002 555-1215
>>
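For reference, the claim in the quote above that the `~|` form parses trivially into an HTML table can be sketched like this. It is an illustration in Python, not make-doc-pro's actual behaviour. The function names are hypothetical, and treating a row of dashes as "the row above is a header" is an assumption of the sketch.

```python
from html import escape


def is_rule(cells):
    """True for an underline row: only dashes and spaces, and not all blank."""
    return any(cells) and all(set(c) <= set("- ") for c in cells)


def to_html(lines, marker="~", sep="|"):
    """Convert lines that start with marker+sep into a simple HTML table."""
    prefix = marker + sep
    rows = [[c.strip() for c in ln.strip()[len(prefix):].split(sep)]
            for ln in lines if ln.strip().startswith(prefix)]
    out = ["<table>"]
    for i, cells in enumerate(rows):
        if is_rule(cells):
            continue  # the underline row only marks the header above it
        underlined = i + 1 < len(rows) and is_rule(rows[i + 1])
        tag = "th" if underlined else "td"
        out.append("<tr>" + "".join(f"<{tag}>{escape(c)}</{tag}>"
                                    for c in cells) + "</tr>")
    out.append("</table>")
    return "\n".join(out)


SAMPLE = [
    "~|Emp#|First Name|Last Name  |Nickname|Pager Nr|Phone Nr",
    "~|----|----- ----|---- ------|--------|----- --|----- --",
    "~|1234|Johannes  |Doe        |Jake    |888-1001|555-1212",
    "~|2345|Betty Sue |Doaks      |        |        |555-1213",
]

if __name__ == "__main__":
    print(to_html(SAMPLE))
```

Empty cells survive as empty `<td>` elements because the delimiter, not the spacing, marks the column boundary.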
It deals with it the same way a human would, though it's obviously not as smart. It says "That's a really bad table layout. What were you thinking? I can't make sense of that, and neither will anyone else." :) You don't have to agree with my stance on this, to be sure it's not conventional. While I agree that the first one is trivially parsed by a machine, and also "parseable" by a human, it is not easily readable. I can't take it in at a glance, which is what tables are supposed to help us do, right? << IOW, simplicity is achieved by controlling the rate at which features are discussed with the user, not by making a limiting choice that actually causes subtle problems (like remembering to count spaces). >> This logic is somewhat counterfeit. A particular user may not know anything about advanced features, but that doesn't prevent them from getting a document containing them, which means they then have to understand them to understand that document. It could be argued that people should only care about distributing what make-doc spits out, but I like the idea of plain text distribution. << The examples I provided earlier more fully discuss what I meant by "ambiguous". The "widget" example (as typed) doesn't address those issues. Again, do you have any trouble reading this table? Widget A Widget B Widget C 50% 10% 3% 500,000 100,000 3,000
>>
None whatsoever. 3 columns. Perfectly clear. Now, you're going to tell me that there are actually 4 columns because of the extra space in "Widget B". And you'd be right. No problem says I, it's easily fixed for make-doc generation and, for human consumption, it doesn't need to be fixed. That's my preference, though it may not be yours. It's going to be tough as long as each of us can come up with artificial examples that support our respective cases. Could some other people send in some sample tables? Let's get as many as we can and see if that leads in a particular direction.
<< Now, since you want to take up keystroke counting next, answer the same question with this version.

~|Widget A|Widget B|Widget C
~|50%|10%|3%
~|500,000|100,000|3,000

How many keystrokes did we save by not fooling ourselves into believing that we had to make the columns line up? By using an explicit delimiter, the typist gets to choose whether to do so or not, and to choose to save the maximum number of keystrokes if she/he wishes to do so.
>>
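The keystroke question in the quote above can be made concrete with a small count. This is a rough sketch in Python that counts typed characters only (no Enter keys, no corrections). The helper names are hypothetical, and padding every cell to its column's widest entry is just one assumed way of "making the columns line up".

```python
ROWS = [
    ["Widget A", "Widget B", "Widget C"],
    ["50%", "10%", "3%"],
    ["500,000", "100,000", "3,000"],
]


def compact(rows, marker="~", sep="|"):
    """Delimited lines with no padding at all."""
    return [marker + sep + sep.join(row) for row in rows]


def aligned(rows, marker="~", sep="|"):
    """Delimited lines with each cell padded to its column's widest entry."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return [(marker + sep +
             sep.join(cell.ljust(w) for cell, w in zip(row, widths))).rstrip()
            for row in rows]


def keystrokes(lines):
    """Total number of characters typed across all lines."""
    return sum(len(line) for line in lines)


if __name__ == "__main__":
    saved = keystrokes(aligned(ROWS)) - keystrokes(compact(ROWS))
    print("compact:", keystrokes(compact(ROWS)))
    print("aligned:", keystrokes(aligned(ROWS)))
    print("saved:  ", saved)
```

The difference is whatever padding the typist chooses to add, which is a layout decision rather than something the delimiter requires.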
Another valid case. Now, my *goal* is not to save keystrokes, that just comes along as a happy side-effect. I originally did some delimited layouts to post, showing the very issue you mention:

a|b|c|d|e|f|g
12|23|34|45|56|67|78
acb|def|ghi|jkl|mno|pqr|stu
$12.00|$24.50|$37.75|$1.00|$0.50|$0.25|$0.01

I didn't include them in my original post because this is so horrific, from a layout perspective, that I can't imagine anyone even considering it as an option and I didn't think it was fair to hit you with it. :) We can say that people should be free to format their data however they want, freedom of expression and all that, but I also think it's important to encourage good data hygiene.

RE: Keystrokes and table formatting
<< It depends on whether I (as the typist) make the choice to create tidy-looking source text or just get the data in with the fewest keystrokes.
>>
I notice that none of the example tables you've posted to support the case for delimiters used the keystroke saving mode. For me, and I could be alone in this, creating delimited tables using ASCII chars as we are discussing, always has me playing the back and forth game to get my delimiters to line up, then that moves my data, so I have to readjust that, etc. Could just be that I haven't been forced to come up with a clever solution to my problem since I can avoid it.
<< Literate humans (whether "users" or not) are completely familiar with using punctuation as a way of showing the structure of text. Periods at the ends of sentences, commas between elements of a list, horizontal rules between sections of a document, etc. are all well within their comfort zone.
>>
Agreed. Those are conventions we are taught. Using a pipe symbol as a delimiter between pieces of text is not. This is where we diverge in our view as I consider this to be a computer/programmer's convention and not a normal language/grammar convention.
<< Which delimiter?
>>
Sorry for the confusion, I have been using the pipe symbol (|) as the basis for this discussion.
<< If you can choose for yourself, this is a total red herring!
>>
But that brings us back to the complexification of variable syntax!
<<
~;Emp#;First Name;Last Name  ;Nickname;Pager Nr;Phone Nr
~;----;----- ----;---- ------;--------;----- --;----- --
~;1234;Johannes  ;Doe        ;Jake    ;888-1001;555-1212
~;2345;Betty Sue ;Doaks      ;        ;        ;555-1213
~;3456;Ferdinando;Quattlebaum;Ferdy   ;        ;555-1214
~;4567;Sue Ellen ;Van Der Lin;        ;888-1002;555-1215
>>
Now don't tell me *that's* easily parsed by humans. <thrust parry thrust jab> :) <<
> From my perspective, we're dealing with text tables, not > data tables. >
What does that mean? I really have no clue! (Especially since your next sentence talks about reading "the data"!)
>>
Sorry, again, for the confusion. I meant to draw the distinction between the tabular presentation of data for human consumption versus the tabular presentation of data for computer processing (i.e. delimited data).
<<
~| Emp# | First Name | Last Name   | Nickname | Pager Nr | Phone Nr
~| ---- | ----- ---- | ---- ------ | -------- | ----- -- | ----- --
~| 1234 | Johannes   | Doe         | Jake     | 888-1001 | 555-1212
~| 2345 | Betty Sue  | Doaks       |          |          | 555-1213
~| 3456 | Ferdinando | Quattlebaum | Ferdy    |          | 555-1214
~| 4567 | Sue Ellen  | Van Der Lin |          | 888-1002 | 555-1215

I can read the above quite clearly.
>>
And so can I. Now, if you remove the delimiters...

Emp#  First Name  Last Name    Nickname  Pager Nr  Phone Nr
----  ----- ----  ---- ------  --------  ----- --  ----- --
1234  Johannes    Doe          Jake      888-1001  555-1212
2345  Betty Sue   Doaks                            555-1213
3456  Ferdinando  Quattlebaum  Ferdy               555-1214
4567  Sue Ellen   Van Der Lin            888-1002  555-1215

I can still read it just as clearly, if not more so. The catch here is that make-doc may not be able to read it as clearly. Hmmm, force a fixed-pitch font and treat it as preformatted text in cases of confusion?
<< No more so than the borders in an HTML table! Most people don't bother to say ... border="0" ..., so their tables have vertical and horizontal rules all over the place.
>>
You make a great point by saying "Most people don't bother..." You're right. People will often do things in the most economical way possible and live with the results if they aren't oppressive.

Press abandoned vertical rules as a standard feature of tables in the books and journals it published. More than two decades of this austerity have demonstrated that banishing vertical rules has not decreased the clarity of well-organized tables and that it has, on the other hand, increased their attractiveness. (Chicago Manual of Style, 14th edition, pg 410)
<< All we're talking about here is letting someone indicate in a simple, obvious, non-error-prone way where he/she wants those boundaries to appear.
>>
I'll nit-pick here. We're not "letting" them, we're "forcing" them.
<< I consider the effort of counting spaces to be a very subtle and error-prone demand, and therefore extraneous.
>>
So don't count spaces, just make things look right. :)
<< I also disagree, at least with the implication that anything other than plain text is "extraneous". We're in a trade-off zone where we're balancing the added power of structured formatting (such as tables) against the cognitive and typing efforts required to access such power.
>>
I couldn't agree more. In the context of a TeX/TBL dialect, the whitespace approach is probably not the right one. My opinion in this matter is based on my view that simpler is better and less is more. I'm willing to give up 99% of the power in return for a 50% effort reduction. << OTOH, if you really want