[REBOL] Re: make-doc-pro: how to handle tables?

From: greggirwin:starband at: 23-Sep-2001 11:18


Hi Joel,

Great response!

<<
You seem to be glossing over one of the points I raised:

    When using whitespace one must address the issue of
    how much whitespace is considered a column break (since it
    also may appear *within* a column, e.g. between words).

When you say "multi-space" you seem to imply more than one,
but without actually saying what you mean.  How about two?
What does that do to the poor user who accidentally bounces
the space bar at a point where a single space between words
(in one column) was intended?
>>

Yes, I meant more than one. Spacing between columns should
appear *visibly* different from spacing within a cell. If I
type in a table for other people to read, I would naturally
add extra space between columns. If I didn't, that would be
a poor design choice on my part and people could very easily
be confused by it.
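To make the "more than one space" rule concrete, here is a minimal Python sketch that treats any run of two or more spaces as a column break. The function name `split_cells` and the default threshold of two are assumptions for illustration, not anything make-doc-pro defines. The second sample row shows the failure Joel describes: a doubled space inside "Betty  Sue" becomes a column break, and two empty cells vanish into one wide gap.

```python
import re

def split_cells(line: str, min_gap: int = 2) -> list[str]:
    """Split a table row wherever min_gap or more consecutive spaces occur.

    Single spaces (between words inside one cell) are left alone.
    """
    gap = re.compile(" {%d,}" % min_gap)
    return gap.split(line.strip())

if __name__ == "__main__":
    header = "Emp#  First Name  Last Name    Nickname  Pager Nr  Phone Nr"
    print(split_cells(header))   # six cells, as intended
    # A stray double space inside a name, plus two empty cells:
    row = "2345  Betty  Sue  Doaks                            555-1213"
    print(split_cells(row))      # five cells: the name is split, the blanks merge
```

The rule is trivial to implement; the hard part is exactly the one Joel raises, namely that a single accidental keystroke changes the cell count.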

<<
    Employee ID First Name Last Name   Phone Nr
    -------- -- ----- ---- ---- ------ ----- --
    1234        John       Doe         555-1212
    2345        Jane       Doaks       555-1213
    3456        Ferdinando Quattlebaum 555-1214
    4567        Mary Ellen Van Der Lin 555-1215
>>

This is a great example! I would consider this to be a bad
table layout because it *is* confusing. Why would you build
such a tightly spaced table (outside of being an example
here)? If I use tabs, I get a wider spacing than I would
like for this table, but it clearly delineates the columns.

Employee ID		First Name		Last Name		Phone Nr
-------- --		----- ----		---- ------		----- --
1234			John			Doe			555-1212
2345			Jane			Doaks			555-1213
3456			Ferdinando		Quattlebaum		555-1214
4567			Mary Ellen		Van Der Lin		555-1215

<< OBTW, tabs are of absolutely no use in solving this problem.
If, for example, an item ends on column 14 (1-origin), then
keying a tab only takes one to column 16, which is visually
indistinguishable from the result of typing a single space
in column 15. >>

Right! Yes! Absolutely! So, naturally, you would type *another*
tab to add space before the start of the next column. Type up
a table of text in your favorite word processor. How do you do
it?
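Tabs give a program an unambiguous signal even when they give a human reader very little. A minimal Python sketch, assuming 8-column tab stops (the default of Python's `str.expandtabs`). The first part shows a tab expanding to a single blank when the preceding text ends one column short of a tab stop, which is Joel's objection. The second part splits rows of the double-tab table above on runs of tabs, which recovers four columns no matter how wide the tabs render. The helper name `split_on_tabs` is hypothetical.

```python
import re

def split_on_tabs(line: str) -> list[str]:
    """Treat any run of one or more tabs as a single column break."""
    return [cell.strip() for cell in re.split(r"\t+", line.strip("\t\n"))]

if __name__ == "__main__":
    # 15 characters, then a tab: with 8-column stops the tab expands to
    # just one blank, so on screen it looks exactly like a single space.
    print(repr(("x" * 15 + "\tnext").expandtabs(8)))

    rows = [
        "Employee ID\t\tFirst Name\t\tLast Name\t\tPhone Nr",
        "3456\t\t\tFerdinando\t\tQuattlebaum\t\t555-1214",
        "4567\t\t\tMary Ellen\t\tVan Der Lin\t\t555-1215",
    ]
    for row in rows:
        print(split_on_tabs(row))
```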

<<
> Whitespace *is* visual feedback.
>
Bah!  Humbug!  Horsefeathers!  See the proofreading exercise
above.  Counting nonprinting characters to figure out *what*
*kind* of delimiter (between words *in* a column or between
columns) is an awful burden to lay on a poor human (technical
or not!).
>>

Right again! We're not counting characters at all; we never do.
What we're doing is creating "visual groupings" of data in
columns. They must be visually distinguishable for us to make
sense of them. If they are laid out so that they might confuse
a human reader, then I would expect a parser to get confused
as well. Now, a human can take a few extra seconds to try and
make sense of the data, as I did with your first example table.
Can a program do the same thing? Sure. As you mention, this is
no easy task, and it's beyond what make-doc-pro should be expected
to do, but a program should be able to scan the data and take a
few extra seconds to work out some heuristics that make the most
sense of things. Worst case, it could ask: "Hey, are there 4
columns in this table (as in the example I'm displaying to you
now in my disruptive dialog box)?"
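The "worst case, ask" fallback can be sketched as a consistency check: split every row on runs of two or more spaces, take the most common cell count as the likely number of columns, and report the rows that disagree so the tool can ask about them. This is a minimal Python sketch; the function name, the two-space rule, and the sample rows are all hypothetical.

```python
import re
from collections import Counter

def suspicious_rows(rows: list[str]) -> tuple[int, list[int]]:
    """Guess the column count and list the rows that disagree with it.

    Cells are split on runs of two or more spaces; the most common cell
    count is taken as the intended number of columns. rows must be non-empty.
    """
    counts = [len(re.split(r" {2,}", r.strip())) for r in rows]
    likely = Counter(counts).most_common(1)[0][0]
    return likely, [i for i, n in enumerate(counts) if n != likely]

if __name__ == "__main__":
    rows = [                              # hypothetical sample data
        "Name        Phone",
        "Bob Jones   555-1212",
        "Mary  Ellen Smith   555-1213",   # stray double space
        "Sue Brown   555-1214",
    ]
    cols, odd = suspicious_rows(rows)
    for i in odd:
        print(f"Are there really {cols} columns? Row {i} disagrees: {rows[i]!r}")
```

A majority vote is only a guess, which is why the sketch reports the odd rows instead of silently "fixing" them.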

<< Bear in mind, as you look at the above examples, that most of
us (I assume that you are included in this) read our mail via
a user interface that uses a monospaced font.  Try changing
your default mail client font to Helvetica, Palatino, or Times
Roman and see how much fun you have counting spaces! >>

Right once more! I think people use tabs to build tables. If you
try to build tables using spaces with proportional fonts you're
setting yourself up for an ulcer. :)

<<
> Do we need delimiters, beyond whitespace, in our REBOL code?
>
We certainly do.  Including  [  ]  (  )  {  }  and  " .
>>

Sorry, my meaning wasn't clear. Here's an example of what I meant.
Using only whitespace, we can type this:

	emit: func [data] [append out reduce data]

Using another delimiter (|), we get this:

	emit:|func|[data]|[append|out|reduce|data]

That's the simple point I wanted to make.

<<
We're talking about tables here, where structure (and how to
represent it in a simple, obvious, non-error-prone way) is the
very heart of the issue.  REBOL certainly uses non-whitespace
delimiters when structure is involved!  And (in my experience)
the "non-error-prone" issue in the long run is the most
important one.
>>

Agreed. However, as you point out, we're talking about tables
here. If REBOL only dealt with tables, how much of its already
minimal syntax would go away? REBOL handles a much more complex
domain, which is why it has more delimiters and operators. I think Carl eliminated
everything he possibly could while still achieving his goals.

<<
In my book things such as

  \table
  /table

or underlining with hyphens or equal signs are just as much
syntax that has to be learned and understood for effective
use as is

    ~|Employee ID|First Name|Last Name  |Phone Nr
    ~|-------- --|----- ----|---- ------|----- --
    ~|1234       |Johannes  |Doe        |555-1212
    ~|2345       |Betty  Sue|Doaks      |555-1213
    ~|3456       |Ferdinando|Quattlebaum|555-1214
    ~|4567       |Sue  Ellen|Van Der Lin|555-1215
>>

I disagree...somewhat. Some of this has to be couched in the
context of the make-doc discussion. If I were just typing a
document, I would probably do something like this:

			Table 1.
	---------------------------------------
	NAME		PHONE		E-MAIL
	---------------------------------------
	Bob Jones	000.0000	[Bob--Jones--com]
	Mary Smith	111.1111	[Mary--Smith--com]

Can make-doc make sense of this? Let's see...blank line, then
a line is indented and starts with the word "table", then a
line of dashes, words in all caps (make note of what character
column they fall on), another line of dashes (if last line
wasn't all caps then we'll pretend it was), more lines of
stuff, and finally another blank line.
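The recognition steps above can be sketched roughly as follows. This minimal Python sketch assumes 8-column tab stops, a caption line beginning with the word "Table", dashed rules above and below the header, and data rows running to the next blank line. It takes each header cell's starting character column as that column's left edge and slices every data row there. It does not check for capitals, in the spirit of "we'll pretend it was". None of these rules come from make-doc itself, and the function names are hypothetical.

```python
import re

def is_rule(line: str) -> bool:
    """True for a line made only of dashes (surrounding whitespace ignored)."""
    s = line.strip()
    return len(s) >= 3 and set(s) == {"-"}

def slice_at(line: str, starts: list[int]) -> list[str]:
    """Cut line at each start column; each cell runs up to the next start."""
    ends = starts[1:] + [len(line)]
    return [line[a:b].strip() for a, b in zip(starts, ends)]

def parse_plain_table(text: str, tabsize: int = 8):
    """Return (caption, headers, rows), or None if the block doesn't match."""
    lines = [ln.expandtabs(tabsize) for ln in text.splitlines()]
    i = 0
    while i < len(lines) and not lines[i].strip():
        i += 1                      # skip the leading blank line(s)
    if i + 3 >= len(lines):
        return None
    caption, rule1, header, rule2 = lines[i:i + 4]
    if not (caption.strip().lower().startswith("table")
            and is_rule(rule1) and is_rule(rule2)):
        return None
    # A header cell is a run of words separated by single spaces;
    # the column where it starts is where that column starts in every row.
    starts = [m.start() for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for line in lines[i + 4:]:
        if not line.strip():
            break                   # a blank line ends the table
        rows.append(slice_at(line, starts))
    return caption.strip(), slice_at(header, starts), rows

if __name__ == "__main__":
    sample = (
        "\t\t\tTable 1.\n"
        "\t---------------------------------------\n"
        "\tNAME\t\tPHONE\t\tE-MAIL\n"
        "\t---------------------------------------\n"
        "\tBob Jones\t000.0000\t[Bob--Jones--com]\n"
        "\tMary Smith\t111.1111\t[Mary--Smith--com]\n"
    )
    print(parse_plain_table(sample))
```

Slicing by header position tolerates "an extra space or two" inside a row, as long as no cell text spills into the next column's start.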

That's the syntax (well, roughly anyway <g>) I want because I
can send this table to *anyone*, make-doc'd or not, and they
can understand it. Even if I have an extra space or two
somewhere. :)

<< Now imagine that the list is not something as obviously
interpretable as names and phone numbers, but is an inventory
list with product codes, warehouse IDs, rack/shelf locations,
and various quantities (on hand, on order, backordered, etc.)
some of which may be omitted!  What if some of our people have
pagers and some do not?  Nicknames or not? >>

Good points all. I contend that whitespace is easily the
best delimiter in this case, as product codes and such may
contain other, non-standard characters.

Regarding omitted items, the whitespace defense team calls to
the stand: the Chicago Manual of Style (well, not really <g>).

I believe the standard is just to leave the cell empty or to
put an em dash in the cell. In our case, the latter seems the
obvious choice for ease of implementation.
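With a placeholder in every empty cell, the two-or-more-spaces rule keeps the cell count right, and the parser only has to map the placeholder back to an empty cell. A minimal Python sketch; accepting "--" as a plain-ASCII stand-in for the em dash is an assumption, as is the function name.

```python
import re

EMPTY_MARKS = {"\u2014", "--"}   # em dash, or a plain-ASCII double hyphen

def split_with_placeholders(line: str) -> list[str]:
    """Split on runs of 2+ spaces, then turn placeholder cells into ''."""
    cells = re.split(r" {2,}", line.strip())
    return ["" if c in EMPTY_MARKS else c for c in cells]

if __name__ == "__main__":
    header = "Emp#  First Name  Last Name    Nickname  Pager Nr  Phone Nr"
    row = "2345  Betty Sue   Doaks        --        --        555-1213"
    print(split_with_placeholders(header))
    print(split_with_placeholders(row))   # same six cells, two of them empty
```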

<<
    ~|Emp#|First Name|Last Name  |Nickname|Pager Nr|Phone Nr
    ~|----|----- ----|---- ------|--------|----- --|----- --
    ~|1234|Johannes  |Doe        |Jake    |888-1001|555-1212
    ~|2345|Betty  Sue|Doaks      |        |        |555-1213
    ~|3456|Ferdinando|Quattlebaum|Ferdy   |        |555-1214
    ~|4567|Sue  Ellen|Van Der Lin|        |888-1002|555-1215

That's a perfectly readable chunk of plain ASCII above, and
can trivially be parsed into a perfectly legal HTML table.
How does the old "whitespace is all we need" strategy deal
with:

    Emp# First Name Last Name   Nickname Pager Nr Phone Nr
    ---- ----- ---- ---- ------ -------- ----- -- ----- --
    1234 Johannes   Doe         Jake     888-1001 555-1212
    2345 Betty  Sue Doaks                         555-1213
    3456 Ferdinando Quattlebaum Ferdy             555-1214
    4567 Sue  Ellen Van Der Lin          888-1002 555-1215
>>
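Joel's claim that the `~|` rows "can trivially be parsed into a perfectly legal HTML table" is easy to check with a minimal Python sketch. Assumptions: a row starts with `~|` after optional indentation, cells are separated by `|`, surrounding spaces are trimmed, the first row becomes the header, and a row made only of hyphens and spaces is the underline and is skipped. The function name is hypothetical.

```python
import html

def tilde_rows_to_html(text: str) -> str:
    """Convert '~|'-prefixed, '|'-delimited rows into an HTML table."""
    rows = []
    for line in text.splitlines():
        s = line.strip()
        if not s.startswith("~|"):
            continue
        cells = [c.strip() for c in s[2:].split("|")]
        if all(set(c) <= {"-", " "} for c in cells):
            continue                      # the hyphen underline row
        rows.append(cells)
    if not rows:
        return ""
    head, *body = rows
    out = ["<table>",
           "  <tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in head) + "</tr>"]
    for r in body:
        out.append("  <tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in r) + "</tr>")
    out.append("</table>")
    return "\n".join(out)

if __name__ == "__main__":
    print(tilde_rows_to_html(
        "    ~|Emp#|Nickname|Phone Nr\n"
        "    ~|----|--------|----- --\n"
        "    ~|2345|        |555-1213\n"))
```

Empty cells come out as `<td></td>`, so the explicit delimiter handles omitted items with no placeholder convention at all.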

It deals with it the same way a human would, though it's
obviously not as smart. It says "That's a really bad table
layout. What were you thinking? I can't make sense of that,
and neither will anyone else." :)

You don't have to agree with my stance on this; to be sure,
it's not conventional. While I agree that the first one is
trivially parsed by a machine, and also "parseable" by a
human, it is not easily readable. I can't take it in at a
glance, which is what tables are supposed to help us do,
right?

<< IOW, simplicity is achieved by controlling the rate at
which features are discussed with the user, not by making
a limiting choice that actually causes subtle problems
(like remembering to count spaces). >>

This logic is somewhat flawed. A particular user may
not know anything about advanced features, but that doesn't
prevent them from getting a document containing them, which
means they then have to understand them to understand that
document. It could be argued that people should only care
about distributing what make-doc spits out, but I like the
idea of plain text distribution.

<< The examples I provided earlier more fully discuss what I
meant by "ambiguous".  The "widget" example (as typed) doesn't
address those issues.  Again, do you have any trouble reading
this table?

    Widget A    Widget  B   Widget C
         50%         10%         3%
    500,000     100,000      3,000
>>

None whatsoever. 3 columns. Perfectly clear. Now, you're
going to tell me that there are actually 4 columns because
of the extra space in "Widget  B". And you'd be right. No
problem says I, it's easily fixed for make-doc generation
and, for human consumption, it doesn't need to be fixed.
That's my preference, though it may not be yours.
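One way a program could "easily fix" the widget case is to look at character columns rather than spaces: a position that is blank in every row (a gutter) separates columns, while the doubled space in "Widget  B" sits above digits in the rows below and so is not a gutter. A minimal Python sketch, assuming monospaced text with tabs already expanded; the function names are hypothetical. Its known weakness: if every entry in one column happens to have a space at the same position, that position also looks like a gutter, so a real tool might require wider gutters or fall back to asking.

```python
def column_spans(rows: list[str]) -> list[tuple[int, int]]:
    """Return (start, end) spans of text between all-blank character columns.

    A position past the end of a shorter row counts as blank for that row.
    """
    width = max(len(r) for r in rows)
    blank = [all(c >= len(r) or r[c] == " " for r in rows) for c in range(width)]
    spans, start = [], None
    for c, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = c                   # text begins
        elif is_blank and start is not None:
            spans.append((start, c))    # text ends at a gutter
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

def split_by_gutters(rows: list[str]) -> list[list[str]]:
    """Cut every row at the spans found by column_spans."""
    spans = column_spans(rows)
    return [[r[a:b].strip() for a, b in spans] for r in rows]

if __name__ == "__main__":
    widgets = [
        "Widget A    Widget  B   Widget C",
        "     50%         10%         3%",
        "500,000     100,000      3,000",
    ]
    for cells in split_by_gutters(widgets):
        print(cells)                    # three cells per row
```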

It's going to be tough as long as each of us can come up with
artificial examples that support our respective cases.

Could some other people send in some sample tables? Let's get
as many as we can and see if that leads in a particular direction.

<<
Now, since you want to take up keystroke counting next, answer
the same question with this version.

    ~|Widget A|Widget B|Widget C
    ~|50%|10%|3%
    ~|500,000|100,000|3,000

How many keystrokes did we save by not fooling ourselves into
believing that we had to make the columns line up?  By using
an explicit delimiter, the typist gets to choose whether to
do so or not, and to choose to save the maximum number of
keystrokes if she/he wishes to do so.
>>

Another valid case. Now, my *goal* is not to save keystrokes,
that just comes along as a happy side-effect. I originally did
some delimited layouts to post, showing the very issue you
mention:

a|b|c|d|e|f|g
12|23|34|45|56|67|78
acb|def|ghi|jkl|mno|pqr|stu
$12.00|$24.50|$37.75|$1.00|$0.50|$0.25|$0.01

I didn't include them in my original post because this is so
horrific, from a layout perspective, that I can't imagine
anyone even considering it as an option and I didn't think it
was fair to hit you with it. :)

We can say that people should be free to format their data
however they want, freedom of expression and all that, but
I also think it's important to encourage good data hygiene.

RE: Keystrokes and table formatting

<< It depends on whether I (as the typist) make the choice to
create tidy-looking source text or just get the data in with
the fewest keystrokes. >>

I notice that none of the example tables you've posted to
support the case for delimiters used the keystroke-saving
mode.

For me (and I could be alone in this), creating delimited tables
with ASCII characters, as we are discussing, always has me playing
the back-and-forth game: I get my delimiters to line up, then
that moves my data, so I have to readjust that, etc. Could just
be that I haven't been forced to come up with a clever solution
to my problem since I can avoid it.
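One possible "clever solution" to the back-and-forth game is to let a tool do the padding: type the delimited rows with no alignment at all, then pad every cell to the widest entry in its column. A minimal Python sketch (the function name is hypothetical), run on the unaligned pipe layout shown earlier.

```python
def align_pipe_table(lines: list[str], sep: str = "|") -> list[str]:
    """Pad each delimited cell to the widest entry in its column."""
    rows = [[c.strip() for c in ln.split(sep)] for ln in lines]
    ncols = max(len(r) for r in rows)
    rows = [r + [""] * (ncols - len(r)) for r in rows]   # fill short rows
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]
    return [sep.join(c.ljust(w) for c, w in zip(r, widths)).rstrip()
            for r in rows]

if __name__ == "__main__":
    raw = [
        "a|b|c|d|e|f|g",
        "12|23|34|45|56|67|78",
        "acb|def|ghi|jkl|mno|pqr|stu",
        "$12.00|$24.50|$37.75|$1.00|$0.50|$0.25|$0.01",
    ]
    print("\n".join(align_pipe_table(raw)))
```

This keeps the keystroke-saving entry mode and still produces source text whose delimiters line up.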

<<
Literate humans (whether "users" or not)
are completely familiar with using punctuation as a way of
showing the structure of text.  Periods at the ends of
sentences, commas between elements of a list, horizontal rules
between sections of a document, etc. are all well within their
comfort zone.
>>

Agreed. Those are conventions we are taught. Using a pipe symbol
as a delimiter between pieces of text is not. This is where we
diverge in our view as I consider this to be a computer/programmer's
convention and not a normal language/grammar convention.

<< Which delimiter? >>

Sorry for the confusion; I have been using the pipe symbol (|)
as the basis for this discussion.

<< If you can choose for yourself, this is a total red herring! >>

But that brings us back to the complexification of variable syntax!

<<
    ~;Emp#;First Name;Last Name  ;Nickname;Pager Nr;Phone Nr
    ~;----;----- ----;---- ------;--------;----- --;----- --
    ~;1234;Johannes  ;Doe        ;Jake    ;888-1001;555-1212
    ~;2345;Betty  Sue;Doaks      ;        ;        ;555-1213
    ~;3456;Ferdinando;Quattlebaum;Ferdy   ;        ;555-1214
    ~;4567;Sue  Ellen;Van Der Lin;        ;888-1002;555-1215
>>

Now don't tell me *that's* easily parsed by humans. <thrust parry
thrust jab> :)

<<
> From my perspective, we're dealing with text tables, not
> data tables.
>
What does that mean?  I really have no clue!  (Especially
since your next sentence talks about reading "the data"!)
>>

Sorry, again, for the confusion. I meant to draw the distinction
between the tabular presentation of data for human consumption
versus the tabular presentation of data for computer processing
(i.e. delimited data).

<<
~| Emp# | First Name | Last Name   | Nickname | Pager Nr | Phone Nr
~| ---- | ----- ---- | ---- ------ | -------- | ----- -- | ----- --
~| 1234 | Johannes   | Doe         | Jake     | 888-1001 | 555-1212
~| 2345 | Betty  Sue | Doaks       |          |          | 555-1213
~| 3456 | Ferdinando | Quattlebaum | Ferdy    |          | 555-1214
~| 4567 | Sue  Ellen | Van Der Lin |          | 888-1002 | 555-1215

I can read the above quite clearly.
>>

And so can I. Now, if you remove the delimiters...

Emp#  First Name  Last Name    Nickname  Pager Nr  Phone Nr
----  ----- ----  ---- ------  --------  ----- --  ----- --
1234  Johannes    Doe          Jake      888-1001  555-1212
2345  Betty  Sue  Doaks                            555-1213
3456  Ferdinando  Quattlebaum  Ferdy               555-1214
4567  Sue  Ellen  Van Der Lin            888-1002  555-1215

I can still read it just as clearly, if not more so. The catch
here is that make-doc may not be able to read it as clearly.
Hmmm, force a fixed-pitch font and treat it as preformatted
text in cases of confusion?
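The "preformatted text in cases of confusion" fallback could look like this minimal Python sketch. The test for confusion used here (rows do not all split into the same number of cells on runs of two or more spaces) and the function name are assumptions; a real converter could plug in any of the heuristics discussed above.

```python
import html
import re

def render_table(lines: list[str]) -> str:
    """Emit an HTML table if every row splits into the same number of cells
    on 2+ spaces; otherwise fall back to fixed-pitch <pre> text."""
    rows = [re.split(r" {2,}", ln.strip()) for ln in lines if ln.strip()]
    if rows and len({len(r) for r in rows}) == 1:
        body = "\n".join(
            "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in r) + "</tr>"
            for r in rows)
        return f"<table>\n{body}\n</table>"
    # Ambiguous layout: keep the author's spacing and force a fixed-pitch font.
    return "<pre>\n" + html.escape("\n".join(lines)) + "\n</pre>"

if __name__ == "__main__":
    print(render_table(["Emp#  Name", "1234  Johannes"]))
    print(render_table(["2345  Betty  Sue  Doaks", "1234  Johannes    Doe"]))
```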

<< No more so than the borders in an HTML table!  Most people don't
bother to say ... border="0" ..., so their tables have vertical
and horizontal rules all over the place. >>

You make a great point by saying "Most people don't bother..."
You're right. People will often do things in the most economical
way possible and live with the results if they aren't oppressive.

Press abandoned vertical rules as a standard feature of
tables in the books and journals it published. More than
two decades of this austerity have demonstrated that
banishing vertical rules has not decreased the clarity of
well-organized tables and that it has, on the other hand,
increased their attractiveness. (Chicago Manual of Style,
14th edition, pg 410)

<< All we're talking about here is letting someone indicate in a
simple, obvious, non-error-prone way where he/she wants those
boundaries to appear. >>

I'll nit-pick here. We're not "letting" them, we're "forcing" them.

<< I consider the effort of counting spaces to be a very
subtle and error-prone demand, and therefore extraneous. >>

So don't count spaces, just make things look right. :)

<<
I also disagree, at least with the implication that anything
other than plain text is "extraneous".  We're in a trade-off
zone where we're balancing the added power of structured
formatting (such as tables) against the cognitive and typing
efforts required to access such power.
>>

I couldn't agree more. In the context of a TeX/TBL dialect,
the whitespace approach is probably not the right one. My
opinion in this matter is based on my view that simpler is
better and less is more. I'm willing to give up 99% of the
power in return for a 50% effort reduction.

<< OTOH, if you really want