Mailing List Archive: Re: make-doc-pro: how to handle tables?

[REBOL] Re: make-doc-pro: how to handle tables?

From: joel:neely:fedex at: 22-Sep-2001 19:14


Hi, Gregg,

Gregg Irwin wrote:
...
> Let me know what you think.
>
...

OK.  You asked for it.  ;-)

> Joel, Robert, et al
>
> I have to disagree with you guys. I'll pick a couple points of
> contention and then give some thoughts and exercises.
>
> << not so experienced users don't understand the difference
> between usi[n]g space, tab etc. >>
>
> And they shouldn't have to.
>

That one wasn't mine, so I'll pass.

> Now, there is going to be the challenge on the parsing side of
> making the distinction, but even that shouldn't be too tough
> using alternation of tab or multi-space.
>

You seem to be glossing over one of the points I raised:

    When using whitespace one must address the issue of
    how much whitespace is considered a column break (since it
    also may appear *within* a column, e.g. between words).

When you say "multi-space" you seem to imply more than one,
but without actually saying what you mean.  How about two?
What does that do to the poor user who accidentally bounces
the space bar at a point where a single space between words
(in one column) was intended?

> Do these two lines look the same to you? Do you
> care if one uses tabs and the other spaces?
>
>         Tab             Spaces
>       Tab         Spaces
>

Same issue.  Look at these lines:

    Employee ID First Name Last Name   Phone Nr
    -------- -- ----- ---- ---- ------ ----- --
    1234        John       Doe         555-1212
    2345        Jane       Doaks       555-1213
    3456        Ferdinando Quattlebaum 555-1214
    4567        Mary Ellen Van Der Lin 555-1215

Being an intelligent human, you probably guessed that there
were four columns, but you had to look at the entire context
to figure that out.  IOW, from the first two lines (or the
last two), it would not be obvious MERELY ON SYNTACTICAL
GROUNDS which blanks (if any) were column delimiters instead
of data within a column.
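A tiny Python illustration (Python is used here only for the sake of example): if a program naively treats every run of blanks as a column break, rows of this four-column table come out with different numbers of "columns".

```python
# Rows copied from the example table above; the intended table has four columns.
ROWS = [
    "Employee ID First Name Last Name   Phone Nr",
    "1234        John       Doe         555-1212",
    "4567        Mary Ellen Van Der Lin 555-1215",
]

# Naive rule: any run of whitespace separates columns.
counts = [len(row.split()) for row in ROWS]
print(counts)  # [8, 4, 7]
```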

Using multiple blanks as column delimiters requires humans
to count spaces (to make sure that we don't have an accidental
extra blank) and to type extra spaces (which take more effort)
just to be sure.  Too fragile!  Consider this variation of
the above table:

    Employee ID  First Name  Last Name    Phone Nr
    -------- --  ----- ----  ---- ------  ----- --
    1234         Johannes    Doe          555-1212
    2345         Betty  Sue  Doaks        555-1213
    3456         Ferdinando  Quattlebaum  555-1214
    4567         Sue  Ellen  Van Der Lin  555-1215

How hard is that to proofread?  Are you sure there are only
four columns?
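The same problem, sketched in Python under a hypothetical "two or more blanks end a column" rule (the rule and the `split_cells` name are assumptions for illustration only). The rows with an intended double space inside a name split into five cells instead of four.

```python
import re

# Rows copied from the two-blank variant of the table above.
ROWS = [
    "Employee ID  First Name  Last Name    Phone Nr",
    "1234         Johannes    Doe          555-1212",
    "2345         Betty  Sue  Doaks        555-1213",
    "4567         Sue  Ellen  Van Der Lin  555-1215",
]


def split_cells(row):
    """Hypothetical rule: two or more consecutive blanks end a column."""
    return re.split(r" {2,}", row.strip())


for row in ROWS:
    print(len(split_cells(row)), split_cells(row))
# "Betty  Sue" and "Sue  Ellen" are each split into two cells.
```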

OBTW, tabs are of absolutely no use in solving this problem.
If, for example, an item ends on column 14 (1-origin), then
keying a tab only takes one to column 16, which is visually
indistinguishable from the result of typing a single space
in column 15.
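The tab arithmetic can be checked with Python's `str.expandtabs`, assuming the common convention of a tab stop every 8 columns (editors with other settings will behave differently). The item strings are made up for the example.

```python
# Assumes tab stops every 8 columns, which is what str.expandtabs(8) models.
item14 = "Quattlebaum Co"    # 14 characters: columns 1-14
item15 = "Quattlebaum Inc"   # 15 characters: columns 1-15

# After a 14-character item a tab supplies two blanks (columns 15-16),
# only one more than a single typed space.
print(repr((item14 + "\tNext").expandtabs(8)))  # 'Quattlebaum Co  Next'
print(repr(item14 + " Next"))                   # 'Quattlebaum Co Next'

# After a 15-character item a tab supplies exactly one blank, so the
# result is identical to typing a single space.
assert (item15 + "\tNext").expandtabs(8) == item15 + " Next"
```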

> << These people need a visual feedback >>
>
> Whitespace *is* visual feedback.
>

Bah!  Humbug!  Horsefeathers!  See the proofreading exercise
above.  Counting nonprinting characters to figure out *what*
*kind* of delimiter is present (between words *in* a column or
between columns) is an awful burden to lay on a poor human
(technical or not!).

Bear in mind, as you look at the above examples, that most of
us (I assume that you are included in this) read our mail via
a user interface that uses a monospaced font.  Try changing
your default mail client font to Helvetica, Palatino, or Times
Roman and see how much fun you have counting spaces!

And, yes, I know of non-technical end users that type text in
uSoft Word and then "Save as..." plain text, because they don't
know any other way to edit text.  Unless they bother to change
the font to Lucida Console, Courier, etc. or know how to change
their default font (even less likely) they'll be looking at the
text in a proportional font, which makes counting spaces a REAL
PAIN.

> Do we need delimiters, beyond whitespace, in our REBOL code?
>

We certainly do.  Including  [  ]  (  )  {  }  and  " .

Note that the reason we need them is to show structure; as long
as you're just typing a long trivial expression

    a: b + c + d + e + f + g ;; ...

they aren't too critical.  However, try using IF, WHILE, FUNC,
etc. without any delimiters but whitespace, please!

We're talking about tables here, where structure (and how to
represent it in a simple, obvious, non-error-prone way) is the
very heart of the issue.  REBOL certainly uses non-whitespace
delimiters when structure is involved!  And (in my experience)
the "non-error-prone" issue in the long run is the most
important one.  After the first time someone struggles to
find the extra/missing blank that made his nice long table go
all goofy, he/she will usually realize that inserting commas,
slashes, or vertical bars provides a real ROI.

> Non-programmers are easily confused and frustrated by any
> kind of "syntax" in my experience. If the goal is to keep
> the make-doc format as human-friendly as possible, all
> extraneous syntax should be kept out of it.
>

In my book things such as

  \table
  /table

or underlining with hyphens or equal signs are just as much
syntax that has to be learned and understood for effective
use as is

    ~|Employee ID|First Name|Last Name  |Phone Nr
    ~|-------- --|----- ----|---- ------|----- --
    ~|1234       |Johannes  |Doe        |555-1212
    ~|2345       |Betty  Sue|Doaks      |555-1213
    ~|3456       |Ferdinando|Quattlebaum|555-1214
    ~|4567       |Sue  Ellen|Van Der Lin|555-1215

Now imagine that the list is not something as obviously
interpretable as names and phone numbers, but is an inventory
list with product codes, warehouse IDs, rack/shelf locations,
and various quantities (on hand, on order, backordered, etc.)
some of which may be omitted!  What if some of our people have
pagers and some do not?  Nicknames or not?

    ~|Emp#|First Name|Last Name  |Nickname|Pager Nr|Phone Nr
    ~|----|----- ----|---- ------|--------|----- --|----- --
    ~|1234|Johannes  |Doe        |Jake    |888-1001|555-1212
    ~|2345|Betty  Sue|Doaks      |        |        |555-1213
    ~|3456|Ferdinando|Quattlebaum|Ferdy   |        |555-1214
    ~|4567|Sue  Ellen|Van Der Lin|        |888-1002|555-1215

That's a perfectly readable chunk of plain ASCII above, and
can trivially be parsed into a perfectly legal HTML table.
How does the old "whitespace is all we need" strategy deal
with:

    Emp# First Name Last Name   Nickname Pager Nr Phone Nr
    ---- ----- ---- ---- ------ -------- ----- -- ----- --
    1234 Johannes   Doe         Jake     888-1001 555-1212
    2345 Betty  Sue Doaks                         555-1213
    3456 Ferdinando Quattlebaum Ferdy             555-1214
    4567 Sue  Ellen Van Der Lin          888-1002 555-1215
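As a rough illustration of the "trivially parsed" claim, here is a minimal Python sketch that turns the `~|` rows into an HTML table. It assumes (none of this is settled make-doc syntax) that every table line starts with `~|`, that the first row is the header, and that a row whose cells contain only hyphens and blanks is an underline to be dropped. `table_to_html` and `is_underline` are hypothetical names.

```python
import html

TABLE = """\
~|Emp#|First Name|Last Name  |Nickname|Pager Nr|Phone Nr
~|----|----- ----|---- ------|--------|----- --|----- --
~|1234|Johannes  |Doe        |Jake    |888-1001|555-1212
~|2345|Betty  Sue|Doaks      |        |        |555-1213
~|3456|Ferdinando|Quattlebaum|Ferdy   |        |555-1214
~|4567|Sue  Ellen|Van Der Lin|        |888-1002|555-1215"""


def is_underline(cells):
    """True if every cell is non-empty and made only of hyphens and blanks."""
    return all(cell and set(cell) <= set("- ") for cell in cells)


def table_to_html(text):
    rows = [
        [cell.strip() for cell in line[2:].split("|")]
        for line in text.splitlines()
        if line.startswith("~|")
    ]
    header, body = rows[0], rows[1:]
    if body and is_underline(body[0]):
        body = body[1:]  # drop the hyphen underline beneath the header
    out = ["<table>"]
    out.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        # Empty cells (e.g. a missing pager number) become empty <td> elements.
        out.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    out.append("</table>")
    return "\n".join(out)


print(table_to_html(TABLE))
```

Every row keeps exactly six cells, including the ones with omitted nicknames or pager numbers, because the bars say where each cell ends.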

> Variable syntax is very flexible, but it's even more
> confusing than static syntax to regular people.
>

Which is why I start off telling them only about a single
delimiter (such as vertical bar).  I don't make an issue
of the option to use other delimiters until someone needs
it (at which point he/she is usually delighted to find
that their worrisome data can actually be handled easily).
IOW, simplicity is achieved by controlling the rate at
which features are discussed with the user, not by making
a limiting choice that actually causes subtle problems
(like remembering to count spaces).

> << Whitespace is also ambiguous (even for more experienced
> eyes, IMHO).  >>
>
> Is this table data ambiguous to you? (providing it doesn't
> get munged in the mail :))
>
> Widget A    Widget B    Widget C
> 50%         10%         3%
> 500,000     100,000     3,000
>
> Does this make the data more accessible?
>
> Widget A|Widget B|Widget C
> 50%     |10%     |3%
> 500,000 |100,000 |3,000
>

The examples I provided earlier more fully discuss what I
meant by "ambiguous".  The "widget" example (as typed) doesn't
address those issues.  Again, do you have any trouble reading
this table?

    Widget A    Widget  B   Widget C
         50%         10%         3%
    500,000     100,000      3,000

How many columns are in each row in THIS version?  Now answer
the same question with the following one.

    ~| Widget A | Widget  B | Widget C
    ~|      50% |      10%  |      3%
    ~| 500,000  | 100,000   |  3,000

Now, since you want to take up keystroke counting next, answer
the same question with this version.

    ~|Widget A|Widget B|Widget C
    ~|50%|10%|3%
    ~|500,000|100,000|3,000

How many keystrokes did we save by not fooling ourselves into
believing that we had to make the columns line up?  By using
an explicit delimiter, the typist gets to choose whether to
do so or not, and to choose to save the maximum number of
keystrokes if she/he wishes to do so.
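A quick Python check of the column counts, assuming that `~|` starts a row, `|` separates cells, and blanks around a cell's text are padding. Collapsing runs of blanks inside a cell is an extra, hypothetical normalization so that "Widget  B" and "Widget B" compare equal.

```python
PADDED = [
    "~| Widget A | Widget  B | Widget C",
    "~|      50% |      10%  |      3%",
    "~| 500,000  | 100,000   |  3,000",
]
COMPACT = [
    "~|Widget A|Widget B|Widget C",
    "~|50%|10%|3%",
    "~|500,000|100,000|3,000",
]


def cells(line):
    # Strip padding and collapse internal runs of blanks in each cell.
    return [" ".join(cell.split()) for cell in line[2:].split("|")]


print([len(cells(line)) for line in PADDED])   # [3, 3, 3]
print([len(cells(line)) for line in COMPACT])  # [3, 3, 3]
assert [cells(line) for line in PADDED] == [cells(line) for line in COMPACT]
print(sum(map(len, PADDED)), "vs", sum(map(len, COMPACT)), "characters")
```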

> Using the two example tables above, try this little experiment.
> Count your keystrokes as you go and time yourself if you feel
> so inclined.
>
> 1. Add a new row with the following data for each column:
>         Bob Jones
>         Ford Prefect
>         Mary Smith
>

35, if I'm in keystroke-saving mode.

    ~|Bob Jones|Ford Prefect|Mary Smith
    12345678901234567890123456789012345

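Counting characters typed (not Shift presses), the 35 breaks down as 31 characters of data plus 4 of delimiter. A throwaway Python check:

```python
data = ["Bob Jones", "Ford Prefect", "Mary Smith"]
row = "~|" + "|".join(data)    # keystroke-saving form, no padding

data_chars = sum(len(d) for d in data)
print(row)                     # ~|Bob Jones|Ford Prefect|Mary Smith
print(len(row))                # 35
print(data_chars)              # 31
print(len(row) - data_chars)   # 4: the "~|" prefix plus two "|" separators
print(len(" III"))             # 4 extra characters for "Ford Prefect III"
```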
> 2. Change Ford Prefect's name to "Ford Prefect III"
>

Only 4 more data keystrokes are required (" III") if I choose
to optimize in favor of saving keystrokes.  (How many keys
must be pressed -- or mice must be clicked -- to position
the cursor in the right place to perform the insertion is
clearly dependent on one's choice of text editor...)

> What do your tables look like now?
>

It depends on whether I (as the typist) make the choice to
create tidy-looking source text or just get the data in with
the fewest keystrokes.

> How many keystrokes did it take you for each one? (if you
> post your results, make sure to note whether data
> keystrokes were counted (35 data keystrokes are required))
>

See answers above.

> How long did it take you for each?
>

I can't type and look at my watch at the same time.  Suffice
to say that I type somewhere in the 50-60 wpm range.

> Now for some thoughts...
>
> If you require a delimiter, it has two impacts:
>
> 1. The user has to mentally switch from "data" mode to
> "format" mode, and back to "data" mode again every time
> they enter a delimiter.
>

I totally disagree.  Literate humans (whether "users" or not)
are completely familiar with using punctuation as a way of
showing the structure of text.  Periods at the ends of
sentences, commas between elements of a list, horizontal rules
between sections of a document, etc. are all well within their
comfort zone.

> 2. The delimiter is a small shifted key, usually located
> near the larger Enter and Backspace keys, which are both
> potentially destructive; Backspace to the data and Enter
> to the location. If the Shift key is missed, you'll
> get a backslash instead of a pipe, which is easily overlooked.
>

Which delimiter?  Which brand of keyboard?  Which key is the
"any" key?  If you can choose for yourself, this is a total
red herring!

    ~;Emp#;First Name;Last Name  ;Nickname;Pager Nr;Phone Nr
    ~;----;----- ----;---- ------;--------;----- --;----- --
    ~;1234;Johannes  ;Doe        ;Jake    ;888-1001;555-1212
    ~;2345;Betty  Sue;Doaks      ;        ;        ;555-1213
    ~;3456;Ferdinando;Quattlebaum;Ferdy   ;        ;555-1214
    ~;4567;Sue  Ellen;Van Der Lin;        ;888-1002;555-1215
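One way a "choose your own delimiter" rule could work, sketched in Python under the assumption (not an agreed make-doc rule) that a table row starts with `~` and the very next character is the delimiter for that row. `split_row` is a hypothetical helper.

```python
def split_row(line):
    """Split a table row whose second character names its delimiter."""
    if len(line) < 2 or line[0] != "~":
        raise ValueError(f"not a table row: {line!r}")
    delimiter = line[1]
    return [cell.strip() for cell in line[2:].split(delimiter)]


bar = "~|2345|Betty  Sue|Doaks      |        |        |555-1213"
semi = "~;2345;Betty  Sue;Doaks      ;        ;        ;555-1213"

print(split_row(bar))   # ['2345', 'Betty  Sue', 'Doaks', '', '', '555-1213']
assert split_row(bar) == split_row(semi)

# Data that itself contains a vertical bar only needs a different delimiter.
print(split_row("~;A|B;C"))  # ['A|B', 'C']
```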

> From my perspective, we're dealing with text tables, not
> data tables.
>

What does that mean?  I really have no clue!  (Especially
since your next sentence talks about reading "the data"!)

> I should be able to read the data clearly whether it's been
> formatted in make-doc or whether it's just plain text.
>

    ~| Emp# | First Name | Last Name   | Nickname | Pager Nr | Phone Nr
    ~| ---- | ----- ---- | ---- ------ | -------- | ----- -- | ----- --
    ~| 1234 | Johannes   | Doe         | Jake     | 888-1001 | 555-1212
    ~| 2345 | Betty  Sue | Doaks       |          |          | 555-1213
    ~| 3456 | Ferdinando | Quattlebaum | Ferdy    |          | 555-1214
    ~| 4567 | Sue  Ellen | Van Der Lin |          | 888-1002 | 555-1215

I can read the above quite clearly.

> Using a special delimiter adds an artificial element to the mix.
>

No more so than the borders in an HTML table!  Most people don't
bother to say ... border="0" ..., so their tables have vertical
and horizontal rules all over the place.  All we're talking about
here is letting someone indicate in a simple, obvious,
non-error-prone way where he/she wants those boundaries to appear.

> I certainly think we could come up with a cool table formatting
> dialect to surpass TBL and rival TeX, but that's not the goal
> right now so I think we should strive to remove all extraneous
> demands placed on the user.
>

I agree.  I consider the effort of counting spaces to be a very
subtle and error-prone demand, and therefore extraneous.

I also disagree, at least with the implication that anything
other than plain text is "extraneous".  We're in a trade-off
zone where we're balancing the added power of structured
formatting (such as tables) against the cognitive and typing
efforts required to access such power.  Having been at this
game since the late 60's, I've seen more than my share of
line-printer-generated documents that used such devices as

    +------------+-----------------------+---------------------+
    | Style      | Pros                  | Cons                |
    +============+=======================+=====================+
    | Whitespace | Easier for hunt-and   | Counting whitespace |
    | (multiple) | -peck typists who     | is difficult and    |
    |            | don't know the        | error-prone for     |
    |            | keyboard?             |                     |
    +------------+-----------------------+---------------------+
    | Explicit   | Less ambiguous, both  | One minute required |
    | delimiters | to human readers and  | to explain the rule |
    |            | text processing       | to typist; no time  |
    |            | software              | required for reader |
    +------------+-----------------------+---------------------+
    | ASCII Art, | About as well as one  | Very tedious to     |
    | such as    | can do with plain     | create and maintain |
    | this table | (monospaced) text     |                     |
    +------------+-----------------------+---------------------+
    | HTML, TeX, | Tremendous power and  | Tremendous learning |
    | LaTeX, or  | sophisticated control | curve and effort    |
    | SGML       | over resulting docs   | to use              |
    +------------+-----------------------+---------------------+

Why did people do such things?  Because there was a perceived
value in enhancing the appearance of the text.  One could
certainly argue (and some typographers do) that the borders
in such tables are extraneous and bad.  However, most people
of my acquaintance are accustomed to such extraneous decoration
of the content and are willing to invest a small effort to
achieve it.

OTOH, if you really want to be consistent with the philosophy
of letting the human type for human consumption and requiring
the formatting program to figure out what was meant, I'd love
to see some code that can handle the following (exactly as it
appears below, of course).  Back in the day, I spent quite a
bit of time trying to come up with a bit of AI that could take
a flat file or printout image such as the following and infer
where the columns were intended, whether each column was to be
left-justified, right-justified, or centered, and what type of
data should appear in each.  It's non-trivial IMHO, but YMMV.

Emp# First Name Last Name   Nickname Pager Nr Phone Number
==== ===== ==== ==== ====== ======== ===== == ===== ======
  12 Johannes   Doe         Jake     888-1001     555-1212
3456 Ferdinando Quattlebaum Ferdy             800-555-1214
 234 Betty  Sue Doaks                             555-1213
4567 Sue  Ellen Van Der Lin          888-1002 888-555-1215
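For what it's worth, here is one naive heuristic sketched in Python; it is an illustration, not a claim that the problem is solved. It treats a character position as a column gap when that position is blank in every line, and guesses a column's justification from whether its filled data cells sit flush against the left or right edge of the column. It happens to recover the six columns of the sample above, but it would be fooled if every line had a blank at the same position inside one column, it reports centered or full-width columns only as "ambiguous", and it makes no attempt to infer data types. All function names are hypothetical.

```python
TABLE = """\
Emp# First Name Last Name   Nickname Pager Nr Phone Number
==== ===== ==== ==== ====== ======== ===== == ===== ======
  12 Johannes   Doe         Jake     888-1001     555-1212
3456 Ferdinando Quattlebaum Ferdy             800-555-1214
 234 Betty  Sue Doaks                             555-1213
4567 Sue  Ellen Van Der Lin          888-1002 888-555-1215"""


def column_spans(lines):
    """(start, end) spans separated by positions that are blank in every line."""
    width = max(len(line) for line in lines)
    gap = [all(i >= len(line) or line[i] == " " for line in lines)
           for i in range(width)]
    spans, start = [], None
    for i, is_gap in enumerate(gap + [True]):  # sentinel closes the last span
        if not is_gap and start is None:
            start = i
        elif is_gap and start is not None:
            spans.append((start, i))
            start = None
    return spans


def guess_alignment(raw_cells):
    """'left' or 'right' if every filled cell is flush on one side only."""
    filled = [c for c in raw_cells if c.strip()]
    flush_left = all(c[0] != " " for c in filled)
    flush_right = all(c[-1] != " " for c in filled)
    if flush_left and not flush_right:
        return "left"
    if flush_right and not flush_left:
        return "right"
    return "ambiguous"  # e.g. every entry fills the column, or centered text


def infer_table(text):
    lines = text.splitlines()
    spans = column_spans(lines)
    raw = [[line[s:e].ljust(e - s) for s, e in spans] for line in lines]
    cells = [[c.strip() for c in row] for row in raw]
    data = raw[2:]  # assumes one header line followed by one underline line
    aligns = [guess_alignment([row[j] for row in data]) for j in range(len(spans))]
    return spans, cells, aligns


spans, cells, aligns = infer_table(TABLE)
print(spans)     # [(0, 4), (5, 15), (16, 27), (28, 36), (37, 45), (46, 58)]
print(cells[3])  # ['3456', 'Ferdinando', 'Quattlebaum', 'Ferdy', '', '800-555-1214']
print(aligns)    # ['right', 'left', 'left', 'left', 'ambiguous', 'right']
```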

Thanks for listening/reading (if you get this far!  ;-)

-jn-

--
You can love your job, but don't expect it to love you back.
                                                    -- Aaron Watters
joel;dot;neely;at;FIX;PUNCTUATION;fedex;d