Mailing List Archive: Re: possible data format ... Re: Re: dbms3.r 01

[REBOL] Re: possible data format ... Re: Re: dbms3.r 01

From: joel:neely:fedex at: 15-Jan-2002 9:37


Hi, Rod,

Nice summaries!

Rod Gaither wrote:
> This would be nice but I'm not sure it is reasonable for records
> with a large text block.  One of the things I'd like to avoid is
> imposing limits on record size so something like a webpage could
> be a record if desired.
>

I think the underlying issue is which data types (in the generic
sense of the word) should be supported.  Many databases with which
I'm familiar treat "running text" (with large size potential) as a
different type than "character data field" (with some upper limit
on capacity).  To avoid dependence on filesystem issues (assuming
we DO want to stay away from entanglement with any specific OS
features/limits) one strategy is to store string data up to a
certain size within the record; above that size, each such value
would be kept in a separate file, with the name of the file as a
string within the original record.  Certainly not the fastest
option for some purposes, but it does allow such things as searches
on non-running-text fields (e.g. keys) to be done without the
overhead of reading/skipping the big chunks.

The same approach can be used for BLOb data.
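
To make the size-threshold idea concrete, here is a minimal sketch in
Python (not REBOL, and not anything from dbms3.r).  Everything in it is
an assumption made for illustration: the 64-character cutoff, the
names store_text and load_text, and the use of a small tagged
dictionary to tell an inline value from a file name.

```python
import os
import uuid

THRESHOLD = 64  # hypothetical cutoff, in characters


def store_text(value, overflow_dir):
    """Return what goes in the record: the text itself or a file name."""
    if len(value) <= THRESHOLD:
        return {"inline": value}
    # Too big for the record: write it to its own file and keep only the name.
    name = uuid.uuid4().hex + ".txt"
    with open(os.path.join(overflow_dir, name), "w", encoding="utf-8") as f:
        f.write(value)
    return {"file": name}


def load_text(field, overflow_dir):
    """Fetch the text, reading the overflow file only when one is named."""
    if "inline" in field:
        return field["inline"]
    with open(os.path.join(overflow_dir, field["file"]), encoding="utf-8") as f:
        return f.read()
```

A scan over keys never calls load_text, so it never touches the
overflow files; only a request for the running text itself pays for
the extra read.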

> >
> >- new AND UPDATED AND DELETED records are appended to the end of
> >  the file
>
> This is something I have considered as well.  Another option would
> be to have a transaction file or files that either contain whole
> record changes or just "individual operations".  This audit/log
> could then be applied via a utility to update the main data file
> when desired.  Updating the main file would improve the read
> operations and bring the db back into an easy to view form.
>

Perhaps I should have been more explicit.  I assumed the existence
of a utility, also alluded to in Petr's post, which would "pack"
the data back to canonical form (one physical record per logical
record).  A trivial variation on that utility also allows for
viewing the data without actually rewriting the packed file to disk.

The "append all stuff" strategy essentially puts the log (audit trail)
within the file itself.  I believe it's faster to do that with one
file (scanned as needed when un-packed) than with separate main
and transaction files.
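
A rough sketch of the append-and-pack idea, in Python rather than
REBOL.  The one-JSON-object-per-line layout and the names append_op,
current_view and pack are assumptions for illustration only, not a
proposal for the actual file format.

```python
import json
import os


def append_op(path, op, key, record=None):
    """Append one operation ("put" or "delete") to the end of the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "key": key, "record": record}) + "\n")


def current_view(path):
    """Replay the file in order; later entries override earlier ones."""
    view = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["op"] == "delete":
                view.pop(entry["key"], None)
            else:
                view[entry["key"]] = entry["record"]
    return view


def pack(path):
    """Rewrite the file with one physical record per logical record."""
    view = current_view(path)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        for key, record in view.items():
            f.write(json.dumps({"op": "put", "key": key, "record": record}) + "\n")
    os.replace(tmp, path)  # swap the packed copy into place
```

Here current_view is the "view without rewriting" variation, and pack
is the same replay followed by a write.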

> All of these issues show the trade offs in the different operations
> you need to do on the database.  Read and search performance
> versus update and write operations and so on.  They conflict quite
> a bit with the organization I would pick to keep the data in a single
> file with visually "nice" organization. :-(
>

Simplicity imposes limits.  I don't know how to define it except
with respect to intended uses.

The simplest format I know (from the point of view of writing code
to read the data) is fixed-field layout, where each data element is
right-blank-padded (if it is "string-like") or left-zero-padded (if
it is "number-like") to constant size across all records.

    1013John        Doe         1983221
    3253Hermione    Fibbershins 1984616
    4506ThrockmortonWilberforce 1985703
    5323Johannes    Dingsda     1988445
    7151Tuxedo      Penguin     1995499
    9598Bill        Cat         1997525
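
To show how little code such a layout needs, here is a small Python
sketch.  The field names and the widths (4, 12, 12, 4, 3) are my
reading of the sample rows above, and parse_fixed and format_fixed are
hypothetical helper names.

```python
# (name, width, kind) for each column; inferred from the sample rows.
FIELDS = [
    ("id", 4, "num"),
    ("first", 12, "str"),
    ("last", 12, "str"),
    ("year", 4, "num"),
    ("office", 3, "num"),
]


def parse_fixed(line):
    """Slice one fixed-width line into a record."""
    record, pos = {}, 0
    for name, width, kind in FIELDS:
        chunk = line[pos:pos + width]
        record[name] = int(chunk) if kind == "num" else chunk.rstrip()
        pos += width
    return record


def format_fixed(record):
    """Pad each value to its column width: zeros on the left, blanks on the right."""
    parts = []
    for name, width, kind in FIELDS:
        text = str(record[name])
        if len(text) > width:
            raise ValueError(f"{name} does not fit in {width} characters")
        parts.append(text.zfill(width) if kind == "num" else text.ljust(width))
    return "".join(parts)


row = parse_fixed("4506ThrockmortonWilberforce 1985703")
```

The price of that simplicity is the hard cap on every field, which
format_fixed has to enforce.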

The simplest format I know (from the point of view of a human trying
to look at the raw data) is a forms-like presentation with explicit
labels for all data values:

    Employee ID: 1013
     First Name: John
      Last Name: Doe
     Year Hired: 1983
         Office: 221

    Employee ID: 3253
     First Name: Hermione
      Last Name: Fibbershins
     Year Hired: 1984
         Office: 616

    Employee ID: 4506
     First Name: Throckmorton
      Last Name: Wilberforce
     Year Hired: 1985
         Office: 703

    Employee ID: 5323
     First Name: Johannes
      Last Name: Dingsda
     Year Hired: 1988
         Office: 445

    Employee ID: 7151
     First Name: Tuxedo
      Last Name: Penguin
     Year Hired: 1995
         Office: 499

    Employee ID: 9598
     First Name: Bill
      Last Name: Cat
     Year Hired: 1997
         Office: 525
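
For comparison, a Python sketch of what reading the labeled layout
takes.  It assumes records are separated by blank lines and that the
first colon on a line ends the label; parse_forms is a hypothetical
name.

```python
def parse_forms(text):
    """Split "Label: value" lines into records at blank lines."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record in progress, if any.
            if current:
                records.append(current)
                current = {}
            continue
        label, _, value = line.partition(":")
        current[label.strip()] = value.strip()
    if current:
        records.append(current)
    return records
```

Every value comes back as a string, and the parser must cope with
labels, alignment and record boundaries, which is the extra parsing
burden the program pays for the human's convenience.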

I have a strong motivation to make it *possible* for humans (e.g.,
me!) to read my data files since that's often useful for debugging
and troubleshooting.  However most of the access is done by programs,
so I tend to make it "just enough" human readable and prefer to ease
the parsing burden on the program.

-jn-