[REBOL] Re: possible data format ... Re: Re: dbms3.r 01
From: joel:neely:fedex at: 15-Jan-2002 9:37
Hi, Rod,
Nice summaries!
Rod Gaither wrote:
> This would be nice but I'm not sure it is reasonable for records
> with a large text block. One of the things I'd like to avoid is
> imposing limits on record size so something like a webpage could
> be a record if desired.
>
I think the underlying issue is which data types (in the generic
sense of the word) should be supported. Many databases with which
I'm familiar treat "running text" (with large size potential) as a
different type than "character data field" (with some upper limit
on capacity). To avoid dependence on filesystem issues (assuming
we DO want to stay away from entanglement with any specific OS
features/limits) one strategy is to store string data up to a
certain size within the record; above that size, each such value
would be kept in a separate file, with the name of the file as a
string within the original record. Certainly not the fastest
option for some purposes, but it does allow such things as searches
on non-running-text fields (e.g. keys) to be done without the
overhead of reading/skipping the big chunks.
The same approach can be used for BLOb data.
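A minimal sketch of that inline-or-overflow strategy, in Python purely for illustration. The 64-character threshold, the function names, and the dictionary layout of a field are all assumptions of mine, not part of any proposed format.

```python
import os

INLINE_LIMIT = 64  # hypothetical threshold, in characters


def store_text(value, overflow_dir, name):
    """Return the field to keep in the record: inline text or a file reference."""
    if len(value) <= INLINE_LIMIT:
        return {"kind": "inline", "text": value}
    file_name = name + ".txt"
    with open(os.path.join(overflow_dir, file_name), "w", encoding="utf-8") as f:
        f.write(value)
    # Only the file name is stored in the record itself.
    return {"kind": "file", "name": file_name}


def load_text(field, overflow_dir):
    """Fetch the text, reading the overflow file only when it is needed."""
    if field["kind"] == "inline":
        return field["text"]
    with open(os.path.join(overflow_dir, field["name"]), encoding="utf-8") as f:
        return f.read()
```

A search on key fields never calls load_text, so it never touches the overflow files.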
> >
> >- new AND UPDATED AND DELETED records are appended to the end of
> > the file
>
> This is something I have considered as well. Another option would
> be to have a transaction file or files that either contain whole
> record changes or just "individual operations". This audit/log
> could then be applied via a utility to update the main data file
> when desired. Updating the main file would improve the read
> operations and bring the db back into an easy to view form.
>
Perhaps I should have been more explicit. I assumed the existence
of a utility, also alluded to in Petr's post, which would "pack"
the data back to canonical form (one physical record per logical
record). A trivial variation on that utility also allows for
viewing the data without actually rewriting the packed file to disk.
The "append all stuff" strategy essentially puts the log (audit trail)
within the file itself. I believe it's faster to do that with one
file (scanned as needed when un-packed) than with separate main
and transaction files.
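The "pack" step can be sketched roughly as follows, in Python for illustration only. The (operation, key, record) entry shape and the sample values (including the office change to 300) are made up for the example; the thread does not define an entry format.

```python
def pack(entries):
    """Collapse an append-only run of entries to one record per key."""
    current = {}
    for op, key, record in entries:
        if op == "delete":
            current.pop(key, None)
        else:
            # "new" or "update": the latest entry for a key wins
            current[key] = record
    return current


# Hypothetical file contents, oldest entry first.
log = [
    ("new", 1013, {"last": "Doe", "office": 221}),
    ("new", 3253, {"last": "Fibbershins", "office": 616}),
    ("update", 1013, {"last": "Doe", "office": 300}),
    ("delete", 3253, None),
]
```

Calling pack(log) and simply displaying the result, without writing it back, corresponds to the view-only variation.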
> All of these issues show the trade offs in the different operations
> you need to do on the database. Read and search performance
> versus update and write operations and so on. They conflict quite
> a bit with the organization I would pick to keep the data in a single
> file with visually "nice" organization. :-(
>
Simplicity imposes limits. I don't know how to define it except
with respect to intended uses.
The simplest format I know (from the point of view of writing code
to read the data) is fixed-field layout, where each data element is
right-blank-padded (if it is "string-like") or left-zero-padded (if
it is "number-like") to constant size across all records.
1013John        Doe         1983221
3253Hermione    Fibbershins 1984616
4506ThrockmortonWilberforce 1985703
5323Johannes    Dingsda     1988445
7151Tuxedo      Penguin     1995499
9598Bill        Cat         1997525
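To show why this layout is easy on the program, here is a small sketch in Python (for illustration only). It assumes field widths of 4, 12, 12, 4 and 3 characters for ID, first name, last name, year hired and office; those widths and the field names are my assumptions.

```python
# (name, width, kind): widths are assumed, not taken from a real schema.
FIELDS = [
    ("id", 4, int),
    ("first", 12, str),
    ("last", 12, str),
    ("year", 4, int),
    ("office", 3, int),
]


def parse_fixed(line):
    """Slice one fixed-width line into a record by column position."""
    record, pos = {}, 0
    for name, width, kind in FIELDS:
        raw = line[pos:pos + width]
        record[name] = int(raw) if kind is int else raw.rstrip()
        pos += width
    return record


def format_fixed(record):
    """Right-blank-pad strings and left-zero-pad numbers to the fixed widths."""
    parts = []
    for name, width, kind in FIELDS:
        value = str(record[name])
        parts.append(value.zfill(width) if kind is int else value.ljust(width))
    return "".join(parts)
```

No delimiters or escaping are needed; the cost is that every value must fit its column.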
The simplest format I know (from the point of view of a human trying
to look at the raw data) is a forms-like presentation with explicit
labels for all data values:
Employee ID: 1013
First Name: John
Last Name: Doe
Year Hired: 1983
Office: 221

Employee ID: 3253
First Name: Hermione
Last Name: Fibbershins
Year Hired: 1984
Office: 616

Employee ID: 4506
First Name: Throckmorton
Last Name: Wilberforce
Year Hired: 1985
Office: 703

Employee ID: 5323
First Name: Johannes
Last Name: Dingsda
Year Hired: 1988
Office: 445

Employee ID: 7151
First Name: Tuxedo
Last Name: Penguin
Year Hired: 1995
Office: 499

Employee ID: 9598
First Name: Bill
Last Name: Cat
Year Hired: 1997
Office: 525
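For comparison, a reader for the labelled form might look like the sketch below (Python, for illustration only). It assumes each record begins with the "Employee ID" label and that every data line is "Label: value"; the function name and that record-boundary rule are my assumptions.

```python
def parse_forms(text, first_label="Employee ID"):
    """Group 'Label: value' lines into records, starting a record at first_label."""
    records = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unlabelled lines
        label, value = line.split(":", 1)
        label, value = label.strip(), value.strip()
        if label == first_label:
            records.append({})  # a new record starts here
        if records:
            records[-1][label] = value
    return records
```

The program now has to split every line, trim it, and decide where records begin, which is the extra parsing burden that buys the human readability.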
I have a strong motivation to make it *possible* for humans (e.g.,
me!) to read my data files since that's often useful for debugging
and troubleshooting. However most of the access is done by programs,
so I tend to make it "just enough" human readable and prefer to ease
the parsing burden on the program.
-jn-