
[REBOL] Re: possible data format ... Re: Re: dbms3.r 01

From: joel:neely:fedex at: 15-Jan-2002 9:37

Hi, Rod,

Nice summaries!

Rod Gaither wrote:

> This would be nice but I'm not sure it is reasonable for records
> with a large text block. One of the things I'd like to avoid is
> imposing limits on record size so something like a webpage could
> be a record if desired.
I think the underlying issue is which data types (in the generic sense of the word) should be supported. Many databases with which I'm familiar treat "running text" (with large size potential) as a different type than "character data field" (with some upper limit on capacity). To avoid dependence on filesystem issues (assuming we DO want to stay away from entanglement with any specific OS features/limits) one strategy is to store string data up to a certain size within the record; above that size, each such value would be kept in a separate file, with the name of the file as a string within the original record. Certainly not the fastest option for some purposes, but it does allow such things as searches on non-running-text fields (e.g. keys) to be done without the overhead of reading/skipping the big chunks. The same approach can be used for BLOb data.
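
A minimal sketch of the size-threshold strategy described above, in Python. The names `INLINE_LIMIT`, `store_field`, and `load_field`, the 64-character cutoff, and the `("inline", ...)` / `("file", ...)` tagging are all assumptions made for illustration; nothing in the thread fixes these details.

```python
import os
import uuid

# Hypothetical cutoff: strings longer than this go to their own file.
INLINE_LIMIT = 64


def store_field(value, data_dir, limit=INLINE_LIMIT):
    """Return what the record should hold for `value`.

    Short strings stay in the record. Longer ones are written to a
    separate file and the record keeps only that file's name.
    """
    if len(value) <= limit:
        return ("inline", value)
    name = uuid.uuid4().hex + ".txt"
    path = os.path.join(data_dir, name)
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(value)
    return ("file", name)


def load_field(stored, data_dir):
    """Resolve a stored field back to its full string value."""
    kind, payload = stored
    if kind == "inline":
        return payload
    path = os.path.join(data_dir, payload)
    with open(path, "r", encoding="utf-8", newline="") as f:
        return f.read()
```

A search on key fields only has to look at the tuples held in the record; the large files are opened only when `load_field` is called on a `"file"` entry.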
> > - new AND UPDATED AND DELETED records are appended to the end of
> >   the file
>
> This is something I have considered as well. Another option would
> be to have a transaction file or files that either contain whole
> record changes or just "individual operations". This audit/log
> could then be applied via a utility to update the main data file
> when desired. Updating the main file would improve the read
> operations and bring the db back into an easy to view form.
Perhaps I should have been more explicit. I assumed the existence of a utility, also alluded to in Petr's post, which would "pack" the data back to canonical form (one physical record per logical record). A trivial variation on that utility also allows for viewing the data without actually rewriting the packed file to disk.

The "append all stuff" strategy essentially puts the log (audit trail) within the file itself. I believe it's faster to do that with one file (scanned as needed when un-packed) than with separate main and transaction files.
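
One way the append-everything file and its "pack" utility could look, sketched in Python. The one-JSON-object-per-line encoding and the names `append_record`, `current_view`, and `pack` are assumptions for the sake of a runnable example, not the format discussed in the thread.

```python
import json
import os


def append_record(path, key, fields=None):
    """Append one physical record; fields=None marks a deletion."""
    entry = {"key": key, "deleted": fields is None, "fields": fields}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def current_view(path):
    """Scan the file in order; the last physical record per key wins.

    This is the "view without rewriting" variation: nothing is
    written back to disk.
    """
    view = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry["deleted"]:
                view.pop(entry["key"], None)
            else:
                view[entry["key"]] = entry["fields"]
    return view


def pack(path):
    """Rewrite the file with one physical record per logical record."""
    view = current_view(path)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        for key, fields in view.items():
            entry = {"key": key, "deleted": False, "fields": fields}
            f.write(json.dumps(entry) + "\n")
    os.replace(tmp, path)  # swap the packed file into place
```

Inserts, updates, and deletes are all the same cheap append; the cost is moved to readers, who must scan the un-packed tail, and to the occasional `pack` run.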
> All of these issues show the trade offs in the different operations
> you need to do on the database. Read and search performance
> versus update and write operations and so on. They conflict quite
> a bit with the organization I would pick to keep the data in a single
> file with visually "nice" organization. :-(
Simplicity imposes limits. I don't know how to define it except with respect to intended uses.

The simplest format I know (from the point of view of writing code to read the data) is fixed-field layout, where each data element is right-blank-padded (if it is "string-like") or left-zero-padded (if it is "number-like") to constant size across all records.

    1013John        Doe         1983221
    3253Hermione    Fibbershins 1984616
    4506ThrockmortonWilberforce 1985703
    5323Johannes    Dingsda     1988445
    7151Tuxedo      Penguin     1995499
    9598Bill        Cat         1997525

The simplest format I know (from the point of view of a human trying to look at the raw data) is a forms-like presentation with explicit labels for all data values:

    Employee ID: 1013
    First Name: John
    Last Name: Doe
    Year Hired: 1983
    Office: 221

    Employee ID: 3253
    First Name: Hermione
    Last Name: Fibbershins
    Year Hired: 1984
    Office: 616

    Employee ID: 4506
    First Name: Throckmorton
    Last Name: Wilberforce
    Year Hired: 1985
    Office: 703

    Employee ID: 5323
    First Name: Johannes
    Last Name: Dingsda
    Year Hired: 1988
    Office: 445

    Employee ID: 7151
    First Name: Tuxedo
    Last Name: Penguin
    Year Hired: 1995
    Office: 499

    Employee ID: 9598
    First Name: Bill
    Last Name: Cat
    Year Hired: 1997
    Office: 525

I have a strong motivation to make it *possible* for humans (e.g., me!) to read my data files, since that's often useful for debugging and troubleshooting. However, most of the access is done by programs, so I tend to make it "just enough" human readable and prefer to ease the parsing burden on the program.

-jn-
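
To make the parsing-burden comparison concrete, here is a rough Python sketch of a reader for each of the two layouts in the message above. The column widths in `FIELDS` (4, 12, 12, 4, 3) are assumed from the sample rows, and `parse_fixed` and `parse_labeled` are hypothetical names.

```python
# Assumed column layout: (field name, start offset, end offset).
FIELDS = [
    ("id", 0, 4),
    ("first", 4, 16),
    ("last", 16, 28),
    ("year", 28, 32),
    ("office", 32, 35),
]


def parse_fixed(line):
    """Slice one fixed-field record; strip the right-hand blank padding."""
    return {name: line[a:b].rstrip() for name, a, b in FIELDS}


def parse_labeled(text):
    """Parse 'Label: value' lines; a blank line ends a record."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        label, _, value = line.partition(":")
        current[label.strip()] = value.strip()
    if current:
        records.append(current)
    return records
```

The fixed-field reader is a single slicing expression per record, while the labeled reader has to track record boundaries and split every line, which is the trade-off between ease of parsing and ease of reading.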