Cheshire II Databases

DataStore Databases - Storage of pre-parsed SGML/XML

DESCRIPTION

This document describes the CheshireII DataStore facility, which stores SGML/XML documents in a pre-parsed form. Pre-parsed documents take more storage than plain-text SGML/XML, but the parsing step needed for display conversions, indexing, component retrieval, etc. is done only once, when the document is stored (using the cheshire_load command). All later operations use the pre-parsed form of the document.
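The parse-once trade-off described above can be illustrated with a minimal Python sketch. This is not the CheshireII implementation: it assumes Python's standard dbm module as a stand-in for Berkeley DB and a pickled ElementTree as a stand-in for the pre-parsed internal form. The function names (load_documents, fetch_document, demo) and the sample record are hypothetical.

```python
import dbm
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET


def load_documents(store_path, documents):
    """Parse each document once and store the parsed tree under its id."""
    with dbm.open(store_path, "c") as store:
        for doc_id, xml_text in documents.items():
            tree = ET.fromstring(xml_text)  # the only parse of this document
            store[doc_id.encode()] = pickle.dumps(tree)


def fetch_document(store_path, doc_id):
    """Rebuild the parsed tree from the store without re-parsing markup."""
    with dbm.open(store_path, "r") as store:
        return pickle.loads(store[doc_id.encode()])


def demo():
    # Hypothetical sample record, used only for illustration.
    docs = {"rec1": "<record><title>Cheshire</title></record>"}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "datastore")
        load_documents(path, docs)
        # Later operations work directly on the stored, parsed form.
        return fetch_document(path, "rec1").findtext("title")
```

Calling demo() loads one record and then reads its title back from the store. The serialized tree is typically larger than the source text, which mirrors the storage cost noted above.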


DataStore Databases

The most obvious difference between normal CheshireII databases and a DataStore database is that DataStore databases are not "human-readable" SGML/XML files with associator files. Instead, a DataStore database uses Berkeley DB to store the SGML/XML in a pre-parsed form that can be rapidly retrieved and reconstituted in its fully parsed internal form. In a configuration file, the FILEDEF TYPE attribute indicates that the file named in the FILENAME tag is a DataStore file. Any of the following may be used:

<FILEDEF TYPE=SGML_DATASTORE> or <FILEDEF TYPE=XML_DATASTORE>

<FILEDEF TYPE=MARC_DATASTORE> or <FILEDEF TYPE=MARCSGML_DATASTORE>

(The last pair is effectively equivalent for databases using the MARC conversion and DTDs included in the distribution.) Then, instead of the buildassoc utility, the cheshire_load utility is used to load the pre-parsed version of the data into the Berkeley DB database. cheshire_load takes all of the same parameters as buildassoc, so the same kind of directory scanning, etc. can be done.
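As an illustration of the rule above, the following Python sketch reads the FILEDEF TYPE attribute from configuration text and reports which loading utility applies. It is a hypothetical helper, not part of the CheshireII distribution. The regular expression and the function name loader_for are assumptions, and treating every non-DataStore TYPE as a normal database loaded with buildassoc is a simplification.

```python
import re

# The four DataStore TYPE values listed in this document.
DATASTORE_TYPES = {
    "SGML_DATASTORE",
    "XML_DATASTORE",
    "MARC_DATASTORE",
    "MARCSGML_DATASTORE",
}

# Assumed pattern for a FILEDEF start tag; optional quotes are tolerated.
FILEDEF_TYPE = re.compile(r'<FILEDEF\s+TYPE\s*=\s*"?([A-Za-z_]+)"?', re.IGNORECASE)


def loader_for(config_text):
    """Return the name of the loading utility for the first FILEDEF found."""
    match = FILEDEF_TYPE.search(config_text)
    if match is None:
        raise ValueError("no FILEDEF TYPE attribute found")
    file_type = match.group(1).upper()
    # Simplification: anything that is not a DataStore type is treated
    # as a normal database that uses an associator file.
    return "cheshire_load" if file_type in DATASTORE_TYPES else "buildassoc"
```

For example, loader_for("<FILEDEF TYPE=XML_DATASTORE>") returns "cheshire_load".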

The utilities that depend on an associator file have been replaced for DataStore files by a new utility called read_datastore. All other utilities (index_cheshire, etc.) recognize from the configuration file that a DataStore database is being used and behave the same as for normal databases.


BUGS

None known.

SEE ALSO

cheshire_load, read_datastore

AUTHOR

Ray R. Larson