Cheshire II Commands

cheshire_load - Create or extend a Cheshire II DataStore file


SYNOPSIS

cheshire_load [-q][-s] configfile sgmlfile sgmltag
or:
cheshire_load [-q][-s] -r configfile directory sgmltag

DESCRIPTION

The cheshire_load command creates or extends the datastore file for an SGML/XML data file used in the Cheshire II system. Datastore files store pre-parsed versions of the SGML/XML documents, so the original documents need not be available after cheshire_load has finished its job.

There are two forms of this command (the same as for buildassoc): one for a single SGML data file (where the file contains all of the SGML data), and one for multiple SGML files, each containing multiple records, within a directory subtree. In the second form (indicated by -r), only the top-level directory is provided as an argument, and the subtree is recursively scanned for files that contain the identifying SGML tag sgmltag. This recursive form of the command generates a special file of "FILECONT" tags for inclusion in the configuration file for the database.
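The recursive scan can be pictured with the following minimal Python sketch. It is not the cheshire_load implementation: the helper name find_sgml_files and the rule that a file qualifies when it contains an opening tag named sgmltag (matched case-insensitively) are assumptions made only for illustration.

```python
import os
import re


def find_sgml_files(root, sgmltag):
    """Return sorted paths under root that contain an opening <sgmltag> tag.

    Hypothetical helper: the matching rule is an assumption for illustration,
    not the test that cheshire_load itself applies.
    """
    # Match "<tag" followed by whitespace, ">" or "/" so that a longer tag
    # name sharing the same prefix is not counted.
    pattern = re.compile(
        rb"<" + re.escape(sgmltag.encode("ascii")) + rb"[\s>/]",
        re.IGNORECASE,
    )
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable files are simply skipped in this sketch
            if pattern.search(data):
                matches.append(path)
    return sorted(matches)
```

Calling find_sgml_files with the same directory and sgmltag values that would be given to the -r form returns the files such a scan would be expected to consider under these assumptions.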

The configfile argument should be the name of the configuration file to be processed. Note that a datastore is created ONLY for the first file definition in the configuration file, so the configfile can be simple and omit index definitions, etc., if desired (an example of a minimal configfile for creating a datastore file is shown below). The file named in the FILENAME tag of the configfile will be created if it doesn't exist, or used if it does. The parsed SGML/XML data will all be loaded into that file.
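To make the "first file definition only" rule concrete, here is a small Python sketch that reads the FILENAME value from the first FILEDEF in a configuration file's text. The helper name first_filedef_filename and the regex-based scan are assumptions for illustration; the real Cheshire II configuration parser is not shown here and may behave differently.

```python
import re


def first_filedef_filename(config_text):
    """Return the FILENAME value of the first FILEDEF in config_text, or None.

    Hypothetical helper: a regex scan for illustration only, not the
    Cheshire II configuration parser.
    """
    filedef = re.search(
        r"<FILEDEF\b[^>]*>(.*?)</FILEDEF\s*>",
        config_text,
        re.DOTALL | re.IGNORECASE,
    )
    if filedef is None:
        return None
    filename = re.search(
        r"<FILENAME>(.*?)</FILENAME>",
        filedef.group(1),
        re.DOTALL | re.IGNORECASE,
    )
    return filename.group(1).strip() if filename else None
```

Any later FILEDEF blocks in the same text are ignored by this helper, mirroring the behavior described above.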

The sgmlfile argument should be the name of the (single) SGML file to be processed.

The directory argument, used with the -r option, should be the pathname of the root of the directory subtree to be processed.

The sgmltag argument should be the top-level tag defining a record in the SGML data file.
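For scripted use, the two command forms from the SYNOPSIS can be assembled as argument lists. The following Python sketch is one way to do that; the helper name build_cheshire_load_command and the file names in the usage lines (testconfig, records.sgml, data_dir, USMARC) are hypothetical placeholders, and only the flags listed in the SYNOPSIS are used.

```python
def build_cheshire_load_command(configfile, target, sgmltag,
                                recursive=False, quiet=False, skip_header=False):
    """Assemble an argv list that follows the SYNOPSIS of cheshire_load.

    target is the sgmlfile in the single-file form, or the top-level
    directory when recursive is True. Hypothetical helper for illustration.
    """
    command = ["cheshire_load"]
    if quiet:
        command.append("-q")
    if skip_header:
        command.append("-s")
    if recursive:
        command.append("-r")
    command.extend([configfile, target, sgmltag])
    return command


# Placeholder names; substitute a real configfile, data file or directory, and tag.
single = build_cheshire_load_command("testconfig", "records.sgml", "USMARC")
recursive = build_cheshire_load_command("testconfig", "data_dir", "USMARC",
                                        recursive=True, quiet=True)
print(" ".join(single))
print(" ".join(recursive))
```

The resulting list can be handed to subprocess.run on a system where cheshire_load is installed.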

If the -q flag is used, information about each record processed is suppressed; otherwise the program reports each record found and its length. The -q flag also suppresses reports of attempts to load duplicate records (they are not loaded in any case).

If the -s flag is used, extra header information is skipped and not counted as part of the record. This is useful when, for example, XML records that you want to index as documents are "wrapped" with an XML declaration, a DOCTYPE, and extra begin and end tags at the beginning and end of the file to make the whole file a single valid XML document. During extraction, these leading and trailing wrapper tags are ignored, and only the requested tags are used as the begin and end tags of "documents".
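The effect of ignoring the wrapper can be illustrated with a short Python sketch. It is not how cheshire_load parses its input: the helper name extract_records, the regex approach, the assumption that records are not nested inside one another, and the sample tag names (collection, record, title) are all illustrative choices.

```python
import re


def extract_records(text, sgmltag):
    """Return each <sgmltag>...</sgmltag> span in text, ignoring everything else.

    Hypothetical helper; assumes records with this tag are never nested.
    """
    tag = re.escape(sgmltag)
    pattern = re.compile(
        r"<" + tag + r"(?:\s[^>]*)?>.*?</" + tag + r"\s*>",
        re.DOTALL | re.IGNORECASE,
    )
    return pattern.findall(text)


# A file whose records are wrapped to form one valid XML document.
wrapped = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE collection SYSTEM "collection.dtd">\n'
    "<collection>\n"
    "<record><title>First</title></record>\n"
    "<record><title>Second</title></record>\n"
    "</collection>\n"
)
records = extract_records(wrapped, "record")
```

Here records holds only the two record elements; the XML declaration, the DOCTYPE, and the collection wrapper tags are left out.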

A Minimal Configfile for use with cheshire_load

<!-- This is a test config file for Cheshire II -->
<DBCONFIG>
<DBENV> /home/ray/Work/DBENV </DBENV>

<!-- The filedef -->
<FILEDEF TYPE=MARCSGML_DATASTORE>

<DEFAULTPATH> /home/ray/Work/cheshire/index </DEFAULTPATH>

<!-- filetag is the "shorthand" name of the file -->
<FILETAG> bibfile </FILETAG>

<!-- filename is the full path name of the file -->
<FILENAME> TEST_DATASTORE </FILENAME>

<!-- fileDTD is the full path name of the file's DTD -->
<FILEDTD> /home/ray/Work/cheshire/doc/USMARC08.DTD </FILEDTD>

<DISPOPTIONS>
KEEP_ALL
</DISPOPTIONS>

</FILEDEF> 
</DBCONFIG>

OUTPUT FILES

The file named in the FILENAME tag in the configfile is created or extended using this command.

ERROR INFORMATION

Errors are reported to stderr.

BUGS

None known

SEE ALSO

Configuration file documentation, index_cheshire

AUTHOR

Ray R. Larson