Cheshire Main Configuration

The dbconfig tag is the top-level tag; it contains all the definitions for a particular set of indexes. Use one dbconfig per file, and put information about only a single database in it, because the indexing command processes everything in the config file.

The dbenv tag names the directory where the database system keeps its cross-database files, which allow it (among other things) to serve data while reindexing. This can and should be the same location for all Cheshire databases running on the system. Under the setup explained previously this is (home)/cheshire/default, so the tag looks something like:
<dbenv> /home/cheshire/cheshire/default </dbenv>

Filedef is the main definition of a set of indexes, formats and so forth. Clusters, as described below, use a second filedef tag within the same dbconfig file. Filedef has one attribute, 'type', which can have one of the following values:
SGML                 The data is in SGML format
MARC                 The data is in (US)MARC SGML format (and can hence be returned in MARC after transformation)
CLUSTER              The data is a Cluster definition
AUTH                 The data is Z39.50 Authentication records (unconfirmed)
LCTREE               The data is Library of Congress data (unconfirmed)
MAPPED               The SGML is a map of how to access non-SGML data
SQL, DBMS or RDBMS   Externally managed databases via Z39.50

The most common value is SGML, and the tag generally looks like:
<filedef type=SGML>
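
As a quick sanity check before indexing, the values listed above can be captured in a small helper. This is only an illustrative sketch in Python: the name `is_valid_filedef_type` is hypothetical, and the case-insensitive comparison is an assumption rather than documented Cheshire behaviour.

```python
# Values of the filedef 'type' attribute, copied from the list above.
FILEDEF_TYPES = {
    "SGML", "MARC", "CLUSTER", "AUTH", "LCTREE", "MAPPED",
    "SQL", "DBMS", "RDBMS",
}


def is_valid_filedef_type(value: str) -> bool:
    """Return True if value names one of the listed filedef types.

    Assumption: the comparison ignores case and surrounding whitespace;
    Cheshire itself may be stricter or looser.
    """
    return value.strip().upper() in FILEDEF_TYPES
```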

The defaultpath tag contains a default path that is prepended to every filename in the document that is not already a full path. It must be the field directly after the filedef tag.

<defaultpath> /home/cheshire/cheshire/quickstep/ </defaultpath>
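
The prepending rule can be pictured with a short Python sketch. The function name `resolve_path` is hypothetical, and the use of a path join is an assumption made for illustration; Cheshire may simply concatenate the two strings.

```python
import posixpath


def resolve_path(defaultpath: str, filename: str) -> str:
    """Prepend defaultpath unless filename is already a full path.

    Assumption: whitespace around the tag contents is not significant.
    """
    defaultpath = defaultpath.strip()
    filename = filename.strip()
    if posixpath.isabs(filename):
        # A full path is used as given.
        return filename
    return posixpath.join(defaultpath, filename)
```

With the defaultpath above, a relative name such as `DATA/docs.sgml` (an invented example) would resolve to `/home/cheshire/cheshire/quickstep/DATA/docs.sgml`, while a name beginning with `/` is left untouched.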

The filetag tag contains the name of the database configured in this filedef. This is the name to pass as the database in a Z39.50 query, or in the CHESHIRE_DATABASE variable for a local search (more on the client side later). It is what distinguishes this database from the others on your system, and hence needs to be unique.
You also need to put this name into the /etc/zserver.conf file to make the database available via Z39.50; see the information in the initial setup document.

If we call the database teidocs, because it contains documents in the TEI DTD, the tag would look like:
<filetag> teidocs </filetag>

There are currently two reserved names. 'metadata' is used to retrieve metadata about the database, and 'IR-Explain-1' is the Z39.50 name for the metadata database about all databases on the system. Further documentation on the metadata database will be forthcoming.

The filename tag contains the path to the data that you wish to index. It can hold either a filename, for a single file with all the documents one after the other, or a directory. If it holds a directory, then all the files in it that are to be processed need to be listed afterwards in 'filecont' tags. See below.

If there is one monolithic file to be indexed, then the tag might look like:
<filename> /home/cheshire/cheshire/teidocs/DATA/alldocuments.sgml </filename>
If there are multiple files, simply drop the '/alldocuments.sgml' from the end so that the tag names the directory.

The filecont tag is only used if the filename tag above contains a directory. Each filecont tag contains the path to one file within that directory. Normally, however, the tag holds the full path, because it is generated automatically by the 'buildassoc' program. A single filecont tag is unnecessary but still valid, which makes it easier to write scripts that work for any database.
Filecont has three attributes: ID, which is an id for the file; MIN, which stores the lowest document ID in the file; and MAX, which stores the highest document ID in the file.
Note that the explain section, if present, comes after any filecont fields. It is documented in the next section.

An example:
<filecont id=1 min=1 max=5> /home/cheshire/cheshire/teidocs/DATA/docs1-5.sgml </filecont>
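
To show how the three attributes relate, here is a small Python sketch that builds filecont tags from a list of files and their document counts. In practice 'buildassoc' generates these tags; the function `filecont_tags` is hypothetical, and it assumes that file IDs and document IDs start at 1 and that document IDs continue consecutively from one file to the next.

```python
from typing import Iterable, List, Tuple


def filecont_tags(files: Iterable[Tuple[str, int]]) -> List[str]:
    """Build one filecont tag per (path, document_count) pair.

    Assumptions: file IDs and document IDs both start at 1, and
    document IDs run consecutively from one file to the next.
    """
    tags = []
    next_doc = 1
    for file_id, (path, count) in enumerate(files, start=1):
        low, high = next_doc, next_doc + count - 1
        tags.append(
            f"<filecont id={file_id} min={low} max={high}> {path} </filecont>"
        )
        next_doc = high + 1
    return tags
```

Given the file from the example above with five documents, followed by a second (invented) file with three, the sketch yields the tag shown above and then one with id=2, min=6 and max=8.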

Filedtd contains the path to the DTD for the SGML to be indexed. Cheshire wants the full DOCTYPE declaration, so if the DTD does not supply it you need to create a wrapper file. This file would have just one line and look something like:
<!DOCTYPE EAD SYSTEM "/home/cheshire/cheshire/default/dtds/ead.dtd" []>

The corresponding filedtd tag might then be:
<filedtd> /home/cheshire/cheshire/default/dtds/ead.wrapper </filedtd>
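
If several DTDs need wrappers, the one-line file can be produced by a short script. The following Python sketch is illustrative only: `doctype_wrapper` is a hypothetical helper, and it writes the declaration with the standard SGML `<!DOCTYPE` opener.

```python
def doctype_wrapper(root: str, dtd_path: str) -> str:
    """Return the single DOCTYPE line for a DTD wrapper file.

    root is the document element name (for example EAD) and dtd_path
    is the full path to the DTD on disk.
    """
    return f'<!DOCTYPE {root} SYSTEM "{dtd_path}" []>\n'
```

The returned string can then be written to a file such as ead.wrapper, and that file named in the filedtd tag.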

Sgmlcat contains the path to the SGML catalog file, which needs to have the public identifiers for the DTDs that you use and so forth. This should be something like:
<sgmlcat> /home/cheshire/cheshire/default/catalog </sgmlcat>

Note that this catalog should contain an entry pointing to the default SGML declaration. You can find this file in the docs directory of the Cheshire source code. The line in the catalog file should look something like:
SGMLDECL "/home/cheshire/cheshire/default/default_sgml_dcl"

The assocfil tag contains the path to the associator file for the data. This file is generated by the buildassoc program, and its name defaults to the filename or the directory name with '.assoc' appended. For convenience, it is useful to name it after the database. So for the teidocs database this could be:
<assocfil> /home/cheshire/cheshire/teidocs/teidocs.assoc </assocfil>

The history tag contains the path to the history file for the database. The file never appears to be used, so there is no need to worry about what it actually is. The tag should look something like:
<history> /home/cheshire/cheshire/teidocs/teidocs.history </history>
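
To see how the tags described so far fit together, the following Python sketch renders a minimal configuration for the single-file case. It is a sketch under stated assumptions, not a definitive template: `render_dbconfig` is a hypothetical helper, the tags are emitted in the order this document introduces them, and the placement of dbenv inside dbconfig and the closing </filedef> and </dbconfig> tags are assumptions. The explain section and anything covered in later sections are left out.

```python
def render_dbconfig(*, dbenv: str, defaultpath: str, filetag: str,
                    filename: str, filedtd: str, sgmlcat: str,
                    assocfil: str, history: str,
                    filedef_type: str = "SGML") -> str:
    """Render the tags covered in this section as one config string.

    Assumptions: dbenv sits inside dbconfig, and both dbconfig and
    filedef are closed explicitly at the end.
    """
    lines = [
        "<dbconfig>",
        f"<dbenv> {dbenv} </dbenv>",
        f"<filedef type={filedef_type}>",
        # defaultpath must come directly after the filedef tag.
        f"<defaultpath> {defaultpath} </defaultpath>",
        f"<filetag> {filetag} </filetag>",
        f"<filename> {filename} </filename>",
        f"<filedtd> {filedtd} </filedtd>",
        f"<sgmlcat> {sgmlcat} </sgmlcat>",
        f"<assocfil> {assocfil} </assocfil>",
        f"<history> {history} </history>",
        "</filedef>",
        "</dbconfig>",
    ]
    return "\n".join(lines) + "\n"
```

Calling it with the teidocs values used in the examples above produces one tag per line, which can be compared against a hand-written config file.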