index_cheshire [-b -T tempdirname -S "sort flags" -L logfilename] configfilename [startrec] [maxrec]
Index_cheshire uses the description included in the configuration file configfilename to index an SGML database.
The configuration file should describe the main SGML data file, the associator file (see buildassoc), the DTD for the SGML data, and the desired index and cluster definitions. The program creates the index files specified in the configuration file, then processes all of the SGML records and extracts the appropriate information to populate those indexes. It also extracts the base information used for cluster files (which must be further processed by index_clusters to create the full cluster indexes).
If the -b flag is used, the indexing uses some batch loading techniques that greatly increase the indexing speed. Use of this flag is recommended in most cases, and especially when doing the initial indexing of a large database. If the -T flag is used then the "tempdirname" should be the path of the directory where the temporary files created by the batch processing and sorting are to be put. The -S flag can be used to pass a quoted list of flags to the system sort used in batch loading (see the documentation for the sort command on your system for the flags available).
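A batch-mode run can be wrapped in a small shell function, sketched here. The function name batch_index and the paths in the example are hypothetical. Only the -b and -T flags from the synopsis are used. Creating the temporary directory beforehand is a precaution, not a documented requirement.

```shell
# batch_index CONFIGFILE TEMPDIR
# Run a batch-mode index, placing the temporary batch and sort
# files in TEMPDIR. Both arguments are supplied by the caller.
batch_index() {
    # Make sure the temporary directory exists before indexing starts.
    mkdir -p "$2" || return 1
    index_cheshire -b -T "$2" "$1"
}

# Example (hypothetical paths):
#   batch_index /data/mydb/mydb.conf /var/tmp/cheshire_sort
```

Choosing a temporary directory on a file system with ample free space is a sensible precaution when sorting a large database.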
The optional startrec and maxrec arguments are the logical record numbers of the first and last records to index. If neither is supplied, index_cheshire indexes all records. If a single number is supplied, it is taken to be startrec, and indexing proceeds from that record to the end of the file. Note that the logical record numbers are derived from the associator file for the database and usually correspond to the sequence of records in the SGML file.
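The three forms of the record range arguments can be illustrated with a short wrapper. This is only a sketch: the name index_range, the file name mydb.conf, and the record numbers are hypothetical, and the numeric check is a convenience added here, not a feature of index_cheshire.

```shell
# index_range CONFIGFILE [STARTREC [MAXREC]]
# Index all records, from STARTREC to the end of the file, or from
# STARTREC through MAXREC, depending on how many numbers are given.
index_range() {
    config=$1
    shift
    # Reject anything that is not a plain non-negative integer.
    for n in "$@"; do
        case $n in
            ''|*[!0-9]*) echo "not a record number: $n" >&2; return 2 ;;
        esac
    done
    index_cheshire -b "$config" "$@"
}

# Examples (hypothetical configuration file and record numbers):
#   index_range mydb.conf             index every record
#   index_range mydb.conf 5001        index record 5001 to end of file
#   index_range mydb.conf 1001 2000   index records 1001 through 2000
```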
New SGML records can simply be concatenated to the end of an SGML file, but they then must be added to the indexes. To do this, after concatenating the new records, run the buildassoc command on the database -- this will report the logical record number for the start of the new records in the database. This number should then be supplied as the startrec parameter to the index_cheshire program.
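The update procedure might be scripted along the following lines. The function names and file names are hypothetical, and the buildassoc step is left as a comment because its arguments are described in its own documentation, not here.

```shell
# append_records NEWFILE DATAFILE
# Concatenate the new SGML records onto the end of the main data file.
append_records() {
    cat "$1" >> "$2"
}

# index_from CONFIGFILE STARTREC
# Index from logical record STARTREC to the end of the file.
index_from() {
    index_cheshire "$1" "$2"
}

# Typical sequence (hypothetical names and record number):
#   append_records new_records.sgml mydb.sgml
#   ...run buildassoc on the database and note the record number it
#      reports for the start of the new records, e.g. 5001...
#   index_from mydb.conf 5001
```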
If the "-L" option is not used, index_cheshire writes its log to a file called INDEX_LOGFILE in the current directory. If the "-L" option is used, the log file is given the name provided on the command line. If the log file already exists when the program is run, new information is appended to it. The log file contains any errors or problems encountered in indexing.
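Because the log is appended to on every run, it is often enough to look at its last few lines after indexing. The helper below is a sketch: the name show_index_log and the choice of 20 lines are arbitrary assumptions.

```shell
# show_index_log [LOGFILE]
# Print the last lines of an indexing log. With no argument, the
# default INDEX_LOGFILE in the current directory is used.
show_index_log() {
    log=${1:-INDEX_LOGFILE}
    if [ -s "$log" ]; then
        tail -n 20 "$log"
    else
        echo "no log entries in $log"
    fi
}

# Example (hypothetical log file name passed to -L):
#   index_cheshire -b -L mydb_index.log mydb.conf
#   show_index_log mydb_index.log
```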
Configuration file documentation, index_clusters, buildassoc
Ray R. Larson