Cheshire II Commands
This document describes the format used in the Cheshire II configuration files and their contents.
Configuration files provide the specifications for indexing of SGML elements and sub-elements by the Cheshire II server and search engine. There may be multiple configuration files for each "file" maintained by the server, or the configuration specifications for multiple files may be included in a single configuration file.
Configuration files are themselves SGML documents (the DTD is appended). The following table describes each element (tag) of the configuration file and its contents. There are matching end tags (beginning with "</") for each of the tags described below. Everything between a begin tag and its end tag is processed as the value of the tag; all "white space" (blanks, tabs, carriage returns, newlines) and SGML comments (anything beginning with "<!--" and ending with "-->") are ignored in processing. The one exception is blank spaces in QUOTED file names under Windows, which are preserved to permit multiword file and directory names.
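
As a small illustration of the tag, white-space, and comment rules described above, the two fragments below should be processed identically. This is only a sketch: the <filetag> tag is taken from the table that follows, and the nickname "bibfile" is a made-up value.

```sgml
<!-- Form 1: compact, no extra white space -->
<filetag>bibfile</filetag>

<!-- Form 2: the same value with extra white space and an embedded comment -->
<filetag>
    bibfile   <!-- this comment is ignored -->
</filetag>
```
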
| TAG | Data Type | Meaning |
|---|---|---|
| <dbconfig> | tag | This tag begins a configuration file. It is required. |
| <dbenv> | directory name | This tag is optional and may be used to set the (required) database environment directory for indexes defined in the config file. Alternatively, if the CHESHIRE_DB_HOME environment variable is set, then it overrides values specified here. As of version 2.20, one or the other must be set. Unlike the environment variable, the DBENV option lets you have a separate environment for the indexes defined in each config file. For further information on database environments, see the BerkeleyDB documentation included in the distribution. |
| <filedef TYPE= ?> | tag | This tag introduces a new file definition. All tags until the matching end tag are part of this file definition. The tag includes the following attributes: |
| ATTR: | TYPE = filetype | This attribute specifies the type of file. The currently recognized types are SGML, XML, MARC, MARCSGML, AUTH, CLUSTER, LCTREE, and MAPPED (these are actually ALL SGML files, but filetypes other than the default (SGML) have special processing routines in the server); SQL, DBMS, or RDBMS (which are external relational database files that are accessible via Z39.50, but are not managed or indexed directly by Cheshire); and finally SGML_DATASTORE, XML_DATASTORE, MARC_DATASTORE, and MARCSGML_DATASTORE (which use the Cheshire DataStore facility to store pre-parsed versions of the SGML, XML, or MARC_SGML data). VIRTUAL may be used for virtual databases. Note that MAPPED files assume that the SGML structure is just a definition of how to access the data from non-SGML documents stored in one or more files. |
| <ConfigInclude> | filepath | This tag can be used to reference the contents of other files containing configuration file information. It can be used in place of a filedef so that the named file is read and the filedef in that file is included. It is possible to use this for other parts of a configfile, but not in all situations. Note that the configfile using ConfigInclude must AT LEAST include a DBCONFIG tag and DBENV tag in addition to the ConfigInclude tags. DBCONFIG and DBENV tags are ignored in files included with ConfigInclude, so it is possible for multiple standalone configfiles to be combined, for example, for a virtual database definition where the individual databases are added to the virtual DB configfile by ConfigInclude. |
| <filename> | filepath | This tag precedes the full path name of the main file for this file when the database is a single file of records. It should contain only the directory name containing the files (not really used) if multiple continuation files are being used. It should contain the name of the external database when the filetype is DBMS. For DataStore files this is the actual Berkeley DB database file in which the pre-parsed SGML/XML data is stored. |
| <filetag> | string | This tag precedes a nickname to use for the file. The nickname can be used in place of the full filename whenever the file name needs to be specified. It MUST BE UNIQUE across all files, especially since the server can support multiple config files for different databases. |
| <filecont ID=? MIN=? MAX=?> | filepath | This tag precedes the full path name of a continuation file for very large files that, for example, must be split over different devices, or are stored as multiple separate files in a directory subtree (see the "-r" option of the buildassoc utility that can automatically generate all filecont entries for a given directory subtree). |
| ATTR: | ID=number | The id attribute of filecont gives the sequence number of this particular continuation file (there may be multiple continuation files). |
| ATTR: | MIN=number | This provides the smallest document ID number contained in the continuation file. |
| ATTR: | MAX=number | This provides the highest document ID number contained in the continuation file. |
| <continclude> | filepath | This tag precedes the full path name of a file that contains ONLY filecont definitions as described above. The file is read and processed as if it were included in the configuration file directly. This tag may be interspersed with filecont tags. |
| <filedtd TYPE= ?> | filepath | This tag precedes the full path name of the SGML/XML Document Type Definition (DTD) file for this file. OR, if the TYPE=XMLSCHEMA attribute is set, the name is the name of the XMLSchema DTD file. |
| ATTR: | TYPE = dtdtype | This attribute specifies the type of DTD or Schema. The current recognized types are SGML, XML, and XMLSchema. This attribute is OPTIONAL and will default to SGML if not supplied. |
| <XMLSchema> | filepath | This tag precedes the full path name of the XMLSchema Definition for this file. If the FILEDTD TYPE=XMLSCHEMA attribute is set, then it is an error not to include this tag. |
| TAG | Data Type | Meaning |
|---|---|---|
| <SGMLcat> | filepath | This tag precedes the full path name of the SGML catalog file used to resolve PUBLIC name references (and potentially other references) for this DTD. This provides the basis for an "Entity Manager" that generates a system identifier for every external entity using catalog entry files in the format defined by SGML Open Technical Resolution TR9401:1995 (http://www.sgmlopen.org/sgml/docs/library/9401.htm). See below for catalog entry formats. |
| <Explain> | explain defs | This tag precedes a set of Z39.50 explain definitions for the database (file). This information is used along with the rest of the configuration information to generate explain records for the file. See the Explain definitions table below. |
| <assocfil> | filepath | This tag precedes the full path name of the associator file for this file. This should be empty for DBMS filetypes. |
| <history> | filepath | This tag precedes the full path name of the history file for this file. |
| <indexes> | comments | This tag precedes the definitions of indexes for this file. Each index definition (described in the following table) is specified following this tag and preceding the matching </indexes> tag. |
| <components> | comments | This tag precedes the definitions of components for this file. Each component definition (ComponentDef: described in a following table) is specified following this tag and preceding the matching </components> tag. |
| <clusters> | filename | If the filedef type attribute is NOT CLUSTER, then this tag is used to introduce the definitions and names of cluster files based on elements extracted from this file and to indicate the cluster key element(s) for the clustering process. Each cluster name should have its own filedef definition with the type attribute set to CLUSTER. One cluster file name should be specified per occurrence of the tag. When the filedef type attribute IS CLUSTER, then this will contain the name of the file being clustered. (Not used for DBMS filetypes.) See the following table on cluster specifications for the sub-elements used in the cluster tag. |
| <dispoptions> | keywords | This tag precedes one or more keywords, from the set below, relating to special options for conversion by the Cheshire server when records are returned. The option keywords are: KEEP_AMP to retain any `&amp;` (&) entities, KEEP_LT to retain any `&lt;` (<) entities, KEEP_GT to retain any `&gt;` (>) entities, or KEEP_ALL to retain all three. The default operation of the server is to convert these to their single character form. |
| <displays> | displayspec | This tag precedes the definitions of display formats (or "element sets" in Z39.50 jargon) for this file/database. Each "Display Specification" definition (described in a subsequent table) is specified following this tag and preceding the matching </displays> tag. This is an optional tag. If not supplied, the entire record is always returned regardless of the "element set" requested in a present or search. The old form of this tag (<display>) will still work. |
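
The file-level tags in the tables above might be combined as in the following minimal sketch. All paths and the "bibfile" nickname are hypothetical, and the element order simply follows the order of the tables; the appended DTD is the authority on what order and nesting are actually required.

```sgml
<!-- Hypothetical paths and names throughout; adjust to your installation -->
<dbconfig>
<dbenv> /data/cheshire/dbenv </dbenv>

<filedef TYPE=SGML>
<filename> /data/cheshire/bib/records.sgml </filename>
<filetag> bibfile </filetag>
<filedtd TYPE=SGML> /data/cheshire/bib/records.dtd </filedtd>
<assocfil> /data/cheshire/bib/records.sgml.assoc </assocfil>
<history> /data/cheshire/bib/cheshire.history </history>

<indexes>
<!-- one or more indexdef entries go here -->
</indexes>

<!-- keep the three entities as entities instead of converting them to single characters -->
<dispoptions> KEEP_ALL </dispoptions>
</filedef>
</dbconfig>
```

Because <displays> is omitted in this sketch, the entire record would be returned regardless of the element set requested.
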
| TAG | Data Type | Meaning |
|---|---|---|
| <indexdef ATTR...> | comments | This tag signals the beginning of one particular index definition. The attributes that should be included in each indexdef are: |
| ATTR: | ACCESS= access type | This is the access method to be used in this index. The methods are BTREE, HASH, VECTOR or BITMAPPED. The default is "BTREE". For DBMS (SQL or RDBMS) filetypes "DBMS" is used. See Table below for access types. |
| ATTR: | EXTRACT=key type | This attribute specifies the sort of extraction to be performed on the data. Currently "KEYWORD", "EXACTKEY", "FLD008_KEY", "FLD008_DATE", "FLD008_DATERANGE", "URL", "FILENAME", "DATE", "DATE_TIME", "DATE_RANGE", "DATE_TIME_RANGE", "GEOTEXT", "LAT_LONG", "BOUNDING_BOX", "GEOTEXT_LAT_LONG", "GEOTEXT_BOUNDING_BOX", "KEYWORD_EXTERNAL", "KEYWORD_PROXIMITY", and "KEYWORD_EXTERNAL_PROXIMITY" are supported. In DBMS files KEYWORD and EXACTKEY are valid and used for text or char fields. INTEGER_KEY (or INTEGER), DECIMAL_KEY (or DECIMAL), and FLOAT_KEY (or FLOAT) are also available for SINGLE numerical data elements and also for DBMS file mapping. "KEYWORD" is the default. See the table below for all EXTRACT attribute codes and their meanings, along with aliases for the codes. |
| ATTR: | NORMAL=normalization type | This attribute specifies the sort of normalization to be performed on the keys extracted. The options are "STEM", "WORDNET", "CLASSCLUS", "BASIC", "DO_NOT_NORMALIZE", "REMOVE_TAGS_ONLY", "STEM_NOMAP", "WORDNET_NOMAP", "XKEY" or "EXACTKEY", "XKEY_NOMAP" or "EXACTKEY_NOMAP", "CLASSCLUS_NOMAP", "BASIC_NOMAP", "STEM_FREQ", "XKEY_FREQ", "BASIC_FREQ", and "REMOVE_TAGS_ONLY_FREQ". Recent additions include a simple plural-removing stemmer ("SSTEM" or "SSTEM_FREQ") and the Snowball stemmers for European languages: FRENCH_STEM, GERMAN_STEM, DUTCH_STEM, SPANISH_STEM, ITALIAN_STEM, SWEDISH_STEM, PORTUGUESE_STEM, RUSSIAN_STEM (or RUSSIAN_UTF8_STEM), RUSSIAN_KOI8_STEM, DANISH_STEM, and NORWEGIAN_STEM. "BASIC" is the default. If the EXTRACT attribute is one of the date or date-time values, NORMAL must instead be the pattern to use in extracting the date from the records; see the table Date Format Patterns and Their Interpretations for the DATE_TIME format patterns. If the EXTRACT attribute is one of the LAT_LONG or BOUNDING_BOX values, NORMAL must be the pattern to use in extracting the coordinate information from the records; see the table LAT_LONG and BOUNDING_BOX Format Patterns and Their Interpretations. See the table of normalization types below. |
| ATTR: | PRIMARYKEY=Primary key options | "PRIMARYKEY=IGNORE" or simply "PRIMARYKEY" in this attribute indicates that the index is the primary key for the data file, and any duplicate keys are to be ignored (REJECT is a synonym for IGNORE). "PRIMARYKEY=REPLACE" indicates that incoming records with duplicate primary keys are to replace any older existing record with that same primary key. "PRIMARYKEY=NO", "PRIMARYKEY=NONE", "NOTPRIMARYKEY", or nothing at all (the default) indicates a normal non-primary key index. Note that there should only be one primary key defined for any file; if more are defined, only the last one defined will be used as the primary key. |
| <indxtag> | string | This tag precedes a name for this index (such as "author", "title", etc.). For DBMS indexes this should be the DBMS COLUMN name from the table or view. |
| <indxname> | filepath | This tag precedes the full path name for the actual index file created by DBOPEN. Note that this file now contains all index and postings information. For DBMS indexes this should be the DBMS table or View name. |
| <indxmap ATTR...> | map entries | indxmap entries are used to indicate which Z39.50 attributes should be mapped to this index. Any attributes NOT specified imply that ANY value for that element should be mapped to this index. Note that these are examined sequentially, so the order in the configfile matters. The sub-elements of indxmap are described in a later table. |
| ATTR: | ATTRIBUTESET=Z39.50 Attribute Set | This is the attribute set to be used in matching Z39.50 queries to this index. Symbolic names for the supported attribute sets are "BIB-1", "EXP-1", "EXT-1", "CCL-1", "GILS", "STAS", "COLLECTIONS-1", "CIMI-1", "GEO-1", "ZBIG", "UTIL", "XD-1", "ZTHES", "FIN-1", "DAN-1", and "HOLDINGS". Either these symbolic names (and many variants) or the OIDs may be specified (OIDs of unlisted attribute sets can also be specified). Default is BIB-1. |
| <indxcont ID= ???> | filepath | This tag precedes the full path name of a continuation index file for very large index files. (NOT IMPLEMENTED IN THIS VERSION) |
| ATTR: | ID = number | The id attribute of indxcont gives the sequence number of this particular index continuation file. |
| <stoplist> | filepath | This tag precedes the full path name for the stopword list to use in indexing this set of fields. |
| <RANK_PARAMS TYPE= ???> | PARAMS list | This tag encloses a set of parameters to be passed to the appropriate ranking method specified in the TYPE attribute. |
| ATTR: | TYPE = ranking type name | The TYPE attribute of RANK_PARAMS gives the type of ranking to which the subsequent <PARAM> tags apply. The possible values are "Logistic_Regression" (or "LR" or "LOGREG"), "OKAPI" (or "OK" or "BM25"), or "Language_Modeling" (or "LANGMOD" or "LM"). Note, however, that this last is not available yet (version 40). |
| <PARAM ID=???> | parameter value | This tag represents the value of a parameter to be passed to the appropriate ranking method specified in the TYPE attribute of the enclosing RANK_PARAMS tag. |
| ATTR: | ID=integer | The ID attribute of PARAM identifies the particular parameter that the contents of the PARAM tag represents. For Logistic Regression there are 7 possible values (with IDs numbered 0-6), and for Okapi there are 3 possible values (with IDs numbered 1-3). For Logistic Regression the parameters are: ID="0": the LR intercept coefficient; ID="1": the coefficient for average query term frequency (over matching terms); ID="2": the coefficient for query length; ID="3": the coefficient for average document term frequency (over matching terms); ID="4": the coefficient for document length; ID="5": the coefficient for average term Inverse Document Frequency (over matching terms); ID="6": the coefficient for the number of matching terms. For Okapi ranking the parameters are: ID="1": the k1 constant; ID="2": the b or beta constant; ID="3": the k3 constant. For further information on the meanings of these parameters see the published papers on Cheshire as used for information retrieval evaluations such as INEX and TREC, or the papers of Stephen Robertson on the Okapi algorithm. |
| <extern_app> | External indexing application | This tag has (currently) two different uses. First, it can introduce the command name and arguments for external indexing of URLs. The string "%~URL~%" should be used in the place where the URL in the data should be substituted in order to fetch the external data so that it can be indexed. A temporary copy must be made of each item during indexing but NO filename should be specified for the application (i.e., the output should be to stdout). The recommended application to use here is curl (http://curl.haxx.se) which is available on Linux, some other Unixes and Mac OS X. Using curl the tag might look like: "<extern_app> curl --silent %~URL~% </extern_app>" The Second use of this tag is to specify the cheshire database to be used as a gazetteer to look up names of places for GEOTEXT extraction methods. In this case the contents of the tag (with no spaces or returns in it) should look like: "<extern_app>GAZETTEER:/path/to/config/file:database_name:index_to_search_names:exact_tagname_for_element_to_index</extern_app>" The colons must be used to separate elements. The "/path/to/config/file" should be just that - the full path name of the config file. "database_name" should be the name of the database in the config file. "index_to_search_names" should be the index name (indxtag) for the index to use in matching. "exact_tagname_for_element_to_index" is the name of the tag (exactly as it appears in the gazetteer records) in the gazetteer data file that should be extracted and used for the index entries in the new index. |
| <indxexc> | tag specifications | This tag introduces the list of record elements that are to be excluded from indexing for this index file. Any elements included here (and all of their sub-elements) will be ignored during the indexing process. See table below for indxkey (i.e., tagspec) specifications. |
| <indxkey> | tag specifications | This tag introduces the list of record elements and sub-elements to be indexed in this index file. See table below for indxkey specifications. |
| </indxkey> | tag | End of the code list for this index. |
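
To show how the index tags above fit together, here is a hedged sketch of a single keyword index with Okapi ranking parameters. The paths, the index name "title", and the element names in the tagspec are hypothetical. The Okapi values are common textbook settings used purely for illustration, not Cheshire defaults. The element order follows the table, and placing RANK_PARAMS inside the indexdef is an assumption; indxmap is left out because its sub-elements are described in a later table.

```sgml
<indexdef ACCESS=BTREE EXTRACT=KEYWORD NORMAL=STEM>
<indxtag> title </indxtag>
<indxname> /data/cheshire/bib/index.title </indxname>
<stoplist> /data/cheshire/bib/title.stoplist </stoplist>

<!-- Okapi parameters: ID 1 = k1, ID 2 = b, ID 3 = k3 (illustrative values only) -->
<rank_params TYPE=OKAPI>
<param ID=1> 1.2 </param>
<param ID=2> 0.75 </param>
<param ID=3> 7 </param>
</rank_params>

<!-- index the hypothetical <title> sub-element of <titleStmt> -->
<indxkey>
<tagspec>
<ftag>titleStmt</ftag><s>title</s>
</tagspec>
</indxkey>
</indexdef>
```
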
| TAG | Data Type | Meaning |
|---|---|---|
| <ComponentDef> | Definition | This element introduces a new component definition. |
| <ComponentName> | name | This element is the name for the component. This should be a file name (full file path) where the component-level information for each document of the current filedef is to be kept. |
| <ComponentNorm> | normalization | This optional element specifies normalization to be applied to the component during indexing. Options currently are "NONE", "COMPRESS", and "NOCOMPRESS". NONE is the same as omitting the element entirely. COMPRESS means to flatten out all markup within the bounds of the component. NOCOMPRESS means to retain the markup (currently NOCOMPRESS and NONE function the same). |
| <ComponentStartTag> | tag definition | This element introduces a TAGSPEC (see below) that defines the beginning of the component (or the whole component, with the end assumed to be the matching end tag, if no ComponentEndTag is supplied). Note that if multiple FTAGs are specified within the TAGSPEC, only the FIRST will be used. |
| <ComponentEndTag> | tag definition | This element introduces a TAGSPEC (see below) that defines the end of the component. The system assumes that if both a ComponentStartTag and a ComponentEndTag are supplied then the component spans the data from the start tag to the first occurrence of the end tag found in the following data. The span is from the beginning of the ComponentStartTag UP TO the start of the ComponentEndTag. Note that if multiple FTAGs are specified within the TAGSPEC, only the FIRST will be used. |
| <ComponentIndexes> | index definition | This element introduces a set of indexdefs (see above) that define the access to the component. Indexdefs are identical to those for the file as a whole, but apply only to the components defined by this ComponentDef and not to the whole document. |
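
A component definition built from the tags above might look like the following sketch, which treats each paragraph as a separately indexed component. The element name "p", the index name "para_text", and all paths are hypothetical; the nesting follows the table descriptions and should be checked against the appended DTD.

```sgml
<components>
<componentdef>
<componentname> /data/cheshire/bib/components.para </componentname>
<componentnorm> NONE </componentnorm>

<!-- no ComponentEndTag is given, so each component ends at the matching end tag of <p> -->
<componentstarttag>
<tagspec>
<ftag>p</ftag>
</tagspec>
</componentstarttag>

<!-- indexes that apply only to the components, not to whole documents -->
<componentindexes>
<indexdef ACCESS=BTREE EXTRACT=KEYWORD NORMAL=STEM>
<indxtag> para_text </indxtag>
<indxname> /data/cheshire/bib/index.para_text </indxname>
<indxkey>
<tagspec>
<ftag>p</ftag>
</tagspec>
</indxkey>
</indexdef>
</componentindexes>
</componentdef>
</components>
```
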
| TAG | Sub-tags/ATTR | Data Type | Meaning |
|---|---|---|---|
| <clusterdef> | see below | cluster definition | This tag is used to introduce a cluster definition. In earlier config files the <CLUSTER> tag was used for this purpose. The preferred form is now to use <CLUSTERS><CLUSTERDEF> ... </CLUSTERDEF> <CLUSTERDEF> ... </CLUSTERDEF></CLUSTERS>, where the ... is composed of the tags below. For compatibility the old form (<CLUSTER>) will still work. |
| <clusbase> | none | filename or filetag | This tag is used when the filedef type attribute is CLUSTER. It indicates which file is clustered by this file. |
| <clustag> | none | filename or filetag | This tag is used when the filedef type attribute is NOT CLUSTER. It indicates which file contains the clusters specified by the following cluster definition information. |
| <clusname> | none | filename or filetag | This tag is used when the filedef type attribute is NOT CLUSTER. It indicates which file contains the clusters specified by the following cluster definition information. (clusname and clustag are aliases for each other). |
| <cluskey NORMAL=???> | ATTR | tagspec | This tag introduces the list of record element(s) that are to be used to cluster this file. These are specified like the indxkey tag specifications. See table below for indxkey specifications. |
| ATTR: | NORMAL=??? | normalization type | This attribute specifies the sort of normalization to be performed on the keys extracted. The options are "STEM", "WORDNET", "CLASSCLUS", "BASIC", "STEM_NOMAP", "WORDNET_NOMAP", "XKEY" or "EXACTKEY", "XKEY_NOMAP" or "EXACTKEY_NOMAP", "CLASSCLUS_NOMAP", "BASIC_NOMAP", "STEM_FREQ", "XKEY_FREQ", and "BASIC_FREQ". "BASIC" is the default (NONE was formerly used for this option and is still a synonym). See table of normalization types below. |
| <stoplist> | none | filepath | This tag precedes the full path name for the stopword list to use in extracting cluster keys for the cluster file. |
| <clusmap> | | clusmap entries | clusmap entries are used to indicate which elements in the base file are to be inserted in the cluster file. Each clusmap entry should contain two subtags, <from> and <to>, and an optional third tag <summarize>. Note that the same clusmap structure is used to define conversions of display formats as well (see Sub-Tags for Display Specification). |
| SUBTAG | <from> | tagspec | This should contain a tagspec, (i.e., it can contain multiple tag definitions from the base file). |
| SUBTAG | <to> | tagspec | This should be a simple tagspec name that corresponds to a tag in the cluster file. Patterns in the <to> tagspec are not allowed. |
| SUBTAG | <summarize> | tagspec | This is used to specify that summary information be created for this clusmap entry and stored in the specified tagspec. <summarize> should contain the following tags: |
| SUBTAG | SUBTAG | <maxnum> | This tag, which gives the maximum number of summary entries to generate, is followed by a simple tagspec. |
| SUBTAG | SUBTAG | <tagspec> | (see below) This includes a name that corresponds to a tag in the cluster file. Patterns in the <summarize> tagspec are also not allowed. When a <summarize> tag is given, each unique data item from the <from> tag is collected and the occurrences counted. The <maxnum> most frequently occurring data items are output as the contents of the <tagspec> specified. |
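
The cluster tags above could be assembled roughly as follows, for a base file whose filedef type is NOT CLUSTER. This is a sketch under stated assumptions: the cluster name "classcluster", the stoplist path, and every element name in the tagspecs (FLD050, FLD245, FLD650, titles, subjsum) are hypothetical, and the exact nesting should be confirmed against the appended DTD.

```sgml
<clusters>
<clusterdef>
<clusname> classcluster </clusname>

<!-- cluster on the (hypothetical) classification number element -->
<cluskey NORMAL=CLASSCLUS>
<tagspec>
<ftag>FLD050</ftag><s>a</s>
</tagspec>
</cluskey>

<stoplist> /data/cheshire/bib/cluster.stoplist </stoplist>

<clusmap>
<!-- copy titles from the base records into <titles> in the cluster record -->
<from>
<tagspec>
<ftag>FLD245</ftag><s>a</s>
</tagspec></from>
<to>
<tagspec>
<ftag>titles</ftag>
</tagspec></to>

<!-- keep only the 5 most frequent subject values, stored in <subjsum> -->
<from>
<tagspec>
<ftag>FLD650</ftag><s>a</s>
</tagspec></from>
<to>
<tagspec>
<ftag>subjects</ftag>
</tagspec></to>
<summarize>
<maxnum> 5 </maxnum>
<tagspec>
<ftag>subjsum</ftag>
</tagspec></summarize>
</clusmap>
</clusterdef>
</clusters>
```

The cluster file named in <clusname> would also need its own filedef with TYPE=CLUSTER, as described for the <clusters> tag.
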
| TAG | Sub-tags/ATTR | Data Type | Meaning |
|---|---|---|---|
| <displaydef ATTR...> | | format spec | The displaydef tag introduces a named display format definition. This is treated the same as the format tag below (they are aliases for each other). |
| <format ATTR...> | | format spec | The format tag introduces a named format definition. This is treated the same as the displaydef tag above (they are aliases for each other). |
| ATTR: | NAME=??? | format name | This attribute specifies the name of the format that will be matched against the specified elementset name in a Z39.50 present or search. There are some special names that can be used for special purposes. The special format name "XML_ELEMENT_???" (where an XML or SGML element name is substituted for "???") is used when the document is an XML or SGML document and you only want to extract the named element. In this case the convert operation is used with the function name "XML_ELEMENT" and the FROM specification has the special FTAG name "SUBST_ELEMENT". When this combination is found in the DISPLAY specification (along with the XML or SGML OID) then the named element is extracted and returned as the result of the display operation (wrapped in a <RESULT_DATA DOCID="1"> tag with the appropriate source docid). Note that only a single element will be substituted, though other special elements (such as "#RANK#", etc.) can be included in the results. In addition to the above, the ELEMENTSET syntax STRING_SEGMENT_s_e (where s and e are numbers giving the starting and ending character positions in a document) can be used in scripts to extract strings from the record without full parsing. (The form STRING_SEGMENT_e can also be used to get everything from the beginning of the record up to character position "e".) This elementset name does NOT need a displaydef/format to operate, but is included here because of its similar operation to XML_ELEMENT_ elementsets. See the client documentation for more details. The special format name "PAGED_DEFAULT" is used when the document has been indexed and retrieved using a KEYWORD_EXTERNAL index that uses the "PAGED_DIRECTORY_REF" attribute for extraction (see index definition information below). In this case "pseudo-records" will be generated representing each matching page of the page collection. They will include any tags not "excluded" from the base document. |
| ATTR: | OID=??? | recsyntax OID | This attribute specifies the OID of the record syntax that will be matched against the specified record syntax oid in a Z39.50 present or search. This is typically used for "convert" specifications when converting from XML/SGML to some other record syntax like GRS-1. |
| ATTR: | MARC_DTD=??? | filename | This attribute specifies the filename of the USMARC DTD that will be used in the MARC conversion for this format. The DTD should be one of the USMARC DTDs supplied in the doc/install directory of the Cheshire distribution. MARC conversion for a NON-MARC file will cause an error if this filename is not supplied. DEFAULTPATH specifications are prepended to the DTD name if it is not a full pathname. |
| ATTR: | DEFAULT | optional | If the DEFAULT keyword attribute is used, then the associated format is used whenever no explicit format is specified in a present or search. Format specifications are used to indicate which elements in the base file are to be used in sending a record to a client via Z39.50. Each format entry can contain ONE of three subtags: <include>, <exclude>, and <convert> (currently these cannot be combined in the same format). These are described below. |
| <include> | | tagspec | This should contain a tagspec (i.e., it can contain multiple tag definitions from the base file). It indicates which tags to INCLUDE in the records when this format is requested. This tag is optional and should be used when a particular format uses only some small set of tags from the base record. (NOT CURRENTLY SUPPORTED -- USE EXCLUDE) |
| <exclude ATTR...> | | tagspec | This should contain a tagspec (i.e., it can contain multiple tag definitions from the base file). It indicates which tags to EXCLUDE from the records when this format is requested. This tag is also optional and should be used when a particular format uses everything except some small set of tags. |
| ATTR: | COMPRESS=??? | exclusion compression | This attribute can be used to indicate whether SGML elements that may be listed in the exclude specification, but which are required by the DTD for the document, are to be reduced to minimal forms (with all data replaced by an ellipsis (...) and only required tags retained). The options are "COMPRESS=YES", "COMPRESS=TRUE", or simply "COMPRESS" to enable this, and "COMPRESS=NO", "COMPRESS=FALSE", or "NOCOMPRESS" to disable it. |
| <convert ATTR...> | | clusmap | This is used to specify a conversion function to be run for each record, or for specific tags in a record. The tags to be converted are specified exactly like the cluster mapping (clusmap) described above. There should be a "FROM" and "TO" tagspec for each record element to be converted. The SUMMARIZE clusmap tag may be used to specify a string that should be substituted for matching tags. The string to be substituted is put in the TAGSPEC part of the SUMMARIZE tag. The MAXNUM tag of the SUMMARIZE part is used to indicate the maximum number of conversions of matching tags to be performed (0 is all of them, and any number greater than 0 is the maximum number of conversions). Some special tokens can be used in the FROM part of a tagspec to indicate that metadata or processing information (such as internal document ID number, ranking value in retrieval, etc.) be included in the specified TO tag. See the following table for a description of these tokens. |
| ATTR: | FUNCTION=??? | function name or path | This attribute is the name of the function to apply to either the specified
tags, or to the entire record if no tags are specified. The "PAGE_PATH"
function indicates that the complete page path name should be constructed
from the indicated tag (which should contain only the directory path) and
returned as the tag "<PAGE_PATH>".
The "RECMAP" function maps to the requested record syntax specified
in the OID attribute of the display tag. When the OID for the format
requested is GRS-1, the keywords
"TAGSET-M" and "TAGSET-G" can be used to map to the specified tags
from these tagsets (by tag name or number). To use TAGSET-G numeric tags
in conjunction with the SGML tag subelements of the mapped tags (i.e.,
to treat all subelements of the TAGSET-G mapped tags like RECMAP conversion)
the keyword "MIXED" can be used for the function.
"XML-ELEMENT" can be used to extract a single specified tag, or a pattern that can match multiple tags (in FROM). The TO element is used to substitute a new name for the tag specified in FROM. If the conversion record syntax OID is XML or SGML, the element and its subelements are returned. If the conversion record syntax OID is for GRS-1, the specified element is converted to GRS-1 syntax like a "MIXED" record. "MARC" can be used to do XML/SGML to MARC conversions; it is up to the configfile creator to provide the appropriate mappings from XML/SGML elements to MARC fields and subfields. This requires that the "FROM" and "TO" definitions be structured correctly for MARC conversions. The "FROM" tagspecs should specify one or more tags/elements in the order that they are to appear in the subfields specified in the "TO" specifications. For example, in a MARC conversion from an EAD format finding aid the following might be specified:
<from>
<tagspec>
<ftag>archdesc</ftag><s>repository</s><s>address</s>
<ftag>archdesc</ftag><s>repository</s><s>corpname|name</s>
<ftag>archdesc</ftag><s>unitdate</s>
</tagspec></from>
<to>
<tagspec>
<ftag>260</ftag>
<ftag>a</ftag>
<ftag>b</ftag>
<ftag>c</ftag>
</tagspec></to>
Each FROM/TO pair in the specification should correspond to a single MARC field. In the TO specification, the first FTAG should be the three digit MARC field number/name. The following FTAGS (if any) in the TO specification should be the subfields of the MARC field in the order they are to appear. The FROM fields specified in each FTAG line should have a matching TO ftag line (following the first field name line). Data from fields found will be put into the matching subfields of the field. However, if there are multiple occurrences of particular field or subfield values for a given record, sometimes the mapping will be inaccurate (there is currently no way to automatically detect which of the subfields is paired with each of the other subfields). If the function is a full pathname (e.g. starting with a / in unix or Linux or a drive letter, colon, backslash in NT/Windows), it is taken to be an EXTERNAL conversion program. Such an external program should be set up to read a single record via STDIN and to put out the converted form via STDOUT. |
| ATTR: | ALL? | optional | If the ALL attribute is used, it indicates that the function is to be applied to the entire record, and no tags are specified. When doing RECMAP conversion the output tags are the same as the SGML record tags (The ALL option works in GRS-1 record syntax conversions ONLY). |
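
The positional pairing of FROM lines with TO subfields in the MARC conversion example above can be sketched in Python. This is only an illustration of the rule as described, not Cheshire's own code: the function name `pair_marc_mapping` and the representation of tag paths as tuples are assumptions made for the example.

```python
def pair_marc_mapping(from_paths, to_ftags):
    """Pair each FROM path with the TO subfield in the same position.

    to_ftags[0] is the three-digit MARC field; the rest are subfield codes.
    """
    if not to_ftags:
        raise ValueError("TO tagspec needs at least the MARC field number")
    field, subfields = to_ftags[0], to_ftags[1:]
    if len(from_paths) != len(subfields):
        raise ValueError("each FROM line needs a matching TO subfield line")
    return field, list(zip(subfields, from_paths))


# Mapping taken from the EAD example above; paths are written here as tuples.
from_paths = [
    ("archdesc", "repository", "address"),
    ("archdesc", "repository", "corpname|name"),
    ("archdesc", "unitdate"),
]
field, pairs = pair_marc_mapping(from_paths, ["260", "a", "b", "c"])
for subfield, path in pairs:
    print(f"{field} ${subfield} <- {'/'.join(path)}")
```

Because the pairing is purely positional, reordering the FROM lines without reordering the TO subfields would silently change which data lands in which subfield.
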
| TOKEN | Meaning |
|---|---|
| NOTE: | The tokens in this table can be used in the |
| #DOCID# | Insert the internal document id number for the record. |
| #COMPONENTID# | Insert the internal component id number for the retrieved component. |
| #COMPID# | Insert the internal component id number for the retrieved component. |
| #RANK# | Insert the retrieval rank (i.e. the sequence in the ranking) for the record or component. |
| #SCORE# | Insert the retrieval score (normalized to a range from 1000 to 0) for the record or component. |
| #RELEVANCE# | Insert the retrieval score (normalized to a range from 1000 to 0) for the record or component. |
| #RAWSCORE# | Insert the raw retrieval score (calculated probability of relevance) for the record or component. |
| #RAWRELEVANCE# | Insert the raw retrieval score (calculated probability of relevance) for the record or component. |
| #FILENAME# | Insert the full pathname for the document or collection of documents. |
| #DBNAME# | Insert the (short) database name ( |
| #PARENT# | For component displays, extract and include the tag (specified as the sub-elements of the from tag after the #PARENT# ftag). In the display the elements will have their names converted to PARENT-x where 'x' is the tag name in the parent document. |
| SUBST_ELEMENT | Insert the element specified by the current elementsetname as "XML_ELEMENT_xxx" where xxx is the element name to be substituted. |
| TAG | Sub-tags/ATTR | Data Type | Meaning |
|---|---|---|---|
| <TITLESTRING ATTR> | ATTR | text | Title to use for this database. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <DESCRIPTION ATTR> | ATTR | text | Description of the database and its contents. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <DISCLAIMERS ATTR> | ATTR | text | Any disclaimers associated with the database. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <NEWS ATTR> | ATTR | text | Any news about the database. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <HOURS ATTR> | ATTR | text | Hours of access for the database. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <BESTTIME ATTR> | ATTR | text | Best times to access the database. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <LASTUPDATE> | none | text | Last time the database was updated. |
| <UPDATEINTERVAL> | SUBTAGS | text | Frequency of updates of the database. |
| SUBTAG | <VALUE> | number | The number of days, weeks, years, etc between updates. |
| SUBTAG | <UNIT> | time unit info | The time unit used (day, week, month, year). |
| <COVERAGE ATTR> | ATTR | text | Database coverage |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <PROPRIETARY> | none | boolean | Is the database proprietary? Acceptable contents are "YES" or "TRUE", "NO" or "FALSE". |
| <COPYRIGHTTEXT ATTR> | ATTR | text | Copyright text information |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <COPYRIGHTNOTICE ATTR> | ATTR | text | Copyright notice |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <PRODUCERCONTACTINFO> | none | contactinfo | Producer of the database -- contents are tagged using the CONTACT_... tags below. |
| <SUPPLIERCONTACTINFO> | none | contactinfo | Supplier of the database -- contents are tagged using the CONTACT_... tags below. |
| <SUBMISSIONCONTACTINFO> | none | text | Where to send submissions to the database -- contents are tagged using the CONTACT_... tags below. |
| <CONTACT_NAME> | none | text | Name of the contact person. |
| <CONTACT_DESCRIPTION ATTR> | ATTR | text | Description of contact information. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <CONTACT_ADDRESS ATTR> | ATTR | text | Contact Address. |
| ATTR: | LANGUAGE=code | language code | Use the USMARC three letter language codes |
| <CONTACT_EMAIL> | none | text | Contact Email. |
| <CONTACT_PHONE> | none | text | Contact Phone number. |
| Code | Affects | Meaning |
|---|---|---|
| BTREE | access type | Selects the DBOPEN BTREE format for the main index file. |
| HASH | access type | Selects the DBOPEN HASH format for the main index file. |
| BITMAPPED | access type | Selects a bitmapped index file. Bitmap indexes can ONLY be used for Boolean retrieval. They are specifically designed to speed up access on indexes that have only a few values (such as an attribute that appears in every record of the database and has only 3 or so possible values) and a very large number of records (i.e., hundreds of thousands or millions of records). In a BITMAPPED index only a single bit is stored for (potentially) each record in the database, instead of the 64 bits per entry stored in conventional indexes. |
| VECTOR | access type | VECTOR files include a normal BTREE and an additional vector file, which must be created by running the index_vectors program after normal indexing. VECTOR files permit quick lookup by internal termids, and a listing of all termids, with frequency information, for each document (with lookup by internal document id). The VECTOR files are primarily used for blind feedback retrieval, to locate additional related terms to be added to an initial query. |
| DBMS | access type | Indicates that the "index" name is an external DBMS table and column specification for the database indicated in the index tag. |
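
To make the BITMAPPED description concrete, here is a toy bitmap index in Python. It is a sketch of the general technique (one bit per record for each distinct value, Boolean queries as bitwise operations), not the on-disk format Cheshire uses; the class name `BitmapIndex` and the sample attribute values are invented for the example.

```python
class BitmapIndex:
    """Toy bitmap index: one Python int per distinct value, one bit per record."""

    def __init__(self):
        self.bitmaps = {}

    def add(self, record_id, value):
        # Set bit number record_id in the bitmap for this value.
        self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << record_id)

    def bitmap(self, value):
        return self.bitmaps.get(value, 0)

    @staticmethod
    def record_ids(bitmap):
        # Decode a bitmap back into a sorted list of record ids.
        ids, position = [], 0
        while bitmap:
            if bitmap & 1:
                ids.append(position)
            bitmap >>= 1
            position += 1
        return ids


status = BitmapIndex()
language = BitmapIndex()
for record_id, (s, lang) in enumerate(
    [("open", "eng"), ("closed", "eng"), ("open", "fre"), ("open", "eng")]
):
    status.add(record_id, s)
    language.add(record_id, lang)

# Boolean AND and OR are single bitwise operations on the bitmaps.
open_and_english = status.bitmap("open") & language.bitmap("eng")
open_or_french = status.bitmap("open") | language.bitmap("fre")
print(BitmapIndex.record_ids(open_and_english))
print(BitmapIndex.record_ids(open_or_french))
```

The sketch also shows why such an index only pays off for attributes with few distinct values: every distinct value needs its own bitmap spanning all records.
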
| Code | Affects | Meaning |
|---|---|---|
| KEYWORD | key type | Indicates that keywords are to be extracted from the elements specified for this index. (In external DBMS files this is used to match character or text fields using the "like" operator) |
| KEYWORD_EXTERNAL | key type | Indicates that keywords are to be extracted from the elements specified for this index some of which are indications of external non-SGML text files or of URLs of external HTML files. |
| KEYWORD_PROXIMITY | key type | Indicates that keywords are to be extracted from the elements specified for this index and that proximity (character position information) is to be maintained in the index. PROXIMITY and KEYWORD_PROX are synonyms for this specification. Note that indexes including proximity information will be much larger than simple keyword indexes. |
| KEYWORD_EXTERNAL_PROXIMITY | key type | Indicates that keywords are to be extracted from the elements specified for this index some of which are indications of external non-SGML text files or of URLs of external HTML files and that proximity (character position information) is to be maintained in the index. KEYWORD_PROXIMITY_EXTERNAL, KEYWORD_EXTERNAL_PROX, and KEYWORD_PROX_EXTERNAL are synonyms for this specification. Note that indexes including proximity information will be much larger than simple keyword indexes. |
| EXACTKEY | key type | Indicates that exact keys are to be extracted from the elements specified for this index. Nota Bene: if a pattern is used for the tags specified for exact key extraction all items that match that pattern at the same level of nesting will be extracted and concatenated to form the exact key -- see tag specifications below. (In external DBMS files this is used to match text or character fields using "="). |
| FLD008_KEY | key type | Indicates that sub-elements of a MARC 008 field are to be extracted from the field tag specified for this index. The particular sub-elements are listed in the "008 elements" table below. (Only applicable for SGML MARC records) |
| FLD008_DATE | key type | Indicates that sub-elements of a MARC 008 field are to be extracted from the field tag specified for this index. This is used in conjunction with one of the date formats for the NORMAL attribute (see below). The particular sub-elements are listed in the "008 elements" table below. (Only applicable for SGML MARC records) |
| FLD008_DATERANGE | key type | Indicates that sub-elements of a MARC 008 field are to be extracted from the field tag specified for this index. This is used in conjunction with one of the date range formats for the NORMAL attribute (see below). The particular sub-elements are listed in the "008 elements" table below. (Only applicable for SGML MARC records) |
| URL | key type | This is used to indicate that the contents to be extracted are URLs. |
| FILENAME | key type | This is used to indicate that the contents to be extracted are filenames (for example, full unix pathnames). |
| DATE | key type | This is to indicate that the extracted values should be parsed for date values based on the pattern provided in the NORMAL attribute (see below). |
| DATE_RANGE | key type | This is to indicate that the extracted values should be parsed for date range values based on the pattern provided in the NORMAL attribute (see below). |
| DATE_TIME | key type | This is to indicate that the extracted values should be parsed for date and time values based on the pattern provided in the NORMAL attribute (see below). |
| DATE_TIME_RANGE | key type | This is to indicate that the extracted values should be parsed for date and time range values based on the pattern provided in the NORMAL attribute (see below). |
| INTEGER_KEY | key type | This is used to extract a single integer value, which should be the first data within the tag/field being extracted (white space will be ignored, but non numerical characters will not be indexed). The index entry is currently a zero-padded string of 10 digits with an optional leading minus sign. It is also used with DBMS files to indicate that the external DBMS type for this field is an integer. |
| DECIMAL_KEY | key type | This is used to extract a single decimal value, which should be the first data within the tag/field being extracted (white space will be ignored, but non-numerical characters will not be indexed). The number may include a decimal point and fractional part (for example "100.4356"). The index entry is currently a zero-padded string of 16 digits with an optional leading minus sign and 6 digits after the decimal point. This specification is also used with DBMS files to indicate that the external DBMS type for this field is a decimal number. |
| LAT_LONG | key type | This is to indicate that the extracted values should be parsed for Latitude and Longitude values based on the pattern provided in the NORMAL attribute (see below). |
| LATITUDE_LONGITUDE | key type | Same as LAT_LONG above. |
| LATITUDE/LONGITUDE | key type | Same as LAT_LONG above. |
| GEO_POINT | key type | Same as LAT_LONG above. |
| BOUNDING_BOX | key type | This is to indicate that the extracted values should be parsed for a pair of Latitude and Longitude values based on the pattern provided in the NORMAL attribute (see below). |
| GEO_BOX | key type | Same as BOUNDING_BOX above. |
| GEOTEXT | key type | This normalization relies on an external Cheshire database containing a gazetteer. Candidate terms in the document are compared against this gazetteer database, and only terms/phrases that match gazetteer entries are actually entered into the index. The EXTERN_APP tag (see above) must be used to specify the config file, database name and index to be used in matching. The NORMAL tag should indicate one of the text processing normalization methods below (BASIC is simplest). The generated terms are, by default, keywords derived from the gazetteer entry names; using EXACTKEY or EXACTKEY_NOMAP uses the exact phrase from the gazetteer. DO_NOT_NORMALIZE retains the exact capitalization, spacing, etc. from the gazetteer entry. |
| GEOTEXT_LAT_LONG | key type | This normalization relies on an external Cheshire database containing a gazetteer. Candidate terms in the document are compared against this gazetteer database, and only terms/phrases that match gazetteer entries are actually included, but the (point) coordinates from the gazetteer are used as the generated keys. The EXTERN_APP tag (see above) must be used to specify the config file, database name and index to be used in matching. Lat_long NORMAL specifications should be used to indicate how queries should be parsed when matching to the generated index. |
| GEOTEXT_BOUNDING_BOX | key type | This normalization relies on an external Cheshire database containing a gazetteer. Candidate terms in the document are compared against this gazetteer database, and only terms/phrases that match gazetteer entries are actually included. In this case the keys entered into the index are bounding box coordinates, either taken directly from the gazetteer or generated from its point data. The EXTERN_APP tag (see above) must be used to specify the config file, database name and index to be used in matching. Bounding_box NORMAL specifications should be used to indicate how queries should be parsed when matching to the generated index. |
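
The INTEGER_KEY entry format described in the table above (a zero-padded string of 10 digits with an optional leading minus sign) can be illustrated with a short Python sketch. The function `integer_key` and its regular expression are assumptions made for illustration; the source does not say exactly how Cheshire locates the number or handles edge cases.

```python
import re

# Optional whitespace, optional minus sign, then the first run of digits.
_LEADING_INT = re.compile(r"\s*(-?)(\d+)")


def integer_key(text, width=10):
    """Return a zero-padded key for the integer that starts text, or None."""
    match = _LEADING_INT.match(text)
    if match is None:
        return None  # field does not start with a number: nothing to index
    sign, digits = match.groups()
    return sign + digits.zfill(width)


keys = [integer_key(v) for v in ["  42 pages", "7", "1998", "-15"]]
print(keys)
# Zero padding makes plain string order agree with numeric order
# for the non-negative keys.
print(sorted(k for k in keys if not k.startswith("-")))
```

Padding to a fixed width is what lets a string-keyed index such as a BTREE return non-negative integers in numeric order. How negative keys are ordered is not stated in the source, and simple string comparison would not order them numerically.
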
| Code | Affects | Meaning |
|---|---|---|
| STEM | normalization | Indicates that the Porter stemming algorithm should be used to normalize the keywords extracted. |
| STEM_FREQ | normalization | Indicates that the Porter stemming algorithm should be used to normalize the keywords extracted. This normalization type assumes that each keyword is paired with frequency information in the form "{word 20}", where 20 is the frequency. THIS IS PRIMARILY USED FOR COLLECTION-LEVEL DOCUMENTS IN DISTRIBUTED SEARCH. |
| SSTEM | normalization | Indicates that a simple plural-removal algorithm should be used to normalize the keywords extracted. |
| SSTEM_FREQ | normalization | Indicates that a simple plural-removal algorithm should be used to normalize the keywords extracted. This normalization type assumes that each keyword is paired with frequency information in the form "{word 20}", where 20 is the frequency. THIS IS PRIMARILY USED FOR COLLECTION-LEVEL DOCUMENTS IN DISTRIBUTED SEARCH. |
| FRENCH_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the French language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| GERMAN_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the German language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| ENGLISH_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the English language (porter2 stemmer) should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| PORTER_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the English language (original Porter stemmer) should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| DUTCH_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Dutch language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| SPANISH_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Spanish language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| ITALIAN_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Italian language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| SWEDISH_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Swedish language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| PORTUGUESE_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Portuguese language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| RUSSIAN_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Russian language (in UTF-8 encoding) should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| RUSSIAN_UTF8_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Russian language (UTF-8 Encoding) should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). (alias of the previous code) |
| RUSSIAN_KOI8_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Russian language (in KOI-8 encoding) should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| DANISH_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Danish language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| NORWEGIAN_STEM | normalization | Indicates that a Snowball-stemmer generated stemmer for the Norwegian language should be used to normalize the keywords extracted. (Note: FREQ form is also supplied as ..._STEMFREQ). |
| WORDNET | normalization | Indicates that WordNet "morphing" should be used to normalize the keywords extracted. |
| EXACTKEY | normalization | Indicates that spacing should be normalized and punctuation and stopwords removed from the key extracted. (XKEY is an alias for EXACTKEY). |
| XKEY_FREQ | normalization | Indicates that spacing should be normalized and punctuation and stopwords removed from the key extracted. This normalization type assumes that each exact key is paired with frequency information in the form "{exact key value 20}", where 20 is the frequency. THIS IS PRIMARILY USED FOR COLLECTION-LEVEL DOCUMENTS IN DISTRIBUTED SEARCH. |
| CLASSCLUS | normalization | Indicates that LC class number normalization for classification clustering should be used to normalize the key extracted. |
| BASIC_NOMAP | normalization | No term normalization will be done. _NOMAP indicates that the default character mappings done during normalization in indexing and retrieval should NOT be used |
| STEM_NOMAP | normalization | Indicates that the Porter stemming algorithm should be used to normalize the keywords extracted. _NOMAP indicates that the default character mappings done during normalization in indexing and retrieval should NOT be used |
| WORDNET_NOMAP | normalization | Indicates that WordNet "morphing" should be used to normalize the keywords extracted. _NOMAP indicates that the default character mappings done during normalization in indexing and retrieval should NOT be used |
| EXACTKEY_NOMAP | normalization | Indicates that spacing should be normalized and punctuation and stopwords removed from the key extracted. (XKEY_NOMAP is an alias for EXACTKEY_NOMAP). |
| CLASSCLUS_NOMAP | normalization | Indicates that LC class number normalization for classification clustering should be used to normalize the key extracted. _NOMAP indicates that the default character mappings done during normalization in indexing and retrieval should NOT be used |
| DATE PATTERN VALUE | normalization | Indicates that the key extracted will be a date (the EXTRACT attribute must be one of the date types) and the date pattern will be used in matching the elements of the dates that appear in the database records. See the table Date Format Patterns and Their Interpretations for DATE_TIME format patterns. Date parsing is fairly flexible, and as long as the order of elements is correct, dates should be correctly extracted. (NOTE: date patterns for cluster key normalization are not currently supported) |
| LAT_LONG PATTERN VALUE | normalization | Indicates that the key extracted will be a Latitude and Longitude value (the EXTRACT attribute must be one of the LAT_LONG types) and the LAT_LONG pattern will be used in matching the elements of the coordinates that appear in the database records. LAT_LONG parsing is fairly flexible, but the order of elements is important. See the table LAT_LONG and BOUNDING_BOX Format Patterns and Their Interpretations for the specific codes that can be used. |
| BOUNDING_BOX PATTERN VALUE | normalization | Indicates that the key extracted will be a pair of Latitude and Longitude values (the EXTRACT attribute must be one of the BOUNDING_BOX types) and the LAT_LONG pattern will be used in matching the elements of the coordinates that appear in the database records. Parsing is fairly flexible, but the order of elements is important. See the table LAT_LONG and BOUNDING_BOX Format Patterns and Their Interpretations for the specific codes that can be used. |
| BASIC | normalization | Indicates that the key extracted should simply be converted to lower case and have punctuation, extraneous spaces and stopwords removed. |
| NONE | normalization | Same as BASIC; this is the deprecated name for this option. Indicates that the key extracted should simply be converted to lower case and have punctuation, extraneous spaces and stopwords removed. NONE can be used just as BASIC is used (including in the compounds below). |
| BASIC_FREQ | normalization | Indicates that the key extracted should simply be converted to lower case and have punctuation and stopwords removed. This normalization type assumes that each keyword is paired with frequency information in the form "{word 20}", where 20 is the frequency. THIS IS PRIMARILY USED FOR COLLECTION-LEVEL DOCUMENTS IN DISTRIBUTED SEARCH. |
| DO_NOT_NORMALIZE | normalization | This means that NO normalization, including the simple normalization done by BASIC, is to be performed. The terms are put into the index exactly as they appear in the document with capitalization, etc. intact. NOTE that if SGML/XML tags appear in the extracted data, they WILL be included in the generated key. |
| REMOVE_TAGS_ONLY | normalization | This means that NO normalization, including the simple normalization done by BASIC, is to be performed, EXCEPT FOR REMOVAL OF EMBEDDED SGML/XML TAGS in the data. The terms are put into the index exactly as they appear in the document with capitalization, etc. intact. |
| REMOVE_TAGS_ONLY_FREQ | normalization | This means that NO normalization, including the simple normalization done by BASIC, is to be performed, EXCEPT FOR REMOVAL OF EMBEDDED SGML/XML TAGS in the data. The terms are put into the index exactly as they appear in the document with capitalization, etc. intact. This normalization type assumes that each key is paired with frequency information in the form "{exact key value 20}", where 20 is the frequency. THIS IS PRIMARILY USED FOR COLLECTION-LEVEL DOCUMENTS IN DISTRIBUTED SEARCH. |
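
As a rough illustration of the BASIC normalization and the "{word 20}" frequency form described in the table above, the following Python sketch lower-cases a key, removes punctuation, extra spaces and stopwords, and separately splits frequency pairs. The stoplist, the choice to replace punctuation with spaces rather than delete it, and both function names are assumptions for the example; Cheshire's actual rules may differ.

```python
import re
import string

# Hypothetical stoplist; a real one would come from the index configuration.
STOPWORDS = {"a", "an", "and", "of", "the"}

# Map every ASCII punctuation character to a space (an assumption; the
# source only says punctuation is removed).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))
_FREQ_PAIR = re.compile(r"\{\s*(.+?)\s+(\d+)\s*\}")


def basic_normalize(text, stopwords=STOPWORDS):
    """Lower-case, drop punctuation and stopwords, collapse extra spaces."""
    words = text.lower().translate(_PUNCT_TO_SPACE).split()
    return " ".join(w for w in words if w not in stopwords)


def parse_freq_pairs(text):
    """Split '{word 20} {other 3}' into [('word', 20), ('other', 3)]."""
    return [(term, int(freq)) for term, freq in _FREQ_PAIR.findall(text)]


print(basic_normalize("The  History of Art, and Design!"))
print(parse_freq_pairs("{library 20} {exact key value 3}"))
```

A `_FREQ` normalization would, under this reading, first split the pairs and then normalize each term while carrying its frequency along.
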
| FORMAT | Type | Meaning |
|---|---|---|
| YYMMDD | date | Fixed format dates, e.g.: 980223. Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| YYYYMMDD | date | Fixed format date, e.g.: 19980223 |
| MM/DD/YY | date | slash style dates, e.g.: 2/23/98 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| MM/DD/YYYY | date | slash style dates with full year, e.g.: 2/23/1998. |
| DD/MM/YY | date | slash style dates, e.g.: 23/2/98 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| DD/MM/YYYY | date | slash style dates with full year, e.g.: 23/2/1998. |
| MM.DD.YY | date | dot style dates, e.g.: 2.23.98 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| MM.DD.YYYY | date | dot style dates with full year, e.g.: 2.23.1998. |
| DD.MM.YY | date | dot style dates, e.g.: 22.02.98 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| DD.MM.YYYY | date | dot style dates with full year, e.g.: 23.2.1998. |
| DD MMM YYYY | date | Dates with month abbreviations, e.g.: 23 Feb 1998 |
| DD MMM YEAR | date | Dates with month abbreviations, e.g.: 23 Feb 1998. This form may be used if the data contain years with fewer (or more) than 4 digits, but are not in the Twentieth century. For example "13 Apr 856 AD", or "25 Dec 1 B.C." are legal dates for this format. |
| DD MMM YY | date | Dates with month abbreviations, e.g.: 23 Feb 98. Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| DD MONTH YY | date | Dates with month abbreviations or full month names, e.g.: 23 February 98. Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| DD MONTH YYYY | date | Dates with month abbreviations or full month names, e.g.: 23 February 1998. |
| DD MONTH YEAR | date | Dates with month abbreviations or full month names, e.g.: 23 February 1998. This form may be used if the data contain years with fewer (or more) than 4 digits, but are not in the Twentieth century. For example "13 Apr 856 AD", or "25 Dec 1 B.C." are legal dates for this format. |
| MONTH DD, YY | date | Dates with month abbreviations or full month names, e.g.: February 23, 98. Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| MONTH DD, YYYY | date | Dates with month abbreviations or full month names, e.g.: February 23, 1998. |
| YYMMDD HH:MM | date_time | Fixed format date and time, e.g.: 980223 12:04 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| YYYYMMDD HH:MM | date_time | Fixed format date and time, e.g.: 19980223 12:04 |
| MM/DD/YY HH:MM | date_time | slash style dates with time of day, e.g.: 2/23/98 12:02 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| MM/DD/YYYY HH:MM | date_time | slash style dates with time of day, e.g.: 2/23/1998 12:02 |
| DD/MM/YY HH:MM | date_time | slash style dates with time of day, e.g.: 23/2/98 12:02 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| DD/MM/YYYY HH:MM | date_time | slash style dates with time of day, e.g.: 23/2/1998 12:02 |
| MM.DD.YY HH:MM | date_time | dot style dates with time of day, e.g.: 2.23.98 12:02 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| MM.DD.YYYY HH:MM | date_time | dot style dates with time of day, e.g.: 2.23.1998 12:02 |
| DD.MM.YY HH:MM | date_time | dot style dates with time of day, e.g.: 23.2.98 12:02 Note that the system will assume that dates in this form are for the Twentieth century ONLY -- 98 becomes 1998 |
| DD.MM.YYYY HH:MM | date_time | dot style dates with time of day, e.g.: 23.2.1998 12:02 |
| YYYY | date | fixed format, year only, e.g.: 1998, 1754, 1292, 1056 |
| YEAR | date | Variable length years, e.g.: "1998", "24 B.C.E.", "57 A.D.", "500000 B.C.", "1852 C.E." |
| YYDDD | date | Julian Dates (not currently supported) |
| DECADE | date_range | Decades within a century, e.g. "1940's" is interpreted as the range from 1940-1949, "20's" is interpreted as 1920-1929, 1870's is interpreted as 1870-1879. |
| CENTURY | date_range | Centuries, e.g.: 20th is interpreted as 1900-1999, "4th century b.c." is interpreted as 400B.C.-301B.C. |
| MILLENNIUM | date_range | Millennia, e.g.: "4th millennium B.C." is interpreted as 4000B.C.-3001B.C. |
| MIXED YEAR | date | Extracts the year from mixed data which may include full dates and years. Assumes that the only 3- or 4-digit number in the string is the year, so it may give wrong results if there are decade, century or millennium specifications. Also checks for an era. Any text (except for era specifications following the year, e.g.: 924 B.C.) or non-numeric characters are ignored. |
| MIXED_YEAR | date | Same as above |
| UNIX_TIME | date | Dates as returned by the UNIX ctime function, for example "Wed Apr 12 10:18:21 2000 EDT". The trailing zone information is ignored. |
| UNIX_CTIME | date | Same as above |
| DAY MONTH DD HH:MM:SS YEAR | date | Same as UNIX_TIME above |
| DAY MMM DD HH:MM:SS YYYY | date | Same as UNIX_TIME above |
| YYYY-YYYY | date_range | Range of years in fixed (4 digit) form, e.g.: 1920-1945 |
| YYYY to YYYY | date_range | Range of years in fixed (4 digit) form, e.g.: "1920 to 1945" |
| YYYY through YYYY | date_range | Range of years in fixed (4 digit) form, e.g.: "1920 through 1945" |
| YEAR-YEAR | date_range | Range of years without fixed format, e.g.: "22 B.C - 12 A.D.", 1950-1990, "50000 BCE - 100 CE" |
| YEAR to YEAR | date_range | Range of years without fixed format, e.g.: "22 B.C to 12 A.D.", 1950 to 1990, "50000 BCE to 100 CE" |
| YEAR through YEAR | date_range | Range of years without fixed format, e.g.: "22 B.C through 12 A.D.", 1950 through 1990, "50000 BCE through 100 CE" |
| DECADE-DECADE | date_range | Range of decades, e.g.: "20's-50's" is interpreted as 1920-1959. |
| DECADE to DECADE | date_range | Range of decades, e.g.: "20's to 50's" is interpreted as 1920-1950. |
| DECADE through DECADE | date_range | Range of decades, e.g.: "20's through 50's" is interpreted as 1920-1959. |
| CENTURY-CENTURY | date_range | Range of Centuries, e.g.: "3rd cent bc - 8th century ad" is interpreted as 300B.C.-700A.D. |
| CENTURY to CENTURY | date_range | Range of Centuries, e.g.: "3rd cent bc to 8th century ad" is interpreted as 300B.C.-700A.D. |
| CENTURY through CENTURY | date_range | Range of Centuries, e.g.: "3rd cent bc through 8th century ad" is interpreted as 300B.C.-799A.D. |
| MILLENNIUM-MILLENNIUM | date_range | Range of Millennia, e.g.: "8th millennium bc - 1st millennium AD" is interpreted as 8000B.C.-999A.D. |
| MILLENNIUM to MILLENNIUM | date_range | Range of Millennia, e.g.: "8th millennium bc to 1st millennium AD" is interpreted as 8000B.C.-999A.D. |
| MILLENNIUM through MILLENNIUM | date_range | Range of Millennia, e.g.: "8th millennium bc through 1st millennium AD" is interpreted as 8000B.C.-999A.D. |
| MIXED YEAR RANGE | date | Extracts one or a pair of years from mixed data which may include full dates and years. Assumes that 3- and 4-digit numbers in the string are the year specifications. The first two 3- or 4-digit numbers are used to determine the date range, so it may give wrong results if there are decade, century or millennium specifications. Also checks for an era. If the data contains only a single (3- or 4-digit) year, the range is created for that year only. Any text (except for era specifications following the year, e.g.: 924 B.C.) or non-numeric characters are ignored. |
| MIXED_YEAR_RANGE | date | Same as above |
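
Two of the interpretation rules in the table above are easy to get wrong, so here is a small Python sketch of them: two-digit years are always placed in the Twentieth century, and a DECADE value expands to an inclusive ten-year range. The function names and regular expressions are illustrative assumptions, not Cheshire's parser, which the source describes as considerably more flexible.

```python
import re
from datetime import date


def parse_yymmdd(text):
    """Parse a fixed YYMMDD string, placing the year in the 1900s."""
    if not re.fullmatch(r"\d{6}", text):
        raise ValueError(f"not a YYMMDD date: {text!r}")
    year = 1900 + int(text[0:2])  # 98 becomes 1998, never 2098
    return date(year, int(text[2:4]), int(text[4:6]))


def decade_range(text):
    """Turn "1940's" or "20's" into an inclusive (first, last) year pair."""
    match = re.fullmatch(r"(\d{2}|\d{4})'s", text.strip())
    if match is None:
        raise ValueError(f"not a decade: {text!r}")
    start = int(match.group(1))
    if start < 100:
        start += 1900  # two-digit decades are read as Twentieth century
    start -= start % 10  # snap to the first year of the decade
    return start, start + 9


print(parse_yymmdd("980223"))
print(decade_range("1940's"), decade_range("20's"), decade_range("1870's"))
```

Note that this differs from the two-digit year handling in Python's own `datetime.strptime` with `%y`, which follows the POSIX convention of mapping values 69 to 99 to the 1900s and 0 to 68 to the 2000s.
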
| FORMAT | Type | Meaning |
|---|---|---|
| DDoMM'SS''NS DDDoMM'SS''EW | LAT_LONG | Latitude and Longitude using "DD" (degrees), "MM" (minutes) and "SS", separated by "o" (lower case O) for degrees, a single quote for minutes and two single quotes for seconds, followed by EITHER "N" or "S" for Latitudes or "E" or "W" for Longitudes. For example, Latitude and Longitude for San Francisco, California can be expressed as "37o45'53''N 122o24'36''W" and parsed using this format. |
| DD-MM-SS NS DDD-MM-SS EW | LAT_LONG | Latitude and Longitude using "DD" (degrees), "MM" (minutes) and "SS", separated by hyphens, followed by one or more spaces and EITHER "N" or "S" for Latitudes or "E" or "W" for Longitudes. For example, Latitude and Longitude for San Francisco, California can be expressed as "37-45-53 N 122-24-36 W" and parsed using this format. |
| DD-MM-SS-NS DDD-MM-SS-EW | LAT_LONG | Latitude and Longitude using "DD" (degrees), "MM" (minutes) and "SS", and EITHER "N" or "S" for Latitudes or "E" or "W" for Longitudes, separated by hyphens. For example, Latitude and Longitude for San Francisco, California can be expressed as "37-45-53-N 122-24-36-W" and parsed using this format. |
| DECIMAL_LAT_LONG | LAT_LONG | Latitude and Longitude using degrees and decimal fractions of degrees with positive numbers indicating NORTH latitudes or EAST longitudes, and negative numbers indicating SOUTH latitudes and WEST longitudes. For example, Latitude and Longitude for San Francisco, California can be expressed as "37.765 -122.41" and parsed using this format. |
| DECIMAL_LONG_LAT | LAT_LONG | Latitude and Longitude using degrees and decimal fractions of degrees with positive numbers indicating NORTH latitudes or EAST longitudes, and negative numbers indicating SOUTH latitudes and WEST longitudes. For example, Longitude and Latitude for San Francisco, California can be expressed as "-122.41 37.765" and parsed using this format. |
| DECIMAL_BOUNDING_BOX | BOUNDING_BOX | A pair of Latitudes and Longitudes using degrees and decimal fractions of degrees with positive numbers indicating NORTH latitudes or EAST longitudes, and negative numbers indicating SOUTH latitudes and WEST longitudes. The first latitude and longitude represent the North-West corner of the bounding box and the second latitude and longitude represent the South-East corner. For example, a bounding box around the peninsula of San Francisco, California can be expressed as "37.815834 -122.498886 37.601112 -122.318611" and parsed using this format. |
| FGDC_BOUNDING_BOX | BOUNDING_BOX | A set of Latitude and Longitude coordinates using degrees and decimal fractions of degrees with positive numbers indicating NORTH latitudes or EAST longitudes, and negative numbers indicating SOUTH latitudes and WEST longitudes. These coordinates follow the order for bounding boxes in the FGDC guidelines. That is, the first coordinate is the West bounding coordinate (BC), followed by the East BC, the North BC, and the South BC. For example, an FGDC bounding box around the peninsula of San Francisco, California can be expressed as "-122.498886 -122.318611 37.815834 37.601112" and parsed using this format. Note that whitespace and XML tags are ignored when extracting the data, so a recommended FGDC XML bounding box definition like: <bounding> <westbc>-122.71456201</westbc> <eastbc>-122.19128542</eastbc> <northbc>38.00205344</northbc> <southbc>37.68397727</southbc> </bounding> can be parsed correctly without pre-processing. |
| DDoMM'SS''NS DDDoMM'SS''EW DDoMM'SS''NS DDDoMM'SS''EW | BOUNDING_BOX | A pair of Latitude and Longitude coordinates using "DD" (degrees), "MM" (minutes) and "SS" (seconds), separated by "o" (lower case O) for degrees, a single quote for minutes and two single quotes for seconds, followed by EITHER "N" or "S" for Latitudes or "E" or "W" for Longitudes. For example, a bounding box for the peninsula of San Francisco, California can be expressed as "37o48'57''N 122o29'56''W 37o36'4''N 122o19'7''W" and parsed using this format. |
| DD-MM-SS NS DDD-MM-SS EW DD-MM-SS NS DDD-MM-SS EW | BOUNDING_BOX | A pair of Latitude and Longitude coordinates using "DD" (degrees), "MM" (minutes) and "SS" (seconds), separated by hyphens, followed by whitespace and EITHER "N" or "S" for Latitudes or "E" or "W" for Longitudes. For example, a bounding box for the peninsula of San Francisco, California can be expressed as "37-48-57 N 122-29-56 W 37-36-4 N 122-19-7 W" and parsed using this format. |
| DD-MM-SS-NS DDD-MM-SS-EW DD-MM-SS-NS DDD-MM-SS-EW | BOUNDING_BOX | A pair of Latitude and Longitude coordinates using "DD" (degrees), "MM" (minutes) and "SS" (seconds), and EITHER "N" or "S" for Latitudes or "E" or "W" for Longitudes, all separated by hyphens. For example, a bounding box for the peninsula of San Francisco, California can be expressed as "37-48-57-N 122-29-56-W 37-36-4-N 122-19-7-W" and parsed using this format. |
| Note on Format Parsing: | In the formats described above (other than the format keywords "DECIMAL_LAT_LONG", "DECIMAL_LONG_LAT" and "DECIMAL_BOUNDING_BOX", which require data in the appropriate decimal form), each separator used in the DATA itself may be any one of the separators used in any of the above formats. This means that, for example, the format specification "DDoMM'SS''NS DDDoMM'SS''EW" could be used to correctly parse data that looked like "37-48-57-N 122-29-56-W" or "37o48-57 N 122'29'56'W", or even " 37 48 57 n 122 29 56 w", etc. However, one of the formats, as shown above, should appear in the EXTRACT attribute for the index for correct configuration file processing. | |
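
The lenient separator handling described in the note above can be sketched in Python as follows. This is only an illustration of the idea, not the server's parser: the name `parse_dms`, the exact set of separator characters accepted, and the conversion to signed decimal degrees (using the sign convention of the DECIMAL formats) are assumptions.

```python
import re

# Degrees, minutes, seconds and a hemisphere letter. Any mix of whitespace,
# hyphens, the letter "o" (after degrees) and single quotes may separate them.
COORD = re.compile(
    r"(\d{1,3})[\s\-o']+(\d{1,2})[\s\-']+(\d{1,2})[\s\-']*([NSEW])",
    re.IGNORECASE,
)


def parse_dms(text):
    """Return signed decimal degrees for every coordinate found in text.

    North and East are positive, South and West negative.
    """
    values = []
    for deg, minutes, seconds, hemi in COORD.findall(text):
        value = int(deg) + int(minutes) / 60 + int(seconds) / 3600
        if hemi.upper() in ("S", "W"):
            value = -value
        values.append(value)
    return values
```

A latitude and longitude pair yields two values and a bounding box yields four, in the order in which they appear in the data. The three differently punctuated strings quoted in the note all produce the same pair of values under this sketch.
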
| Name | 008 position | Meaning |
|---|---|---|
| 008_entry_date | pos 00-05 | Date record created as yymmdd. |
| 008_date_type | pos 06 | Type of date code; s = single, c = 2 dates actual and copyright, etc. |
| 008_date1 | pos 07-10 | If date type is s or c this is the publication date. |
| 008_date2 | pos 11-14 | If date type is c this is the copyright date. |
| 008_daterange | pos 11-14 | If date type is i, k, m, q, c, or d then date1 is used as the starting date of a date range, and date2 is used as the ending date of the date range. If any other letter is used, the date1 value is assumed to be both the start and end date of the range. |
| 008_country_code | pos 15-17 | Country of publication code. |
| 008_illus_code | pos 18-21 | Types of illustration codes: a = illus; b = maps; c = portraits; etc. |
| 008_intellectual_level | pos 22 | Intellectual Level j = juvenile, blank = adult. |
| 008_form_of_reproduction | pos 23 | Form of reproduction: blank = paper or original, a = microfilm, etc. |
| 008_nature_of_contents | pos 24-27 | Contents: b = is or has bibliography, i = is index, etc. |
| 008_government_pub_code | pos 28 | Government Document code: f = federal/national, s = state/province; etc. |
| 008_conference_indicator | pos 29 | Conference code: 0 or blank = not a conference; 1 = conference pub. |
| 008_festschrift_indicator | pos 30 | 0 = Not a Festschrift; 1 = Festschrift. |
| 008_index_indicator | pos 31 | 0 = No index; 1 = has index. |
| 008_main_entry_in_body | pos 32 | 0 = Main entry not in body of record; 1 = is in body of record. |
| 008_fiction_indicator | pos 33 | 0 = Not Fiction; 1 = is Fiction. |
| 008_biography_indicator | pos 34 | a = autobiography; b = individual biography; etc. |
| 008_language_code | pos 35-37 | Three character language code (e.g. eng, fre, rus). |
| 008_modified_record_code | pos 38 | blank = Not modified record; m = modified record. |
| 008_cataloging_source | pos 39 | Cataloging source code: blank = LC; a = NAL; d = Non-LC; etc. |
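
As a rough illustration of how the positions in the table above map onto a MARC 008 field, the following Python sketch slices a 40-character 008 string and applies the 008_daterange rule. The helper names `extract_008` and `date_range_008` are hypothetical, only a few of the named elements are included, and the sketch says nothing about how Cheshire II itself stores the extracted values.

```python
# Inclusive (start, end) character positions, copied from the table above.
FIELDS_008 = {
    "008_entry_date": (0, 5),
    "008_date_type": (6, 6),
    "008_date1": (7, 10),
    "008_date2": (11, 14),
    "008_country_code": (15, 17),
    "008_language_code": (35, 37),
}

# Date type codes for which date1 and date2 form a range.
RANGE_TYPES = set("ikmqcd")


def extract_008(field, name):
    """Return the characters of a 40-character 008 field for a named element."""
    start, end = FIELDS_008[name]
    return field[start:end + 1]


def date_range_008(field):
    """Apply the 008_daterange rule: (date1, date2) or (date1, date1)."""
    date1 = extract_008(field, "008_date1")
    date2 = extract_008(field, "008_date2")
    if extract_008(field, "008_date_type") in RANGE_TYPES:
        return (date1, date2)
    return (date1, date1)
```

For a record with date type "s" and date1 "1985", `date_range_008` returns the same year as both start and end; with date type "m", date1 "1980" and date2 "1985" it returns the pair as a range.
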
| TAG | Data Type | Meaning |
|---|---|---|
| <use> | USE attribute | USE attribute numbers from Appendix 3 of the Z39.50 standard (defaults to BIB-1; table A-3-1 is reproduced below). A USE value of "0" indicates that the index is the default index when no attributes are specified in the query. |
| <relation> | RELATION attribute | Relation attribute numbers from Appendix 3 of the Z39.50 standard (defaults to BIB-1; table A-3-2 is reproduced below). |
| <position> | POSITION attribute | Position attribute numbers from Appendix 3 of the Z39.50 standard (defaults to BIB-1; table A-3-3 is reproduced below). |
| <struct> | STRUCTURE attribute | Structure attribute numbers from Appendix 3 of the Z39.50 standard (defaults to BIB-1; table A-3-4 is reproduced below). |
| <trunc> | TRUNCATION attribute | Truncation attribute numbers from Appendix 3 of the Z39.50 standard (defaults to BIB-1; table A-3-5 is reproduced below). |
| <complet> | COMPLETENESS attribute | Completeness attribute numbers from Appendix 3 of the Z39.50 standard (defaults to BIB-1; table A-3-6 is reproduced below). |
| <access_point> | Access Point attribute type | Access Point attributes from the Z39.50 attribute architecture. Type number 1. (<access ... > can be used as an alias for this tag). |
| <semantic_qualifier> | Semantic Qualifier attribute type | Semantic Qualifier attributes from the Z39.50 attribute architecture. Type number 2. (<semantic ... > can be used as an alias for this tag). |
| <language> | Language attribute type | Language attributes from the Z39.50 attribute architecture. Type number 3. |
| <content_authority> | Content Authority attribute type | Content Authority attributes from the Z39.50 attribute architecture. Type number 4. (<authority ... > can be used as an alias for this tag). |
| <expansion> | Expansion attribute type | Expansion attributes from the Z39.50 attribute architecture. Type number 5. |
| <normalized_weight> | Normalized Weight attribute type | Normalized Weight attributes from the Z39.50 attribute architecture. Type number 6. (<weight ... > can be used as an alias for this tag). |
| <hit_count> | Hit Count attribute type | Hit Count attributes from the Z39.50 attribute architecture. Type number 7. (<hits ... > can be used as an alias for this tag). |
| <comparison> | Comparison attribute type | Comparison attributes from the Z39.50 attribute architecture. Type number 8. |
| <format> | Format/Structure attribute type | Format/Structure attributes from the Z39.50 attribute architecture. Type number 9. |
| <occurrence> | Occurrence attribute type | Occurrence attributes from the Z39.50 attribute architecture. Type number 10. |
| <indirection> | Indirection attribute type | Indirection attributes from the Z39.50 attribute architecture. Type number 11. |
| <functional_qualifier> | Functional Qualifier attribute type | Functional Qualifier attributes from the Z39.50 attribute architecture. Type number 12. (<functional ... > or <function ... > can be used as aliases for this tag). |
| <description> | DESCRIPTION of USE attribute | The description can be used to identify the attribute by a local name in the automatically generated EXPLAIN data; otherwise the USE value will be used to look up the typical name in an internal table. NOTE: The description applies only to the USE attribute and not to the entire combination. If multiple descriptions are given for the same USE attribute, only the FIRST will be used in generating explain records. |
| Use | Value | Use | Value | Use | Value |
|---|---|---|---|---|---|
| Personal name | 1 | Title collective | 34 | Author | 1003 |
| Corporate name | 2 | Title parallel | 35 | Author-name personal | 1004 |
| Conference name | 3 | Title cover | 36 | Author-name corporation | 1005 |
| Title | 4 | Title added title page | 37 | Author-name conference | 1006 |
| Title series | 5 | Title caption | 38 | Identifier -- standard | 1007 |
| Title uniform | 6 | Title running | 39 | Subject -- LC children's | 1008 |
| ISBN | 7 | Title spine | 40 | Subject name -- personal | 1009 |
| ISSN | 8 | Title other variant | 41 | Body of text | 1010 |
| LC card number | 9 | Title former | 42 | Date/time added to database | 1011 |
| BNB card no. | 10 | Title abbreviated | 43 | Date/time last modified | 1012 |
| BGF number | 11 | Title expanded | 44 | Authority/format identifier | 1013 |
| Local number | 12 | Subject precis | 45 | Concept-text | 1014 |
| Dewey classification | 13 | Subject rswk | 46 | Concept-reference | 1015 |
| UDC classification | 14 | Subject subdivision | 47 | Any | 1016 |
| Bliss classification | 15 | Number natl biblio | 48 | Default | 1017 |
| LC call number | 16 | Number legal deposit | 49 | Publisher | 1018 |
| NLM call number | 17 | Number govt publication | 50 | Record-source | 1019 |
| NAL call number | 18 | Number publisher for music | 51 | Editor | 1020 |
| MOS call number | 19 | Number db | 52 | Bib-level | 1021 |
| Local classification | 20 | Number local call | 53 | Geographic-class | 1022 |
| Subject heading | 21 | Code -- language | 54 | Indexed-by | 1023 |
| Subject Rameau | 22 | Code -- geographic area | 55 | Map-scale | 1024 |
| BDI index subject | 23 | Code -- institution | 56 | Music-key | 1025 |
| INSPEC subject | 24 | Name and title | 57 | Related-periodical | 1026 |
| MESH subject | 25 | Name geographic | 58 | Report-number | 1027 |
| PA subject | 26 | Place publication | 59 | Stock-number | 1028 |
| LC subject heading | 27 | CODEN | 60 | Thematic-number | 1030 |
| RVM subject heading | 28 | Microform generation | 61 | Material-type | 1031 |
| Local subject index | 29 | Abstract | 62 | Doc-id | 1032 |
| Date | 30 | Note | 63 | Host-item | 1033 |
| Date of publication | 31 | Author-title | 1000 | Content-type | 1034 |
| Date of acquisition | 32 | Record type | 1001 | Anywhere | 1035 |
| Title key | 33 | Name | 1002 |
The rules governing the usage of this attribute are stated explicitly below; where they are inconsistent with Z39.58, these rules take precedence.
The character '?' (question mark) masks a variable number of characters. It may be followed by a positive integer, i.e. one or more consecutive decimal digits, the first of which is non-zero. The integer, formed by the digits immediately following the '?' up to but not including the first non-digit character, gives the upper bound of the range of characters to mask: from zero up to and including that integer.
When '?' is not immediately followed by a positive decimal digit, it masks an arbitrary number of characters (from zero to a system-defined limit).
The character '#' (pound or number sign) is used to mask a single character. Multiple consecutive occurrences of '#' may be used to indicate a precise number of characters to mask.
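
One way to make these masking rules concrete is to translate a masked term into a regular expression. The Python sketch below does that; the function name `mask_to_regex` is hypothetical, and the unbounded `.*` stands in for the "system-defined limit", which the rules leave to the implementation.

```python
import re

# '?' optionally followed by a positive integer (first digit non-zero), or '#'.
_MASK = re.compile(r"\?([1-9][0-9]*)?|#")


def mask_to_regex(term):
    """Translate a term containing '?' and '#' masks into a compiled regex."""
    parts = []
    pos = 0
    for match in _MASK.finditer(term):
        # Literal text between masks is escaped so it matches itself.
        parts.append(re.escape(term[pos:match.start()]))
        if match.group(0) == "#":
            parts.append(".")                       # exactly one character
        elif match.group(1):
            parts.append(".{0,%d}" % int(match.group(1)))  # zero up to N
        else:
            parts.append(".*")                      # arbitrary number
        pos = match.end()
    parts.append(re.escape(term[pos:]))
    return re.compile("".join(parts))
```

The returned pattern is meant to be used with `fullmatch`, so that the whole candidate term has to fit the mask. For instance, the mask `colo?1r` accepts both "color" and "colour", while `wom#n` accepts "woman" and "women" but not "womn".
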
| Name | Value | Semantics |
|---|---|---|
| Doc-id(semantic definition change) | 1032 | The following semantic definition was adopted at the August 1997 ZIG meeting. An identifier or Doc-ID, assigned by a server, that uniquely identifies a document on that server. May or may not be persistent. May be, for example, a URL. |
| SICI | 1037 | The Serial Item and Contribution Identifier; based on NISO Z39.56, which provides an extensible mechanism for the creation of a code which uniquely identifies either an issue of a serial title or a contribution (e.g., article) within a serial regardless of distribution medium (paper, electronic, microform, etc.). |
| Following (thru 1083) submitted by German Library | ||
| Abstract-language | 1038 | The language code or the language name for the language of the abstract. Entries are made in accordance with ISO 639-1988 'Code for the representation of names of language'. The 2-digit letter code is used as language code, the English term is used as language name. |
| Application-kind | 1039 | Information (code or full text) about the patent application. |
| Classification | 1040 | The code or the section heading of a classification system. The attribute is a general (broad) attribute for codes that do not exist as a search key. |
| Classification-basic | 1041 | The code or the section heading of the Dutch classification system. |
| Classification-local-record | 1042 | The code or the section heading of a classification system assigned to a holding record. |
| Enzyme | 1043 | The enzyme code or the enzyme nomenclature. Codes are assigned by the 'Enzyme Commission' in the 'International Union of Biochemistry'. |
| Possessing-institution | 1044 | A code (library symbol or other code) of the institution which possesses the document. The codes are agreed upon between the partners. |
| Record-linking | 1045 | An alphabetical code which determines the type of linking. |
| Record-status | 1046 | Information about the status of the record, e.g. new, corrected, deleted, revised. |
| Treatment | 1047 | A statement (code or full text) describing subject aspects of the content. |
| Control-number-GKD | 1048 | The identification number of a corporate body name from the German authority file for corporate body names "Gemeinsame Koerperschaftsdatei" (GKD). |
| Control-number-linking | 1049 | The unique identification number of the linked record. Attribute 'code-record-type' contains a code for the type of record which contains the 'control-number-linking' (b - bibliographical record, e - record for copy-specific data, n - authority record, l - holding record, p - record for cross references, v - full text). Attribute 'code-record-linking' contains a code which determines the type of linking. |
| Control-number-PND | 1050 | The identification number of a personal name from the German authority file for personal names "Personennamendatei" (PND). |
| Control-number-SWD | 1051 | The identification number of a subject heading from the German authority file for subject headings "Schlagwortnormdatei" (SWD). |
| Control-number-ZDB | 1052 | The identification number of a document in the German database for serials "Zeitschriftendatenbank" (ZDB). |
| Country-publication (country of Publication) | 1053 | The country code or the country name of the country where the document has been published. Entries are made according to ISO 3166 'Codes for the Representation of Names of Countries'. As country code a 2-digit letter code is used, as country name the English country name. |
| Date-conference (meeting date) | 1054 | The date of the conference or of another meeting. |
| Date-record-status | 1055 | The date on which the record status was assigned. |
| Dissertation-information | 1056 | Information about a dissertation thesis, or another publication connected with an academic degree. |
| Meeting-organizer | 1057 | The name of the organizer or the sponsor of a conference. |
| Note-availability | 1058 | Information about the availability of a document (delivery information). |
| Number-CAS-registry (CAS registry number) | 1059 | The 'Chemical Abstract Registry' number of the substance described in a document. |
| Number-document (document number) | 1060 | The publication number of the document (e.g. the number of the abstract in a secondary publication, or the number of the manuscript in a primary journal) provided that this number is not used as 'internal key' in the database. A publication number used as 'internal key' is entered in Attribute 12. |
| Number-local-accounting | 1061 | The account number of the document assigned by the accounting system. |
| Number-local-acquisition | 1062 | Document acquisition number assigned by the system. |
| Number-local-call-copy-specific | 1063 | The document's shelf number. |
| Number-of-reference (reference count) | 1064 | The number of literature references cited in a document. |
| Number-norm | 1065 | The number of a norm or standard. |
| Number-volume | 1066 | The number of single volumes of a multivolume publication, (year's) issues of serials, parts of multivolume publications, serials journals etc. |
| Place-conference (meeting location) | 1067 | The place where the conference or meeting was held. |
| Reference (references and footnotes) | 1068 | A literature reference from a document. The bibliographic data of a reference consists e.g. of a title, author, journal, publication year, volume and page information of the cited document. |
| Referenced-journal (reference work) | 1069 | The title of a journal cited in a document. |
| Section-code | 1070 | The section code of a subject classification. |
| Section-heading | 1071 | The section heading of a subject classification. |
| Subject-GOO | 1072 | A subject heading from the 'Gemeenschappelijke Onderwerps Ontsluiting' (GOO). |
| Subject-name-conference | 1073 | A conference name used as subject heading. |
| Subject-name-corporate | 1074 | A corporate body name used as a subject heading. |
| Subject-name-form | 1075 | A formal topical subject heading, e.g. collection of articles, handbook, source. |
| Subject-name-geographical | 1076 | A geographical/ethnographical subject heading. |
| Subject-name-chronological | 1077 | A chronological subject heading. |
| Subject-name-title | 1078 | A title proper used as subject heading. |
| Subject-name-topical | 1079 | A topical subject heading. |
| Subject-uncontrolled | 1080 | An uncontrolled subject term. General (broad) attribute for subject terms not specified any further, existing as a search key. |
| Terminology-chemical (chemical name) | 1081 | The description of a chemical substance. This is either the name of the substance or a name from another classification system, e.g. enzyme code. |
| Title-translated | 1082 | The translation of a title. |
| Year-of-beginning | 1083 | The publication year of the first issue/volume of serial publications (journals, newspapers, etc.). |
| Following (thru 1096) submitted by Danish National Library Authority, approved at January 1998 ZIG meeting | ||
| Year-of-ending | 1084 | The publication year of the last issue/volume of serial publications (journals, newspapers, etc.). |
| Subject-AGROVOC | 1085 | A subject heading from the multilingual agricultural thesaurus from FAO. |
| Subject-COMPASS | 1086 | A subject heading from Computer Aided Subject System from British Library. |
| Subject-EPT | 1087 | A subject heading from European Pedagogical Thesaurus. |
| Subject-NAL | 1088 | A subject heading from National Agricultural Library. |
| Classification-BCM | 1089 | A classification number from British Catalogue of Music. |
| Classification-DB | 1090 | A classification number from Deutsche Bibliothek. |
| Identifier-ISRC | 1091 | ISRC. International standard recording code (ISO 3901). |
| Identifier-ISMN | 1092 | ISMN. International standard music number (ISO 10957). |
| Identifier-ISRN | 1093 | ISRN. International standard technical report number (ISO 10444). |
| Identifier-DOI | 1094 | Digital Object Identifier. |
| Code-language-original | 1095 | A code that indicates the original language of the item. |
| Title-later | 1096 | A later version of title. |
| Following (1097 thru 1111) are Dublin Core Use attributes, approved at June 1998 ZIG meeting. For detailed semantics see Description of Dublin Core Elements | ||
| Name | Value | |
|---|---|---|
| DC-Title | 1097 |   |
| DC-Creator | 1098 |   |
| DC-Subject | 1099 |   |
| DC-Description | 1100 |   |
| DC-Publisher | 1101 |   |
| DC-Date | 1102 |   |
| DC-ResourceType | 1103 |   |
| DC-ResourceIdentifier | 1104 |   |
| DC-Language | 1105 |   |
| DC-OtherContributor | 1106 |   |
| DC-Format | 1107 |   |
| DC-Source | 1108 |   |
| DC-Relation | 1109 |   |
| DC-Coverage | 1110 |   |
| DC-RightsManagement | 1111 |   |
| The following (1112 thru 1184) are GILS Use attributes, approved at the June 1998 ZIG meeting. The GILS Use Attributes correspond semantically to GILS Core Elements, described in the GILS Z39.50 Specification Annex E. | ||
| Name | Value | |
|---|---|---|
| Controlled Subject Index | 1112 |   |
| Subject Thesaurus | 1113 |   |
| Index Terms -- Controlled | 1114 |   |
| Controlled Term | 1115 |   |
| Spatial Domain | 1116 |   |
| Bounding Coordinates | 1117 |   |
| West Bounding Coordinate | 1118 |   |
| East Bounding Coordinate | 1119 |   |
| North Bounding Coordinate | 1120 |   |
| South Bounding Coordinate | 1121 |   |
| Place | 1122 |   |
| Place Keyword Thesaurus | 1123 |   |
| Place Keyword | 1124 |   |
| Time Period | 1125 |   |
| Time Period Textual | 1126 |   |
| Time Period Structured | 1127 |   |
| Beginning Date | 1128 |   |
| Ending Date | 1129 |   |
| Availability | 1130 |   |
| Distributor | 1131 |   |
| Distributor Name | 1132 |   |
| Distributor Organization | 1133 |   |
| Distributor Street Address | 1134 |   |
| Distributor City | 1135 |   |
| Distributor State or Province | 1136 |   |
| Distributor Zip or Postal Code | 1137 |   |
| Distributor Country | 1138 |   |
| Distributor Network Address | 1139 |   |
| Distributor Hours of Service | 1140 |   |
| Distributor Telephone | 1141 |   |
| Distributor Fax | 1142 |   |
| Resource Description | 1143 |   |
| Order Process | 1144 |   |
| Order Information | 1145 |   |
| Cost | 1146 |   |
| Cost Information | 1147 |   |
| Technical Prerequisites | 1148 |   |
| Available Time Period | 1149 |   |
| Available Time Textual | 1150 |   |
| Available Time Structured | 1151 |   |
| Available Linkage | 1152 |   |
| Linkage Type | 1153 |   |
| Linkage | 1154 |   |
| Sources of Data | 1155 |   |
| Methodology | 1156 |   |
| Access Constraints | 1157 |   |
| General Access Constraints | 1158 |   |
| Originator Dissemination Control | 1159 |   |
| Security Classification Control | 1160 |   |
| Use Constraints | 1161 |   |
| Point of Contact | 1162 |   |
| Contact Name | 1163 |   |
| Contact Organization | 1164 |   |
| Contact Street Address | 1165 |   |
| Contact City | 1166 |   |
| Contact State or Province | 1167 |   |
| Contact Zip or Postal Code | 1168 |   |
| Contact Country | 1169 |   |
| Contact Network Address | 1170 |   |
| Contact Hours of Service | 1171 |   |
| Contact Telephone | 1172 |   |
| Contact Fax | 1173 |   |
| Supplemental Information | 1174 |   |
| Purpose | 1175 |   |
| Agency Program | 1176 |   |
| Cross Reference | 1177 |   |
| Cross Reference Title | 1178 |   |
| Cross Reference Relationship | 1179 |   |
| Cross Reference Linkage | 1180 |   |
| Schedule Number | 1181 |   |
| Original Control Identifier | 1182 |   |
| Language of Record | 1183 |   |
| Record Review Date | 1184 |   |
| Use | Value | Use | Value |
|---|---|---|---|
| Distributor Name | 2001 | Contact Street Address | 2025 |
| Index Terms -- Controlled | 2002 | Contact City | 2026 |
| Purpose | 2003 | Contact State | 2027 |
| Access Constraints | 2004 | Contact Zip Code | 2028 |
| Use Constraints | 2005 | Contact Country | 2029 |
| Distributor Organization | 2006 | Contact Network Address | 2030 |
| Distributor Street Address | 2007 | Contact Hours of Service | 2031 |
| Distributor City | 2008 | Contact Telephone | 2032 |
| Distributor State | 2009 | Contact Fax | 2033 |
| Distributor Zip Code | 2010 | Agency Program | 2034 |
| Distributor Country | 2011 | Sources of Data | 2035 |
| Distributor Network Address | 2012 | Thesaurus | 2036 |
| Distributor Hours of Service | 2013 | Methodology | 2037 |
| Distributor Telephone | 2014 | Bounding Rectangle -- Western-most | 2038 |
| Distributor Fax | 2015 | Bounding Rectangle -- Eastern-most | 2039 |
| Available Resource Description | 2016 | Bounding Rectangle -- Northern-most | 2040 |
| Available Order Process | 2017 | Bounding Rectangle -- Southern-most | 2041 |
| Available Technical Prerequisites | 2018 | Geographic Keyword Name | 2042 |
| Available Time Period -- Structured | 2019 | Geographic Keyword Type | 2043 |
| Available Time Period -- Textual | 2020 | Time Period - Structured | 2044 |
| Available Linkage | 2021 | Time Period - Textual | 2045 |
| Available Linkage Type | 2022 | Cross Reference Title | 2046 |
| Contact Name | 2023 | Cross Reference Linkage | 2047 |
| Contact Organization | 2024 | Cross Reference Type | 2048 |
| Original Control Identifier | 2049 | ||
| Supplemental Information | 2050 | ||
| Relation | Value | Relation | Value | Relation | Value |
|---|---|---|---|---|---|
| less than | 1 | greater or equal | 4 | phonetic | 100 |
| less or equal | 2 | greater than | 5 | stem | 101 |
| equal | 3 | not equal | 6 | relevance | 102 |
| | | | | AlwaysMatches | 103 |
| Position | Value | Position | Value | Position | Value |
|---|---|---|---|---|---|
| first in field | 1 | first in subfield | 2 | any position in field | 3 |
| Structure | Value | Structure | Value | Structure | Value |
|---|---|---|---|---|---|
| Phrase | 1 | word list | 6 | urx | 104 |
| word | 2 | date (un-normalized) | 100 | free-form-text | 105 |
| key | 3 | name (normalized) | 101 | document-text | 106 |
| year | 4 | name (un-normalized) | 102 | local number | 107 |
| date (normalized) | 5 | structure | 103 | string | 108 |
| | | | | numeric string | 109 |
| Truncation | Value | Truncation | Value | Truncation | Value |
|---|---|---|---|---|---|
| Right Truncation | 1 | Do not truncate | 100 | Glob (regExpr-1) | 102 |
| Left truncation | 2 | Process # ... | 101 | Regexp (regExpr-2) | 103 |
| Left and right | 3 |
| Completeness | Value | Completeness | Value | Completeness | Value |
|---|---|---|---|---|---|
| incomplete subfield | 1 | complete subfield | 2 | complete field | 3 |
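
To show how values from the six BIB-1 tables above combine into a single attribute list, here is a small Python sketch. It relies only on the BIB-1 attribute type numbers (Use 1, Relation 2, Position 3, Structure 4, Truncation 5, Completeness 6); the function name `bib1_attribute_list` and the representation as (type, value) pairs are illustrative assumptions and are not part of the configuration file format.

```python
# BIB-1 attribute type numbers, in type order.
BIB1_TYPES = {
    "use": 1,
    "relation": 2,
    "position": 3,
    "structure": 4,
    "truncation": 5,
    "completeness": 6,
}


def bib1_attribute_list(**attrs):
    """Return [(attribute type, value), ...] ordered by attribute type.

    Keyword names must be keys of BIB1_TYPES; values are the numbers
    listed in the corresponding tables above.
    """
    unknown = set(attrs) - set(BIB1_TYPES)
    if unknown:
        raise ValueError("unknown attribute name(s): %s" % ", ".join(sorted(unknown)))
    return [
        (type_number, attrs[name])
        for name, type_number in BIB1_TYPES.items()
        if name in attrs
    ]
```

For example, a title search (Use 4) on a word list (Structure 6) with no truncation (Truncation 100) corresponds to the pairs (1, 4), (4, 6) and (5, 100).
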
| TAG | arg1 | arg2 | Meaning |
|---|---|---|---|
| PUBLIC | pubid | sysid | This specifies that sysid should be used as the effective system identifier if the public identifier is pubid. Sysid is a system identifier as defined in ISO 8879 and pubid is a public identifier as defined in ISO 8879. |
| ENTITY | name | sysid | This specifies that sysid should be used as the effective system identifier if the entity is a general entity whose name is name. |
| ENTITY | %name | sysid | This specifies that sysid should be used as the effective system identifier if the entity is a parameter entity whose name is name. Note that there is no space between the % and the name. |
| DOCTYPE | name | sysid | This specifies that sysid should be used as the effective system identifier if the entity is an entity declared in a document type declaration whose document type name is name. |
| LINKTYPE | name | sysid | This specifies that sysid should be used as the effective system identifier if the entity is an entity declared in a link type declaration whose link type name is name. |
| NOTATION | name | sysid | This specifies that sysid should be used as the effective system identifier for a notation whose name is name. This is an extension to the SGML Open format used for compatibility with SP 1.1.1. |
| OVERRIDE | bool | | bool may be YES or NO. This sets the overriding mode for entries up to the next occurrence of OVERRIDE or the end of the catalog entry file. At the beginning of a catalog entry file the overriding mode will be NO. A PUBLIC, ENTITY, DOCTYPE, LINKTYPE or NOTATION entry with an overriding mode of YES will be used whether or not the external identifier has an explicit system identifier; those with an overriding mode of NO will be ignored if the external identifier has an explicit system identifier. This is an extension to the SGML Open format, and is not implemented in this version. |
| SYSTEM |