Cheshire II Commands
Zselect, Zfind, Zscan, Zdisplay, Zshow, Zset, Zclose, ZQL, Zformat, Zmakeformat, Zremoveformat, Zshowformat, Zdelete, ZSort, zhighlight, pTmpNam, pTranLog, sResults, Cheshire_Search, Cheshire_Fetch, TileBar_Search, Cheshire_Close, LCCBuild, LCCGet, LCCDestroy, CHESHIRE_SEARCH_STAT_DUMP - Special Tcl/Tk commands for manipulating files and searching in CheshireII
(See also the map widget commands for cheshire2 and staffcheshire)
This document describes the CheshireII specific command language features added to Tcl/Tk for the Cheshire II client programs cheshire2, ztcl, webcheshire, and staffcheshire.
Cheshire2 is the primary client program of the CheshireII information retrieval system. The system incorporates a client-server architecture, X window interface and WWW support, Boolean and probabilistic retrieval methods, and a flexible scripting facility using the Tcl/Tk language. The ztcl client program is identical to the cheshire2 program, except that it doesn't include the Tk toolkit for X windows. The webcheshire client program includes all of the features of ztcl, with the addition of special commands for access to server-side search and retrieval from Cheshire databases without the need to establish a Z39.50 connection. Webcheshire is intended for use in creating cgi-bin scripts for WWW applications using the Cheshire system. The staffcheshire client (under development) includes all of the features of the webcheshire client with the addition of commands to examine and modify system configuration files and data in the database, and the inclusion of the Tk toolkit for building X window based interfaces for maintenance of the Cheshire system.
The Cheshire client programs are configured and run via Tcl/Tk scripts, and the server is driven by SGML-like configuration files that describe the database files and indexes for the system. The rest of this document describes the specific Tcl/Tk commands that facilitate the use of the CheshireII database or other Z39.50 compatible databases from within Tcl/Tk. They provide an interface from the Tcl scripting language to the C routines that make up the bulk of the system.
Zselect servername [hostaddress databasename portnumber] [idauthentication]
ZSELECT servername [hostaddress databasename portnumber] [idauthentication]
zselect servername [hostaddress databasename portnumber] [idauthentication]
The zselect command is used to establish a connection to a particular Z39.50 server and specify the database you would like to search (when there is a choice available). There are a number of server/database combinations included in the Cheshire clients, and for these only the servername is needed. (All of these hosts are accessible to the client programs in the global Tcl array called "Z_HOSTS" and can be displayed using the "parray Z_HOSTS" command). To connect to any other server the first four parameters must be provided (after the initial connection, subsequent connections during the same client session may use the servername alone).
servername (OPTIONAL)
This specifies a name to use for the server/database combination. When used as the only parameter the servername indicates which server to make a connection to. (NOTE: The particular database to be searched on a given server can be set using the zset database command, without having to reconnect to the server under another servername)
hostaddress (OPTIONAL)
This specifies the internet name or address of the Z39.50 server.
databasename (OPTIONAL)
The name of the database to search. This must be a valid database name for the server otherwise the search commands will fail. A common database name used by Z39.50 servers is "cat" to mean the online catalog.
portnumber (OPTIONAL)
This specifies the port on which the Z39.50 client must connect. The "well-known port" for Z39.50 is 210, but different servers may choose to use different ports.
idauthentication (OPTIONAL)
This specifies the authentication string that the particular server requires for connection. This string is passed to the server in the "idAuthentication" field of the init PDU when attempting to connect.
Zfind indexname1[ATTR] [RELOP] search_string1 [[BOOLOP | PROXOP | FUZZYOP | RESTRICTOP | MERGEOP] indexname2 [ATTR] [RELOP] search_string2 [BOOLOP2 | PROXOP2 | FUZZYOP2 | RESTRICTOP2 | MERGEOP2]... [resultsetid id_string]
ZFIND indexname1[ATTR] [RELOP] search_string1 [[BOOLOP | PROXOP | FUZZYOP | RESTRICTOP | MERGEOP] indexname2 [ATTR] [RELOP] search_string2 [BOOLOP2 | PROXOP2 | FUZZYOP2 | RESTRICTOP2 | MERGEOP2]... [resultsetid id_string]
zfind indexname1[ATTR] [RELOP] search_string1 [[BOOLOP | PROXOP | FUZZYOP | RESTRICTOP | MERGEOP] indexname2 [ATTR] [RELOP] search_string2 [BOOLOP2 | PROXOP2 | FUZZYOP2 | RESTRICTOP2 | MERGEOP2]... [resultsetid id_string]
zfind id_string:[itemid,itemid,...itemid-itemid[find,regular_expression]
This command will search the current database specified on the current host/server established by the zselect command.
resultsetid id_string (OPTIONAL)
This specifies a server-side set name (id_string) which will be used for storing the results of the current search. This will not work unless the server supports named result sets.
indexname
This can be any name in the BIB1, GILS, GEO or EXP1 attribute sets. In addition there are many aliases assigned for various names. The list of supported index names is stored in the global Tcl array "ALL_INDEXES" and can be displayed using the "parray ALL_INDEXES" command. In the list of numbers shown for each entry, the first and second numbers are an internal id for the attribute and a code indicating which attribute sets the item is used for; the third number is the attribute set USE value; and the remaining numbers are the other attributes, in order: Relation, Position, Structure, Truncation, Completeness. Here are some entries from ALL_INDEXES:
ALL_INDEXES(ABSTRACT) = ABSTRACT Abstract {116 17 62 0 3 6 0 1}
ALL_INDEXES(ANY) = ANY Any {148 17 1016 0 3 6 0 1}
ALL_INDEXES(ANYWHERE) = ANYWHERE Anywhere {176 17 1035 0 3 6 0 1}
ALL_INDEXES(AUTHOR) = AUTHOR Author_Personal_name {0 17 1 0 3 6 0 1}
ALL_INDEXES(AUTHOR-NAME_CONFERENCE_KEY) = AUTHOR-NAME_CONFERENCE_KEY Author-name_conference {130 17 1006 0 3 1 0 1}
ALL_INDEXES(AUTHOR-NAME_CORPORATION_KEY) = AUTHOR-NAME_CORPORATION_KEY Author-name_corporation {127 17 1005 0 3 1 0 1}
ALL_INDEXES(AUTHOR-NAME_PERSONAL_KEY) = AUTHOR-NAME_PERSONAL_KEY Author-name_personal {126 17 1004 0 3 1 0 1}
[ATTR]
A set of attributes and values to be included with the query.
This is a list of Z39.50 attribute numbers and their values, contained in square brackets and (optionally) separated by commas. These are the numerical values for the attributes from the Z39.50 standard (or attribute set definition), and they override the existing attributes associated with index names, except for the USE attribute: although USE attributes may be specified this way, both the USE attribute from the specified index name and the one specified in square brackets will be sent to the server. The attributes are expressed in the form "n=m" where "n" is the attribute type number and "m" is the value. For example "[1=55 2=1 3=1 4=3]" would indicate a USE(1) attribute of 55 (CODE-Geographic area in BIB-1), a RELATION(2) attribute of 1 (less than), a POSITION(3) attribute of 1 (first in field), and a STRUCTURE(4) attribute of 3 (key). NOTE: In Tcl square brackets indicate a command to be executed, so to use them in the Tcl clients you will need to use a backslash before the open and close square brackets.
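The "n=m" notation can be illustrated with a small standalone Python sketch. It is not part of the Cheshire clients: the function names parse_attr_list and tcl_escape are hypothetical, only purely numeric "n=m" pairs are handled (no attribute set prefixes), and the type names are the six attribute types listed in this section.

```python
import re

# Attribute type numbers and names as listed in this section (BIB-1 order).
ATTRIBUTE_TYPES = {
    1: "USE",
    2: "RELATION",
    3: "POSITION",
    4: "STRUCTURE",
    5: "TRUNCATION",
    6: "COMPLETENESS",
}


def parse_attr_list(spec):
    """Parse a bracketed list of "n=m" pairs into {type_name: value}.

    Hypothetical helper for illustration only; it does not reproduce the
    Cheshire query parser and ignores attribute set names or OIDs.
    """
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("attribute list must be enclosed in square brackets")
    attrs = {}
    # Pairs may be separated by spaces, commas, or both.
    for token in re.split(r"[,\s]+", spec[1:-1].strip()):
        if not token:
            continue
        type_num, value = token.split("=")
        name = ATTRIBUTE_TYPES.get(int(type_num), f"TYPE_{type_num}")
        attrs[name] = int(value)
    return attrs


def tcl_escape(spec):
    """Backslash-escape the square brackets, as needed on a Tcl command line."""
    return spec.replace("[", "\\[").replace("]", "\\]")


print(parse_attr_list("[1=55 2=1 3=1 4=3]"))
print(tcl_escape("[5=100]"))
```

The second helper only shows the escaping that the NOTE above calls for; how the escaped text is then passed to zfind depends on the quoting used in the calling script.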
To combine multiple attribute sets in the same query, you may specify the attribute set OID or a symbolic name for any of the following attribute sets: "BIB-1", "EXP-1", "EXT-1", "CCL-1", "GILS", "STAS", "COLLECTIONS-1", "CIMI-1", "GEO-1", "ZBIG", "UTIL", "XD-1", "ZTHES", "FIN-1", "DAN-1" and "HOLDINGS". Either these symbolic names (and many variants) or the OIDs may be specified (OIDs of unlisted attribute sets can also be specified). The attribute set is specified before the attribute type and is followed by a space. It applies only to the immediately following attribute specification (by default the current ATTRIBUTESETNAME set by the ZSET command is used to interpret the attributes). For example...
Note also that this form of specifying attributes can be used in place of an explicit index name, for example...
RELOP
The relational operation to be performed. These are:
| Value | Name | Semantics |
| blank = | Equals | Searches for equal values to the search string. |
| <= LE .LE. | Less than or Equal | Search for values less than or equal to the search string. |
| < LT .LT. | Less Than | Search for values less than the search string. |
| > GT .GT. | Greater Than | Search for values greater than the search string. |
| >= GE .GE. | Greater Than or Equal | Search for values greater than or equal to the search string. |
| <> != NE .NE. | Not Equal | Search for values NOT equal to the search string. |
| ?? PHON .PHON. | Phonetic | Search for values phonetically similar to the search string. |
| % STEM .STEM. | Stem | Search for stemmed values equal to the stemmed search string. |
| @ REL .REL. | Relevant | Search for items relevant to the search string (uses the Berkeley TREC-3 Algorithm). |
| <=> WITHIN .WITHIN. | Within | Search for items within a range (e.g.: used for dates indicated by year-year) (BIB-1 extension) |
GEO Profile RELOPs
| Value | Name | Semantics |
| >=< .OVERLAPS. [GEO 2=7] | Overlaps | The access point region has a geometric area in common with the search term region. Given a search term region of S and access point region of T, the following algebra expresses the conditions required: {S(North) >= T(South)} and {S(South) <= T(North)} and {S(East) >= T(West)} and {S(West) <= T(East)}. |
| @>=< .OVERLAPS_RANK. [GEO 2=707] | Overlaps (ranked) | The access point region has a geometric area in common with the search term region. Given a search term region of S and access point region of T, the following algebra expresses the conditions required: {S(North) >= T(South)} and {S(South) <= T(North)} and {S(East) >= T(West)} and {S(West) <= T(East)}. Rank the resulting matches by amount of overlap and relative size of target and search areas. |
| >#< .FULLY_ENCLOSED_WITHIN. [GEO 2=8] | Fully Enclosed Within | The access point region is fully enclosed within the search term region. |
| @>#< .FULLY_ENCLOSED_WITHIN_RANK. [GEO 2=708] | Fully Enclosed Within (ranked) | The access point region is fully enclosed within the search term region. Rank the resulting matches by amount of overlap and relative size of target and search areas. |
| <#> .ENCLOSES. [GEO 2=9] | Encloses | The access point region fully encloses the search term region. |
| @<#> .ENCLOSES_RANK. [GEO 2=709] | Encloses (ranked) | The access point region fully encloses the search term region. Rank the resulting matches by amount of overlap and relative size of target and search areas. |
| <># .OUTSIDE_OF. [GEO 2=10] | Fully Outside Of | The access point region has no geometric area in common with the search term region. |
| +-+ .NEAR. [GEO 2=11] | Near | The access point region falls within a default distance of the search term region. The default distance is defined by the server. |
| .#. .MEMBERS_CONTAIN. [GEO 2=12] | Members Contain | The access point element or one of its subordinate elements is equal to the search term value (subject to possible qualification by the Truncation and Structure Attributes). (Not currently available on Cheshire servers) |
| !.#. .MEMBERS_NOT_CONTAIN. [GEO 2=13] | Members Not Contain | The access point element and all of its subordinate elements are not equal to the search term value (subject to possible qualification by the Truncation and Structure Attributes). (Not currently available on Cheshire servers) |
| :<: .BEFORE. [GEO 2=14] | Before | The access point date (or date range) is before the search term date (or date range). |
| :<=: .BEFORE_OR_DURING. [GEO 2=15] | Before or During | The access point date (or date range) is before or during the search term date (or date range). |
| :=: .DURING. [GEO 2=16] | During | The access point date (or date range) is during the search term date (or date range). (Same as WITHIN above on Cheshire) |
| :>=: .DURING_OR_AFTER. [GEO 2=17] | During or After | The access point date (or date range) is during or after the search term date (or date range). |
| :>: | After | The access point date (or date range) is after the search term date (or date range). |
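The bounding-box algebra given for the Overlaps operator can be written out as a short Python sketch. The Box class and function names are hypothetical, and the fully_enclosed_within test is an assumed reading of the prose definition (the table gives algebra only for Overlaps). Regions that wrap across the 180 degree meridian are not handled.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A rectangular region given by its bounding coordinates."""
    north: float
    south: float
    east: float
    west: float


def overlaps(s, t):
    """Overlaps test from the table: S is the search region, T the access point."""
    return (s.north >= t.south and s.south <= t.north
            and s.east >= t.west and s.west <= t.east)


def fully_enclosed_within(s, t):
    """Assumed reading of 'T is fully enclosed within S' (not from the table)."""
    return (t.north <= s.north and t.south >= s.south
            and t.east <= s.east and t.west >= s.west)


search = Box(north=10, south=0, east=10, west=0)
print(overlaps(search, Box(north=5, south=-5, east=5, west=-5)))
print(overlaps(search, Box(north=30, south=20, east=30, west=20)))
print(fully_enclosed_within(search, Box(north=8, south=2, east=8, west=2)))
```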
Note that most systems do not support ??, %, @, or 'within', and many support only equal. The Cheshire server (or webcheshire) supports @ to indicate that a probabilistic ranked search should be performed. The Cheshire server (or webcheshire) also supports % to indicate that a non-probabilistic ranking of the results should be made. Basically, on Cheshire servers, % ranks based on the number of terms in common between the query and the document, with a minimum (set to half of the terms, or 1 if there is only a single query term). It was originally set up for use with image retrieval. In many simple cases it behaves similarly to probabilistic ranking; in more complex cases it is closer to Boolean, but allows limited partial matching. Additional ranking operators for Cheshire are shown in the table below.
| Value | Name | Semantics |
| @@ .TREC2. [2=510] | Berkeley TREC-2 | (Cheshire Only) Use the Berkeley TREC-2 Algorithm for results ranking |
| @* .TREC2FBK. [2=510] | Berkeley TREC-2 with Blind Feedback | (Cheshire Only) Use the Berkeley TREC-2 Algorithm for results ranking with blind relevance feedback (NOTE: works only with VECTOR indexes -- see index_vectors) |
| @+ .OKAPI. [2=500] | Okapi BM-25 | (Cheshire Only) Use the Okapi BM-25 Algorithm for results ranking |
| @/ .TFIDF. [2=530] | TFIDF | (Cheshire Only) Use the Vector Space TFIDF Algorithm for results ranking (NOTE: works only with VECTOR indexes) |
| @& .LUCENE. [2=540] | TFIDF_LUCENE | (Cheshire Only) Use the Lucene version of the Vector Space TFIDF Algorithm for results ranking (NOTE: works only with VECTOR indexes) |
| @# .CORI. [2=501] | CORI | (Cheshire Only) Use the CORI Algorithm for results ranking -- primarily intended for distributed search |
BOOLOP
The Boolean operator to apply between results from different indexes. These are:
AND | .AND. | && : Boolean AND
OR | .OR. | || : Boolean OR
NOT | .NOT. | ANDNOT | .ANDNOT. | !! : Boolean NOT
Note that parentheses may be used to group Boolean sub-expressions, for example:
zfind title gone and (title wind or title fishing)
search_string
The term(s) to locate in the index. May include a truncation symbol (#). NOTE: indexes defined as exactkeys will default to implicit right-hand truncation for matching; to override this you will need to specify "[5=100]" following the index name (this is the Z39.50 DO NOT TRUNCATE attribute setting). If the index being searched supports proximity (defined in the database configuration file), then phrases to be searched within the same index can be indicated by surrounding the phrase with dollar signs, e.g.:
zfind title \$gone with the wind\$
PROXOP
The proximity operators to apply between results from within the same index. These are:
!PROX | !ADJ | !NEAR | !FAR : Proximity operators -- not ordered
!OPROX | !OADJ | !BEFORE | !ONEAR | !OFAR : Proximity operators -- ordered
The "O" versions of the operators require the search items to appear in the order specified, while the non-O versions do not. In the !PROX, !ADJ, and !NEAR versions the search items must be within the specified or default distance. In the !FAR version the search items must be farther apart than the specified or default distance. The default distance for !PROX, !OPROX, !ADJ, !OADJ and !BEFORE is 2; for !NEAR, !ONEAR, !FAR and !OFAR it is 20.
Each proximity operator may be modified by the type of element to be used for determining proximity. The element type must be appended to the operator (!OPER/ELEMENT). These are:
/C | /CHAR : Characters.
/W | /WORD : Words (the default).
/S | /SENT | /SENTENCE : Sentences.
/P | /PARA | /PARAGRAPH : Paragraphs.
/SECTION : Sections.
/CHAPTER : Chapters.
/DOCUMENT : Documents.
/ELEMENT : Elements.
/SUBELEMENT : Subelements.
/ELEMENTTYPE : Elementtypes.
/BYTE : Bytes.
Each proximity operator can also be modified by a distance to override the default distances indicated above. This takes the form of a slash followed by an integer number appended to the operator.
Examples of queries using these operators are:
ZFIND TI cat !PROX/3 TI hat
Find the title word "cat" within three words of the word "hat".
ZFIND ANYWHERE information !ADJ/WORD/3 ANYWHERE retrieval
Find the word information within three words of the word retrieval (assuming the anywhere index includes all of the document)
zfind anywhere information !NEAR/SUBELEMENT anywhere retrieval
Find the word retrieval within 20 subelements of the place where the word information is found.
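How the operator name, element type, and distance combine with the defaults can be shown with a small Python sketch. This is an illustration of the rules stated in this section, not the Cheshire parser: the function name parse_proxop and the returned dictionary layout are hypothetical, and treating operator text as case-insensitive is an assumption.

```python
# Default distances as stated in the text above.
DEFAULT_DISTANCE = {
    "PROX": 2, "OPROX": 2, "ADJ": 2, "OADJ": 2, "BEFORE": 2,
    "NEAR": 20, "ONEAR": 20, "FAR": 20, "OFAR": 20,
}

# Operators that require the terms to appear in the order given.
ORDERED = {"OPROX", "OADJ", "BEFORE", "ONEAR", "OFAR"}

# Element type modifiers and their abbreviations, mapped to one canonical name.
UNIT_ALIASES = {
    "C": "CHAR", "CHAR": "CHAR",
    "W": "WORD", "WORD": "WORD",
    "S": "SENTENCE", "SENT": "SENTENCE", "SENTENCE": "SENTENCE",
    "P": "PARAGRAPH", "PARA": "PARAGRAPH", "PARAGRAPH": "PARAGRAPH",
    "SECTION": "SECTION", "CHAPTER": "CHAPTER", "DOCUMENT": "DOCUMENT",
    "ELEMENT": "ELEMENT", "SUBELEMENT": "SUBELEMENT",
    "ELEMENTTYPE": "ELEMENTTYPE", "BYTE": "BYTE",
}


def parse_proxop(op):
    """Split an operator such as "!ADJ/WORD/3" into its parts (hypothetical helper)."""
    if not op.startswith("!"):
        raise ValueError("proximity operators start with '!'")
    # Assumption: operator text is matched without regard to case.
    name, *modifiers = op[1:].upper().split("/")
    if name not in DEFAULT_DISTANCE:
        raise ValueError(f"unknown proximity operator: {op}")
    unit = "WORD"                      # words are the default element type
    distance = DEFAULT_DISTANCE[name]  # default distance for this operator
    for mod in modifiers:
        if mod.isdigit():
            distance = int(mod)
        elif mod in UNIT_ALIASES:
            unit = UNIT_ALIASES[mod]
        else:
            raise ValueError(f"unknown modifier: {mod}")
    return {"operator": name, "ordered": name in ORDERED,
            "unit": unit, "distance": distance}


for example in ("!PROX/3", "!ADJ/WORD/3", "!NEAR/SUBELEMENT"):
    print(example, parse_proxop(example))
```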
As a shorthand for exact phrase matching using a proximity index on a Cheshire server, the phrase can be enclosed in dollar signs. For example:
zfind TI {$Gone with the Wind$}
Assuming TI is a proximity index on the Cheshire server, this query would find the query phrase with the words in the correct order (stopwords would be ignored in the matching). Note that this works for Cheshire servers ONLY and will NOT work on other servers (they will just receive the query with dollar signs in it).
Note that not all servers (including Cheshire) support all of the elements and many (or most) do not include proximity searching at all. If a server doesn't support proximity searching an error message will be returned as the result of the search.
FUZZYOP
A Fuzzy Boolean operator to apply between results from different indexes. These are:
!FUZZY_AND : Fuzzy AND
!FUZZY_OR : Fuzzy OR
!FUZZY_NOT : Fuzzy AND NOT
Fuzzy operators are versions of the Boolean operators that are less "strict" than the conventional Boolean operators, applied to weighted result lists. In place of Boolean AND, the "!FUZZY_AND" operator takes the smallest of the two weights in the result sets for the same record. The "!FUZZY_OR" takes the largest of the two weights for the same record. "!FUZZY_NOT" currently behaves the same way as strict Boolean "NOT". Otherwise these operators are used the same way as the strict Boolean operators.
RESTRICTOP
A restriction operation to apply between results. These are:
!RESTRICT_FROM : See below
!RESTRICT_TO : See below
The "!RESTRICT_TO" and "!RESTRICT_FROM" operators take either a component result and a document result, or two component results (where one component contains the other). In the case of component and document results the component list is restricted to components that are in the document result -- only the matching components are returned, retaining their weight from the original component result. When two nested component results are used with these operators the result is larger components that include one or more of the smaller components. Note that with component and document results !RESTRICT_TO and !RESTRICT_FROM may be used interchangeably and the type of operation to be performed is determined by the nature of the resultsets, but with two component results, Parent and Child, the following order should be followed:
Parent !RESTRICT_FROM Child
Child !RESTRICT_TO Parent
Naturally Parent and Child can be any sub-queries that result in the appropriate kind of component.
MERGEOP
Ranking score merger operations to apply between results from different indexes. These are based on "data fusion" methods, and operate on pairs of intermediate results returned from searches (or results of other mergers). The merge operators are:
!MERGE_SUM : SUM of Scores
!MERGE_MEAN : Mean Scores
!MERGE_NORM : Normalized Mean Scores
!MERGE_NSUM : SUM of Normalized Scores
!MERGE_NPRV : Normalize and Sum Scores with enhanced AND matches
!MERGE_CMBZ : Augmented Normalized Scores for high-ranked documents and AND matches
!MERGE_PIVOT : Pivoted Normalization
The !MERGE_SUM operator combines the two resultsets (like a Boolean OR) but adds the weights (actually the resulting raw ranking adds 1 + a probabilistic result and 1.5 for boolean results with matching document or component ids in both lists, and the original value for items found only in a single result.)
The !MERGE_MEAN operator combines the two resultsets (like a Boolean OR) but takes the mean of the weights from items in both lists and half of the weight of items in only a single list.
The !MERGE_NORM operator combines the two resultsets (like a Boolean OR) but takes the MEAN (or average) of the MIN_MAX normalized weights from items in both lists and half of the MIN_MAX normalized weight of items in only a single list. MIN_MAX normalization scales all of the weights in a resultset based on the maximum and minimum weights in the resultset. The resulting weights lie in the range from 0 to 1. This is particularly useful when one partial resultset uses a different ranking algorithm from the other (such as merging normal probabilistic and Okapi BM-25 results). This is the (currently) recommended operator for merging probabilistic resultsets.
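The !MERGE_NORM description can be sketched in Python as follows. This is a reading of the paragraph above, not the server code: result sets are modelled as plain dictionaries from record id to weight, the function names are hypothetical, and the handling of a result set whose weights are all equal (where MIN_MAX scaling is undefined) is an assumption.

```python
def min_max_normalize(weights):
    """Scale weights into the range 0..1 using the set's minimum and maximum."""
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # Assumption: when all weights are equal, give every item 1.0.
        return {doc: 1.0 for doc in weights}
    return {doc: (w - lo) / (hi - lo) for doc, w in weights.items()}


def merge_norm(left, right):
    """Mean of normalized weights for shared items, half weight otherwise."""
    nl, nr = min_max_normalize(left), min_max_normalize(right)
    merged = {}
    for doc in nl.keys() | nr.keys():
        if doc in nl and doc in nr:
            merged[doc] = (nl[doc] + nr[doc]) / 2
        else:
            # Item found in only one list: half of its normalized weight.
            merged[doc] = nl.get(doc, nr.get(doc)) / 2
    return merged


probabilistic = {"a": 10.0, "b": 5.0, "c": 0.0}
okapi = {"a": 0.2, "d": 0.8}
for doc, score in sorted(merge_norm(probabilistic, okapi).items()):
    print(doc, score)
```

Because both inputs are rescaled to 0..1 before averaging, the differing raw score ranges of the two rankings do not dominate the merged order.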
The !MERGE_NSUM operator normalizes the scores and takes the SUM of the normalized scores.
The !MERGE_CMBZ operator, like the previous one, normalizes the scores and takes the sum of normalized scores, and doubles the total for items in both of the input resultsets.
The !MERGE_NPRV operator normalizes the scores and takes the doubled SUM of scores for items in both resultsets; it then further differentiates the results from only a single list, retaining original scores for items that occur in the top 100 of a ranked list (or the top half if less than one), and halving all remaining scores.
The !MERGE_PIVOT operator returns the adjusted scores for items in the left-hand resultset based on the scores for corresponding items (based on internal document id) in the right-hand resultset. The original purpose was to adjust weights for document components based on weights for the entire document. It turns out that this method is also beneficial in other situations (such as weighting one index result versus another index result). The basic formula used is:
final_score = (pivot_val * right_hand_score) + ((1-pivot_val) * left_hand_score);
The default value for the pivot_val is 0.70. This can be changed by appending a slash and a number to the operator. For example,
index1 XXX !MERGE_PIVOT/90 index2 YYY
would set the pivot_val to 0.90 (the number following the slash should be less than 100, since it is divided by 100 to set the pivot coefficient).
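A minimal Python sketch of the pivot formula and the "/NN" suffix follows. The function names are hypothetical, result sets are modelled as dictionaries keyed by document id, and leaving left-hand items unchanged when they have no right-hand counterpart is an assumption, since the text does not say what happens to them.

```python
def pivot_value(operator):
    """Return the pivot coefficient implied by an operator such as "!MERGE_PIVOT/90"."""
    name, _, suffix = operator.partition("/")
    if name.upper() != "!MERGE_PIVOT":
        raise ValueError(f"not a pivot merge operator: {operator}")
    if not suffix:
        return 0.70  # default stated in the text
    return int(suffix) / 100


def merge_pivot(left, right, pivot_val=0.70):
    """Adjust left-hand scores using the matching right-hand scores."""
    merged = {}
    for doc, left_score in left.items():
        if doc in right:
            merged[doc] = pivot_val * right[doc] + (1 - pivot_val) * left_score
        else:
            # Assumption: items with no right-hand match keep their score.
            merged[doc] = left_score
    return merged


components = {"doc1": 0.2, "doc2": 0.9}
documents = {"doc1": 1.0}
print(merge_pivot(components, documents, pivot_value("!MERGE_PIVOT/90")))
```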
RELEVANCE FEEDBACK
When searching cheshire servers you can perform simple relevance feedback based on the first index entry in the cheshire configuration file that contains the index mapping for "relevance" (i.e. RELAT tag with value 102). If the database you are searching is set up this way then you can use a special form of the resultsetid to indicate that relevance feedback is to be performed using that index. (Yes... the setup is a bit convoluted, but it also permits relevance feedback commands to be transmitted over Z39.50).
For the client side things are fairly simple: just use the resultsetid with the item numbers of the seen documents (this is order based, so the first item in a resultset is 1, the second 2, etc.). Assume that a search like "zfind topic xxx resultsetid newresult" has been performed giving a resultset with 20 items. To indicate that you would like to do relevance feedback on items number 1, 2 and 15 in that result set you simply submit the query:
zfind newresult:1,2,15
and those items will be used as the basis of the relevance feedback
search. Ranges can be indicated using a hyphen, for example
to indicate items 3 through 5 and item 10 you would use:
zfind newresult:3-5,10
Any combination of individual items and ranges separated by commas
may be used. The relevance feedback method used is a fairly simple one
that takes the terms occurring in particular index elements (with the
RELAT 102 tag) in the associated configuration file for each of the
documents selected and
PATTERN MATCH RESTRICTIONS
Cheshire servers now also support a method for restricting resultsets based on regular expression matching on the entire record. In order to use this feature a resultset from a conventional search is needed, and the Z39.50 syntax is based on the syntax for relevance feedback. (The following discussion addresses the Z39.50 version first, and then the similar feature for the webcheshire client.)
If you have done a search and the resultset is named "default", you can
request a search for the word "stuff" in records 2 and 5 of that
resultset by doing a search like:
zfind default:2,5,find,stuff
Note that in this simple form, each word to be searched must be
separated by commas, and the word "find" must appear first after the
list of numbers representing the records (in resultset sequence) to be
searched. Matching is NOT case sensitive. Ranges of resultset numbers
can also be used. For example...
zfind default:2,5,10-30,find,stuff
would do the search for "stuff" in records 2, 5 and 10 through 30. For
more complex searches full regular expression matching is available,
but the entire resultset string must be enclosed in double quotes,
single quotes, or braces ({}), for example to search for cat or dog
anywhere in record 5 you could use:
zfind {default:5,find,(cat)|(dog)}
to find the exact string "cat and dog" you could use:
zfind {default:5,find,(cat and dog)}
or simply:
zfind {default:5,find,cat and dog}
To match any of a set of regular expressions, just separate each by
commas, for example:
zfind default:2,5,find,stuff,blotz,zap
would search for words "stuff", "blotz", or "zap" in resultset
records 2 and 5. This could also be expressed as:
zfind {default:2,5,find,(stuff)|(blotz)|(zap)}
Note that these searches are not exactly the same -- the first version
searches for the three words surrounded by non-alphabetic strings,
including blank, newlines, punctuation, etc. and the regular expression
searches for the strings regardless of any surrounding characters.
Simple words like "stuff" in the first example are treated internally
as the regular expression:
(^|[^a-zA-Z])(stuff)([^a-zA-Z]|$)
Also note that numbers are considered word separators by this as well.
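The difference between the simple-word form and a bare regular expression can be checked with a short Python sketch. It uses Python's re module as a stand-in, which is an assumption: the server's regular expression library may differ in details. The function name word_pattern is hypothetical.

```python
import re


def word_pattern(term):
    """Wrap a simple word the way the text says single terms are treated."""
    # re.escape is a no-op for plain alphabetic words such as "stuff".
    return re.compile(
        r"(^|[^a-zA-Z])(" + re.escape(term) + r")([^a-zA-Z]|$)",
        re.IGNORECASE,  # matching is described as NOT case sensitive
    )


simple = word_pattern("stuff")
bare = re.compile("stuff", re.IGNORECASE)

for text in ("Some STUFF here", "foodstuffs", "stuff2go"):
    # Show whether the word form and the bare regular expression match.
    print(repr(text), bool(simple.search(text)), bool(bare.search(text)))
```

The word form rejects "foodstuffs" because letters surround the term, while the bare expression accepts it; "stuff2go" matches both because a digit counts as a separator.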
Remember that using "eval" in client TCL processing may strip a layer
of quotes or braces from search strings before they actually reach the
search parser, so if you are getting syntax errors, you might need to
double the braces or quotes. All of the above searches return a
reduced resultset with only those records that are both included in the
list of records, and that match the regular expression(s). Note that
this form of resultset name can be used anyplace in a query that
a simple resultset name is used, so
zfind title gone with the wind AND {res1:1,find,(frankly my dear)}
would do a boolean "AND" between the resultsets returned by the title
search and the resultset returned with regular expression matching on
record 1 of results "res1". Note that "res1" must exist before the
query is submitted or else an error will occur. Also note that because
of the parsing method used, the regular expression may not include
embedded commas, but only commas to separate the list items.
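The structure of these extended resultset names (a name, a comma-separated list of item numbers and ranges, then optionally the word "find" followed by patterns) can be illustrated with a small Python sketch. It is not the Cheshire parser: the function name parse_resultset_spec is hypothetical, and outer quotes or braces are assumed to have been stripped already.

```python
def parse_resultset_spec(spec):
    """Split "name:items[,find,patterns]" into (name, item_numbers, patterns)."""
    name, _, rest = spec.partition(":")
    items, patterns = [], []
    seen_find = False
    # Commas only separate list entries, so patterns cannot contain commas.
    for part in rest.split(","):
        part = part.strip()
        if not part:
            continue
        if seen_find:
            patterns.append(part)          # everything after "find" is a pattern
        elif part == "find":
            seen_find = True
        elif "-" in part:
            first, last = part.split("-")  # a range such as 10-30, inclusive
            items.extend(range(int(first), int(last) + 1))
        else:
            items.append(int(part))
    return name, items, patterns


print(parse_resultset_spec("newresult:3-5,10"))
print(parse_resultset_spec("default:2,5,find,stuff,blotz,zap"))
print(parse_resultset_spec("default:5,find,(cat)|(dog)"))
```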
The webcheshire client has a similar feature, but only a single regular
expression pattern is matched for ALL of the items resulting
from a normal search. The regular expression is simply set in a
variable called "CHESHIRE_REGEX_FILTER" and that regular expression is
applied to all of the results of subsequent searches until the variable
is "unset" or set to a different value or the null string (""). The
following example shows the usual sequence...
set CHESHIRE_DATABASE bibfile
set CHESHIRE_CONFIGFILE "testconfig.new"
set CHESHIRE_RECSYNTAX SGML
set CHESHIRE_NUM_START 1
set CHESHIRE_NUMREQUESTED 5
set query "search topic geometry"
set CHESHIRE_REGEX_FILTER {(mathematical statistics)}
set results [eval $query]
The returned "results" are limited to those that match both the main
query and the regular expression in CHESHIRE_REGEX_FILTER. Note that
the single term matching mode in the Z39.50 version is NOT used, so if
you want to match a term like "stuff" surrounded by blanks, etc. you
should use a regular expression that does that, such as:
set CHESHIRE_REGEX_FILTER {(^|[^a-zA-Z])(stuff)([^a-zA-Z]|$)}
Note also that because these filtering operations take place on the raw
SGML/XML records, it is possible to include structural elements of the
records in the regular expressions for either the webcheshire or Z39.50
forms. For example, the following can match "geometry" in the "Fld650"
tags in the above tcl script...
set CHESHIRE_REGEX_FILTER {(<Fld650)(.*)(geometry)(.*)(</Fld650>)}
(Note also that if this is the sort of search you want to do on a
regular basis, it would be better and faster to just create an index
for that tag).
Zscan indexname[ATTR] search_string stepsize numreq position
ZSCAN indexname[ATTR] search_string stepsize numreq position
zscan indexname[ATTR] search_string stepsize numreq position
This command will retrieve a set of index terms for the specified index name from the current database and the current host/server, as established by the zselect command. For the webcheshire client, the command name "lscan" or "cheshire_scan" is used with the same arguments.
indexname
As with ZFIND this can be any name in the BIB1, GILS, GEO or EXP1 attribute sets. In addition there are many aliases assigned for various names. The list of supported index names is stored in the global Tcl array "ALL_INDEXES" and can be displayed using the "parray ALL_INDEXES" command. In the list of numbers shown for each entry, the first and second numbers are an internal id for the attribute and a code indicating which attribute sets the item is used for; the third number is the attribute set USE value; and the remaining numbers are the other attributes, in order: Relation, Position, Structure, Truncation, Completeness.
search_string
This term specifies the place in the index to scan. It should be enclosed in quotes or curly braces if there is more than a single word in the string.
stepsize
This is the number of terms to skip between each of the terms returned. This allows refining of scans from a wide scan down to a more finely grained one. To return each term in the index, use 0.
numreq
This is the number of terms to be returned from the index.
position
This is the position in the list of returned terms where the search_string will be located (if it exists in the index).
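How stepsize, numreq and position interact can be modelled roughly in Python over a sorted list of terms. This is a simplified reading of the parameter descriptions above, not the server's behaviour: the function name scan is hypothetical, position is assumed to be counted from 1, and a search_string missing from the index is assumed to anchor at the next term in sort order.

```python
from bisect import bisect_left


def scan(terms, search_string, stepsize, numreq, position):
    """Return up to numreq terms from a sorted list, skipping stepsize between each."""
    anchor = bisect_left(terms, search_string)  # where the search term sits
    stride = stepsize + 1                       # stepsize 0 returns every term
    start = anchor - (position - 1) * stride    # assumption: position is 1-based
    picked = []
    for k in range(numreq):
        i = start + k * stride
        if 0 <= i < len(terms):                 # drop slots that fall off the index
            picked.append(terms[i])
    return picked


index_terms = ["apple", "banana", "cherry", "date", "fig", "grape", "kiwi", "lemon"]
print(scan(index_terms, "date", 0, 3, 1))
print(scan(index_terms, "date", 0, 3, 2))
print(scan(index_terms, "date", 1, 3, 1))
```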
Zdisplay [resultsetid] [number_of_records] [start_records_num]
ZDISPLAY [resultsetid] [number_of_records] [start_records_num]
zdisplay [resultsetid] [number_of_records] [start_records_num]
Display number_of_records records resulting from a search having the specified resultsetid, or the last resultsetid used if not specified. If a start_records_num is supplied then display the records from the indicated ordinal position in the result, otherwise just continue from the last record displayed. The number_of_records value defaults to the NumberOfRecordsRequested zset value. When number_of_records is supplied it resets the NumberOfRecordsRequested value to that number for the current connection.
resultsetid (OPTIONAL)
This specifies the server result set name to use when displaying a record. The set name will be used in subsequent display requests until changed or until the set is deleted on the server.
start_records_num (OPTIONAL)
The ordinal number of the position in the result.
number_of_records (OPTIONAL)
The number of records that the client will try to retrieve from the result set.
Zshow parameter_name | ALL | SEARCH | PRESENT | SERVER | CLIENT
ZSHOW parameter_name | ALL | SEARCH | PRESENT | SERVER | CLIENT
zshow parameter_name | ALL | SEARCH | PRESENT | SERVER | CLIENT
This command returns a string containing information about the current session and connection.
ALL (OPTIONAL)
Show all of the values for all parameters.
SEARCH (OPTIONAL)
Show all of the values for all search-related parameters currently set.
PRESENT (OPTIONAL)
Show all of the values for all present(display)-related parameters.
SERVER (OPTIONAL)
Show all of the values for all server parameters (when connected).
CLIENT (OPTIONAL)
Show all of the values for all client-related parameters.
parameter name (OPTIONAL)
Show the value(s) of a particular parameter item. The following items can be shown:
sDBNames | database : Show current database name.
hits | resultcount | numhits : Show hits from last search.
records | numrecords | numrecords : Show number of records retrieved in latest search or present.
nextrec | nextResultSetPosition | nextrecordpos : Show ordinal position of next record to be retrieved by a present.
totalrecs | totalrecords | totalNumberRecordsReturned : Total number of records retrieved from the current search.
Hosts | Servers : Show the current connections for the session and whether or not each connection is active.
Host | Servername : Show the machine name or IP address of the currently active server.
Port : Show the currently active port number for the current server.
StatsFile | LogFile : Show the name of the current transaction log file.
QueryFormat | QueryType : Show the type code for the current query format.
PreferredMessageSize : Show the preferred message size.
iMaxRecSize | exceptionalRecordSize : Show the maximum record size permitted.
sSmallSetUpperBound | SmallSetUpperBound : Show the Z39.50 Search SmallSetUpperBound parameter.
sLargeSetLowerBound | LargeSetLowerBound : Show the Z39.50 Search LargeSetLowerBound parameter.
sMediumSetPresentNum | MediumSetPresentNumber : Show the Z39.50 Search MediumSetPresentNumber parameter.
ReplaceIndicator : Show whether or not resultsets can be replaced on reuse of a resultsetname.
ResultSetName | pResultSetid | sResultSetName : Show the current resultsetname.
sSmallSetElementSetNames : (this won't work right now)
sMediumSetElementSetNames : (this won't work right now)
PreferredRecordSyntax : Show the requested record syntax.
Query : Show the latest query.
AttributeSet : Show the attribute set used for the latest query.
pResultSetStartPoint | nextResultSetPosition : Show the Z39.50 Search NextResultSetPosition parameter.
pNumRecsReq | numberOfRecordsRequested : Show the Z39.50 Present NumberOfRecordsRequested.
pElementSetNames | ElementSetNames : Show the Z39.50 Present Elementsetnames.
Zset ParameterName {value}
ZSET ParameterName {value}
zset ParameterName {value}
Set parameters that affect the Z39.50 connection, search, and present/display. The parameters that may be set are described below:
session {connectionIDnumber} : Switch to another currently active connection. (Multiple zselects can be used to start multiple simultaneous Z39.50 connections, and this command can be used to switch between them.)
ResultSetName | sResultSetName | pResultSetid {resultsetname} : Set the resultset name to be used in search (zfind) and present (zdisplay) commands.
Database | Databasenames | Databasename {databasename} : Sets the name of the database to be searched in subsequent zfind commands.
ElementSet | ElementSetNames {name} : Set the elementsetname to be used for records in zdisplay commands (default is F for Full records). For Cheshire servers it is possible to set two special elementset name values, "XML_ELEMENT_..." and "STRING_SEGMENT_...". The first form includes an XPATH (subset) specification of the XML/SGML tags to extract from a record; this may include limited regular expressions (see the section below on XML_ELEMENT specifications). The second form includes a specification of a substring or a single SGML/XML tag, and only that segment of the record will be returned (see the section below on STRING_SEGMENT specifications).
PreferredRecordSyntax | pPreferredRecordSyntax | sPreferredRecordSyntax | RecordSyntax | pRecordSyntax | RecordFormat | RecFormat | sRecordSyntax | RecSyntax {syntaxname} : Set the (suggested) record syntax for the items to be returned from the server. Acceptable recsyntaxes are MARC, SUTRS, SGML, XML, OPAC, EXPLAIN, SUMMARY, GSR0, GENERIC, or ES.
AttributeSet | Attributes {name or OID} : Set the attribute set to be used in searching. Acceptable attribute set names (mnemonics) are BIB1, EXPLAIN, EXT (Extended Services), CCL (Common Command Language), GILS (Government Information Locator Service), or STAS (Scientific and Technical Attribute Set). Attribute set OIDs can be used in place of names.
Host {servername} : Can be used the same way as zselect to create a new connection to a host.
QueryFormat {format} : Can be used to set the query type to be processed. The format can be "RPN" or "1", "CCL" or "100", "ISO8777" or "2", "ERPN" or "101", "RANKED" or "102", "SQL" or "0". The default is RPN.
PreferredMessageSize | ipreferredMsgSize | ipreferredMessageSize | preferredMsgSize {size} : Set the preferred message size parameter for an INIT.
exceptionalRecordSize | imaxRecSize | maxRecSize | maxRecordSize {size} : Sets the maximum record size parameter for the INIT.
sSmallSetUpperBound | SmallSetUpperBound {number} : Sets the Z39.50 Search SmallSetUpperBound parameter
sLargeSetLowerBound | LargeSetLowerBound {number} : Sets the Z39.50 Search LargeSetLowerBound parameter
sMediumSetPresentNum | MediumSetPresentNumber {number} : Sets the Z39.50 Search MediumSetPresentNumber parameter
ReplaceIndicator | sReplaceIndicator {0 or 1} : Sets whether or not resultsets can be replaced on reuse of a resultsetname.
ResultSetStartPoint | StartPosition {number} : Sets position for the next item to be retrieved by a zdisplay (it is overridden by any parameters supplied to zdisplay)
NumrecsRequested | NumRequested | NumRecs | NumberOfRecordsRequested {number} : Sets the number of records to be retrieved by the next zdisplay (overridden by parameters to the zdisplay command)
pElementSetNames | ElementSetNames {elementsetname} : Sets elementset name to be used in the next present.
logging | log | logs {(on | 1) | (off | 0)} : Turns logging of transactions on or off.
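The QueryFormat names and their numeric equivalents listed above can be kept in a small lookup table when a script has to accept either form. This is only an illustrative sketch: the proc name query_type_code is hypothetical, and folding the name to upper case is an assumption of this example, not documented client behavior.

```tcl
# Hypothetical helper: map a documented QueryFormat name to its type code.
proc query_type_code {name} {
    array set codes {RPN 1 CCL 100 ISO8777 2 ERPN 101 RANKED 102 SQL 0}
    # Upper-casing the name is a convenience assumed by this sketch.
    set key [string toupper $name]
    if {![info exists codes($key)]} {
        error "unknown query format: $name"
    }
    return $codes($key)
}

puts [query_type_code ccl]
```

In a client session the returned code could then be passed on with "zset QueryFormat".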
Zclose
ZCLOSE
zclose
Close the currently active connection to a Z39.50 server.
Zql sql_select_statement
ZQL sql_select_statement
zql sql_select_statement
This command submits the SQL statement provided as the argument(s) to the current server as a Z39.50 Type 0 query. This assumes that the server accepts Type 0 queries and processes them against the currently connected database with server-side SQL parsing. The Cheshire server can handle such queries for an RDBMS type configuration file entry (by passing them through to the RDBMS itself). For the webcheshire client the "SQL" or "LSQL" verb followed by an SQL statement will function in the same way for searching local DBMS databases. Note also that in the local webcheshire "SQL" version ANY SQL statement may follow the command verb. This means that the underlying relational databases may be created and modified as well as queried using this command (assuming permissions on the DBMS itself permit these operations). The ZQL command proper, however, is only set up to permit SELECT operations and not database modification.
Zformat formatname record rectype [recnumber]
[max_line_length]
[DTD_filename]
ZFORMAT formatname record rectype [recnumber]
[max_line_length]
[DTD_filename]
zformat formatname record rectype [recnumber]
[max_line_length]
[DTD_filename]
This command is used to provide special formatting on the client side for MARC and some SGML records.
formatname:
The format names for MARC are:
"FULL" or "LONG" or "TAGGED" for full records with tagged fields,
"BRIEF" or "SHORT" for an abbreviated record with tagged fields,
"MARC" or "FULLMARC" for full MARC records tagged using MARC field numbers,
"REVIEW" or "EVAL" for very short records,
"LIST" or "TCLLIST" for records structured as a Tcl list,
"HTML" for full records tagged as HTML,
"SHORTHTML" for abbreviated records tagged as HTML,
"REVIEWHTML" for very short records tagged as HTML.
The format names for SGML are:
For items using the USMARC DTD:
"REVIEW" for very short records,
"SHORT" for an abbreviated record with tagged fields,
"LONG" for full records with tagged fields,
"MARC" for full MARC records tagged using MARC field numbers,
"HTMLREVIEW" for very short records tagged as HTML,
"HTMLSHORT" for abbreviated records tagged as HTML,
"HTMLLONG" for full records tagged as HTML,
"CSMP_HTMLREVIEW" for very short records tagged as HTML, with 860 URL fields set up as hyperlinks,
"CSMP_HTMLSHORT" for abbreviated records tagged as HTML, with 860 URL fields set up as hyperlinks,
"CSMP_HTMLLONG" for full records tagged as HTML, with 860 URL fields set up as hyperlinks,
"GLAS_HTMLREVIEW" for very short records tagged as HTML (with special tags),
"GLAS_HTMLSHORT" for abbreviated records tagged as HTML (with special tags),
"GLAS_HTMLLONG" for full records tagged as HTML.
For items using the standard "classcluster" DTD (lccclust) there is the "LCCSHORT" format.
For items using the TREC FT DTD there are three formats, "REVIEW", "SHORT", and "LONG".
record: The full record to be formatted
rectype: The type of record, expressed as a name or a type number. These are: "marc" or "usmarc" or 1, "sgmlmarc" or 2, "sgml" or 5, "opac" or 3, "text" or "sutrs" or 4, "generic" or 6, "explain" or 7.
recnumber: This is the sequence or id number to use for the formatted record.
max_line_length: This is the maximum number of characters to permit on a line.
DTD_filename: For SGML (rectype 5) this is the name of the file containing the DTD for this record type.
ZMakeFormat formatname [DTDNAME]
{{list_of_format_elements}}
ZMAKEFORMAT formatname [DTDNAME] {{list_of_format_elements}}
zmakeformat formatname [DTDNAME] {{list_of_format_elements}}
This command adds a new format type to the builtin set of formats (see ZFORMAT above). The DTDNAME parameter is required for SGML formats. The list of format elements is a list of tcl lists that has the following structure:
{{label[500]} {tags[200]} {subfields[200]} {beginpunct[200]} {subfsep[200]} {endpunct[200]} {newfield [TRUE|FALSE|-1]} {print_all [TRUE|FALSE]} {print_indicators [TRUE|FALSE]} {print_delimiters[TRUE|FALSE]} {repeatlabel[TRUE|FALSE]} {multisubstitute[TRUE|FALSE]} {indent[NUMBER]}}
The number in square brackets is the maximum size for the element, or the permitted values for the element. These elements are:
label: The label to place before the field in the formatted record.
tags: Which MARC or SGML tags this format element applies to (may use ? as a single character wildcard for MARC). If "#" is used instead of a tag, then the record number supplied as a parameter to zformat will be used as the item to be formatted.
subfields: Which subfields of the tags the element applies to. For MARC, empty means all subfields; otherwise only the listed subfields are formatted. For SGML, all subfields is indicated by "*", and subfield names must be separated by spaces.
beginpunct: String to place before the field in the formatted record.
subfsep: String to place between subfields in the formatted record.
endpunct: String to place after the field in the formatted record.
newfield: Is this a new field (should always be true except for end of format records where it should be -1). (Ignored in SGML)
print_all: Print the entire field? (minimal formatting).(Ignored in SGML)
print_indicators: Print the MARC indicators before the field? (Ignored in SGML)
print_delimiters: Print the MARC delimiters? (Usually FALSE). (Ignored in SGML)
repeatlabel: Repeat the label string for each line or matching field? (Ignored in SGML)
multisubstitute: Substitute the field for '%' in the beginpunct string? (Ignored in SGML)
indent: Number of spaces to indent wrapped fields. (Ignored in SGML)
Here is an example for a MARC format:
zmakeformat SHORT {{{Record #} {#} {} {} { } {
} TRUE FALSE FALSE FALSE FALSE FALSE 0} {{Author:} {1??} {} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Title:} {245} {} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Publisher:} {260} {ab} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Date:} {260} {c} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Periodical:} {773} {} {} {} {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Subjects:} {6[59]0} {} {} { -- } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{LC Call No.:} {050} {} {} { } {
} TRUE FALSE FALSE FALSE FALSE FALSE 15}}
Which would produce a formatted MARC record that looks like this:
Record #1
Author: Hatcher, William S.
Title: The logical foundations of mathematics / by William S. Hatcher.
Publisher: Oxford ; New York : Pergamon Press.
Date: 1968.
Subjects: Mathematics -- Philosophy.
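Because every format element is a 13-item Tcl list, a script that defines several formats may find it easier to build the elements with a helper than to type each one out. The following is a minimal sketch under that assumption: the proc name format_element, its defaults, and the format name MYSHORT are hypothetical, and the flag values simply copy those used in the MARC example above.

```tcl
# Hypothetical helper: build one 13-item format element in the documented order:
# label tags subfields beginpunct subfsep endpunct newfield print_all
# print_indicators print_delimiters repeatlabel multisubstitute indent
proc format_element {label tags {subfields ""} {subfsep " "} {endpunct ".\n"} {indent 15}} {
    return [list $label $tags $subfields "" $subfsep $endpunct \
        TRUE FALSE FALSE FALSE FALSE FALSE $indent]
}

set elements [list \
    [format_element "Author:" "1??"] \
    [format_element "Title:" "245"] \
    [format_element "Date:" "260" "c"]]

# In a Cheshire client this list would then be passed on as:
#   zmakeformat MYSHORT $elements
puts [llength [lindex $elements 0]]
```

Keeping the element order in one place makes it harder to misplace one of the six TRUE/FALSE flags.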
ZRemoveFormat formatname
ZREMOVEFORMAT formatname
zremoveformat formatname
This command removes a format created with the zmakeformat command described above.
ZShowFormat formatname
ZSHOWFORMAT formatname
zshowformat formatname
This command returns a format element list for builtin formats or formats created with the zmakeformat command described above.
Zdelete ALL | ResultSetName1 [ResultSetName2 ...]
ZDELETE ALL | ResultSetName1 [ResultSetName2 ...]
zdelete ALL | ResultSetName1 [ResultSetName2 ...]
Deletes the named result set(s) stored on the Z39.50 server. The keyword "ALL" deletes all stored result sets.
pTmpNam directoryname
Creates a unique random file name for temporary file usage.
ZSort
ZSORT
zsort
local_sort
LS
cheshire_sort
Sort result sets. The "Z" forms of the command create a Z39.50 Sort request and send it to the target; the other forms are used in webcheshire for local sorting of resultsets. The parameters and flags discussed below apply to both versions.
The additional arguments to the ZSort command are:
/para/sentence{@runon}
would select as a sort key the runon attributes of the sentence tags that are descendants of para elements. The following:
/para/sentence/@runon
would do exactly the same selection. The values of attributes may be used as a criterion for key selection as well. For example:
/para/sentence{@runon="not"}
would select all sentence elements where the attribute runon had the value "not" -- in this case the element contents instead of the attribute values are used for the sort keys.
zsort -attr title -attr author -missing_value ZZZZZZ
would sort the current resultset by title and author (with missing authors treated as "ZZZZZZ").
zsort -in old -out new -tag section/subject-heading -attr title
would sort contents of resultset "old" by the contents of the subject-heading tags within section elements, and the title attribute contents, with the result put into a new resultset "new".
zsort -attr date -desc -attr title -asc -case
would sort (the current resultset) from newest to oldest dates, and by title within each date with case sensitive sorting.
Note that sorting is highly dependent on the server and not all servers will support all features (or even support sort at all). The cheshire server supports all of the types of sorts listed above, with the exception of frequency ordering. In the cheshire server, if a record has multiple elements that match the criteria (e.g., many subject headings in many sections in the example above), then only the first occurrence in the document is used for the sort.
zhighlight
highlight
cheshirehighlight
Highlight Search Terms in data. The basic Tcl command syntax is:
highlight <-stem> "search word string" "data string"
"pre-string" "post-string"
The "data string" is searched for each occurrence of words (or stems) in the "search word string" (the search string is assumed to follow the syntax of a cheshire search command, but may also just be a string of words, in which case the first word is ignored). For each word (or stem) found, the "pre-string" is inserted before it, and the "post-string" is inserted after it (or after the stem if the "-stem" option is specified). Matching for either words or stems is not case-sensitive. Aliases for "highlight" are "zhighlight" and "cheshirehighlight". For example, suppose a search has been done for "zfind su mathematics" in a cheshire database. If the formatted record (rec) looks like:
Title: Essays in statistical science : papers in honor of
P.A.P. Moran / edited by J. Gani and E.J. Hannan.
Publisher: Sheffield, Eng. : Applied Probability Trust.
Date: 1982.
Pages: 434 p. : ill..
Notes: "Journal of applied probability special volume ;
v.19A"
Includes bibliographies and index
Publications of P.A.P. Moran: p. 1-6
Subjects: Mathematical statistics.
Statistics.
Stochastic processes.
Other Authors: Moran, P. A. P. (Patrick Alfred Pierce), 1917.
Hannan, E. J. (Edward James), 1921.
Gani, J. M. (Joseph Mark).
Then the Tcl statement:
set result [highlight -stem "zfind su mathematics" $rec "<START>" "<END>"]
would set result to:
Title: Essays in statistical science : papers in honor of
P.A.P. Moran / edited by J. Gani and E.J. Hannan.
Publisher: Sheffield, Eng. : Applied Probability Trust.
Date: 1982.
Pages: 434 p. : ill..
Notes: "Journal of applied probability special volume ;
v.19A"
Includes bibliographies and index
Publications of P.A.P. Moran: p. 1-6
Subjects: <START>Mathemat<END>ical statistics.
Statistics.
Stochastic processes.
Other Authors: Moran, P. A. P. (Patrick Alfred Pierce), 1917.
Hannan, E. J. (Edward James), 1921.
Gani, J. M. (Joseph Mark).
Note that this is a client-side operation: the highlighted items will NOT be restricted to the occurrences actually indexed by the index used in searching; any occurrence in the string will be highlighted. If the "-stem" option is not used, then the highlighted occurrences must match the entire word as specified in the search string. Note however that this will work with ANY search string and ANY data string, so it can be used anywhere. For example...
% set query "zfind su THIS IS A TEST"
zfind su THIS IS A TEST
% set data "This is the data that we are testing on."
This is the data that we are testing on.
% set result [highlight $query $data "<B>" "</B>"]
<B>This</B> <B>is</B> the data that we are testing on.
% set result [highlight -stem $query $data "<B>" "</B>"]
<B>Thi</B>s <B>is</B> the data that we are <B>test</B>ing on.
Remember that syntactical parts of a query (like zfind, index names, and Boolean operators) are ignored in highlight processing, so that a query like:
% set query "search title data and author test"
search title data and author test
would give the following when applied to the same data as above:
% set result [highlight $query $data "<B>" "</B>"]
This is the <B>data</B> that we are testing on.
% set result [highlight -stem $query $data "<B>" "</B>"]
This is the <B>data</B> that we are <B>test</B>ing on.
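The whole-word, case-insensitive matching described above can be approximated in plain Tcl, which may help when checking what to expect from highlight without the "-stem" option. This is a rough sketch and NOT the Cheshire implementation: the proc name approx_highlight is hypothetical, it takes a plain list of words instead of a query string, it does no stemming, and it relies on the \m and \M word-boundary escapes of Tcl 8.1 and later.

```tcl
# Rough approximation of whole-word, case-insensitive highlighting.
# Not the Cheshire implementation; takes a plain word list, no stemming.
proc approx_highlight {words data pre post} {
    set alternatives {}
    foreach w $words {
        # Backslash-escape anything that is not a letter, digit or underscore.
        regsub -all {[^[:alnum:]_]} $w {\\&} escaped
        lappend alternatives $escaped
    }
    if {[llength $alternatives] == 0} {
        return $data
    }
    set pattern "\\m(?:[join $alternatives |])\\M"
    # Protect "&" and "\" in the wrapper strings from regsub interpretation.
    set pre  [string map {\\ \\\\ & \\&} $pre]
    set post [string map {\\ \\\\ & \\&} $post]
    regsub -all -nocase -- $pattern $data "${pre}&${post}" result
    return $result
}

set data "This is the data that we are testing on."
puts [approx_highlight {THIS IS A TEST} $data "<B>" "</B>"]
```

All words are matched in a single pass so that text inserted by the pre and post strings is never itself re-scanned.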
pTranLog transaction_code
Outputs a transaction log record (see cheshire2/pTranLog.c for details). Intended for use with the Tcl/Tk X window client and the opac script.
The sResults command collects and outputs results from a questionnaire (see cheshire2/sResults.c for details). Intended for use with the Tcl/Tk X window client and the questionnaire script.
cheshire_search indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]
search indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]
LFIND indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]
lfind indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]
This command functions the same way as ZFIND above; however, it is used only in the webcheshire or staffcheshire client programs to search the local database instead of using Z39.50 to search remote databases. Some Tcl variables must be set to enable use of this command. The variables are:
CHESHIRE_CONFIGFILE: This should be set to the full pathname of the configuration file for the database.
CHESHIRE_DATABASE: This should be set to the name of the database (file) to be searched.
CHESHIRE_NUMREQUESTED: This should be set to the maximum number of records to be returned by the search.
CHESHIRE_NUM_START: This should be set to the number of the record in the search resultset that should be the first record returned.
CHESHIRE_ELEMENTSET: This should be set to the elementset name (or OID) that should be used in constructing the results of this search.
CHESHIRE_RECSYNTAX: This should be set to the record syntax name (or OID) that should be used in constructing the results of this search (e.g., GRS1, SUTRS, XML, SGML).
CHESHIRE_ATTRIBUTESET: This should be set to the attributeset to be used in parsing the query.
CHESHIRE_LOGFILE: This should be set to the pathname of a file to be used in logging any errors in query processing -- defaults to STDERR.
CHESHIRE_RETURN_PAGEDOCS: When searching a "PAGED_DIRECTORY_REF" type of index, this forces the search to return a constructed document record with the page references attached instead of page records. (See configuration file documentation for more information on PAGED_DIRECTORY_REF indexes).
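Since a local search depends on several global variables, a script can check them before calling the search command. The sketch below is illustrative only: the proc name unset_cheshire_vars, the configuration path, and the chosen values are hypothetical, and the list deliberately leaves out CHESHIRE_LOGFILE and CHESHIRE_RETURN_PAGEDOCS. It does not claim which variables a given installation strictly requires.

```tcl
# Hypothetical helper: report which documented CHESHIRE_* globals are unset.
proc unset_cheshire_vars {} {
    set missing {}
    foreach name {CHESHIRE_CONFIGFILE CHESHIRE_DATABASE CHESHIRE_NUMREQUESTED
                  CHESHIRE_NUM_START CHESHIRE_ELEMENTSET CHESHIRE_RECSYNTAX
                  CHESHIRE_ATTRIBUTESET} {
        if {![info exists ::$name]} {
            lappend missing $name
        }
    }
    return $missing
}

# Placeholder values; substitute the real configuration path and database name.
set ::CHESHIRE_CONFIGFILE /path/to/config.file
set ::CHESHIRE_DATABASE bibfile
set ::CHESHIRE_NUMREQUESTED 10
set ::CHESHIRE_NUM_START 1
puts "still unset: [unset_cheshire_vars]"
```

A script could refuse to run the search, or fill in defaults, when the returned list is not empty.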
Cheshire_Fetch resultset_name number_to_fetch
start_position
cheshire_fetch resultset_name number_to_fetch
start_position
FETCH resultset_name number_to_fetch start_position
fetch resultset_name number_to_fetch start_position
fetch_result resultset_name number_to_fetch start_position
fetch_results resultset_name number_to_fetch
start_position
The Cheshire_Fetch command is used to retrieve and format records from stored resultsets. It is only available in webcheshire, and is the analog of zdisplay for retrieving from local resultsets. Since resultsets are not stored by default in webcheshire, they must be explicitly created using the RESULTSETID option in search commands. The optional number_to_fetch and start_position (both required if used) should be numbers (again same as in zdisplay). If these two are not specified the currently set values for CHESHIRE_NUMREQUESTED and CHESHIRE_NUM_START will be used. If these are not set and the number_to_fetch and start_position values are not specified, then an error will result. All of the other global variables discussed above in Cheshire_Search have the same effects in Cheshire_Fetch.
TileBar_Search index_name {concept1 terms} {concept2 terms} [elib_id]
tilebar_search index_name {concept1 terms} {concept2 terms} [elib_id]
bfind index_name {concept1 terms} {concept2 terms} [elib_id]
tbsearch index_name {concept1 terms} {concept2 terms} [elib_id]
tb_search index_name {concept1 terms} {concept2 terms} [elib_id]
TileBar_Search implements a search for two sets of concepts/terms from a "PAGED_DIRECTORY_REF" type of index. These are then returned as a Tcl list in a format for presentation via the DLIB tilebars java client. This command requires the use of the same Tcl variables as the cheshire_search command above.
cheshire_close
cheshire_exit
close_cheshire
exit_cheshire
This command properly closes all open files named in the configuration files. To ensure that all updates to files and indexes are properly made this command must be executed before exiting from the client program. (needed for StaffCheshire and WebCheshire only)
LCCBuild LCC_data_file_name
LCCBUILD LCC_data_file_name
lccbuild LCC_data_file_name
This routine takes a file of information about the Library of Congress Classification Scheme and builds an internal table to provide a hierarchical "description" of any LCC class number. The data file should contain lines that provide a heading for each level in the LCC hierarchy. The lines should contain either a single class or a range of LCC class values, followed by a colon, followed by the description of the class or range of classes. Each line should be indented using tab characters, with each tab representing a level in the LCC hierarchy. The following shows an example drawn from the lc_outline.text data file included in the "doc" directory of the cheshire distribution.
A-ZZZ: Library of Congress Classification Topic:
\\t A-AZ: General works.
\\t\\t AC: Collections. Series. Collected Works.
\\t\\t AE: Encyclopaedias (General).
\\t\\t AG: Dictionaries and other general reference books.
\\t\\t AI: Indexes (General).
or
\\t E-FZ: History: America.
\\t\\t E:
\\t\\t\\t 11-29: (General)
\\t\\t\\t 31-46: North America.
\\t\\t\\t 51-99: Indians. Indians of North America.
\\t\\t\\t 101-135: Discovery of America and early explorations
\\t\\t\\t 151-9999: United States (General).
\\t\\t\\t\\t 184-185.98: Elements in the population.
\\t\\t\\t\\t\\t 184.5-185.98: Afro-Americans.
LCCGet "alphapart numericpart"
LCCGET "alphapart numericpart"
lccget "alphapart numericpart"
This routine uses an internal table of information about the Library of Congress Classification Scheme (loaded using the LCCBuild command) to provide a hierarchical "description" of any class number indicated by the alphabetic main class, alphapart, and numeric subclass, numericpart. Each element of the returned string is separated by an asterisk and represents a lower level of the hierarchy. These strings can be turned into Tcl lists using the Tcl "split" command on the asterisks. The first element of every string is the same, representing the root of the hierarchy. For example:
% lccget QA 76
*Library of Congress Classification Topic:*Science.*Mathematics*Computer Science. Electronic data processing.
% lccget z 699
*Library of Congress Classification Topic:*Bibliography*Libraries.*Library science. Information science.*The collections. The books.*Machine methods of information storage and retrieval. Mechanized bibliographic control.
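The "split" step mentioned above can be wrapped in a short proc. This is a minimal sketch using the first example string shown above as literal input; the proc name lcc_levels is hypothetical. Because the returned string begins with an asterisk, a plain split produces an empty first element, which the helper drops.

```tcl
# Hypothetical helper: turn an lccget result string into a list of levels.
proc lcc_levels {description} {
    set levels {}
    foreach part [split $description *] {
        # The string starts with "*", so split yields an empty first element.
        if {[string length $part] > 0} {
            lappend levels $part
        }
    }
    return $levels
}

set desc "*Library of Congress Classification Topic:*Science.*Mathematics*Computer Science. Electronic data processing."
foreach level [lcc_levels $desc] {
    puts $level
}
```

In a client session the literal string would be replaced by the value returned from lccget.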
LCCDestroy
LCCDESTROY
lccdestroy
This routine removes the internal table of information about the Library of Congress Classification Scheme loaded using the LCCBuild command.
XML_ELEMENT elementset specifications
XML or SGML elements may be extracted dynamically from the records in a database using the following format specification in the config file (assuming XML output is wanted):
<displaydef name="XML_ELEMENT_" OID="1.2.840.10003.5.109.10">
<convert function="XML_ELEMENT">
<clusmap>
<from>
<tagspec>
<ftag> SUBST_ELEMENT </ftag>
</tagspec>
</from>
<to>
<tagspec>
<ftag> SUBST_ELEMENT </ftag>
</tagspec>
</to>
</clusmap>
</convert>
</displaydef>
This format is used in querying by setting the elementsetname to "XML_ELEMENT_xxx", where "xxx" names the element wanted from the records as a simplified XPATH string. Only direct paths (/a/b/c/ etc.) are supported, not the XPATH keyword specifications for relative paths. If a single tag is wanted, then just the tag name is needed.
For example, assuming the above displaydef is defined for the example bibfile database (see index/testconfig.new), then sending the following commands to the client:
% zset recsyntax xml
% zset elementset "XML_ELEMENT_Fld245"
% zfind su mathematics
{OK {Status 1} {Hits 17} {Received 0} {Set Default} {RecordSyntax UNKNOWN}}
% zdisplay
Will result in...
{OK {Status 0} {Received 10} {Position 1} {Set Default} {NextPosition 11} {RecordSyntax XML 1.2.840.10003.5.109.10}} {<RESULT_DATA DOCID="1">
<Fld245 AddEnty="No" NFChars="0"><a>Singularitâes áa Cargáese</a></Fld245>
</RESULT_DATA>
} {<RESULT_DATA DOCID="2">
<Fld245 AddEnty="Yes" NFChars="0"><a>Modáeles locaux de champs et de formes /</a><c>Robert Roussarie</c></Fld245>
</RESULT_DATA>
} {<RESULT_DATA DOCID="5">
<Fld245 AddEnty="No" NFChars="0"><a>Metody modelirovaniëiìa i obrabotka informaëtìsii /</a><c>otv. redaktory K.A. Bagrinovskiæi, E.L. Berlëiìand</c></Fld245>
</RESULT_DATA>
Notice that the extra tag <RESULT_DATA> has been added to each record (the DOCID attribute is the internal document ID for the source record).
Thus, any XML/SGML element can be requested from the database records of a database with this display format defined.
For more complete paths, e.g.:
% zset elementset "XML_ELEMENT_/USMARC/VarFlds/Titles/Fld245"
note that the path need not be a complete path; as long as the subordinate path elements are descendants of the superordinate ones, the path can be matched. Also, if a set of elements is wanted from a record, these may be specified using the XPATH "|" notation, for example
% zset elementset "XML_ELEMENT_Fld245|Fld650|Fld651"
would retrieve all Fld245, Fld650 and Fld651 tags from the record (so it isn't -really- single element extraction at all).
Note ALSO that because the XPATH notation is converted into TAGSPECs internally, all of the wildcard and pattern matching available in configfile TAGSPECs is available in the XPATH specifications (this is NOT, however, guaranteed to be a real XPATH wildcards implementation).
For example, in place of the above example
% zset elementset "XML_ELEMENT_Fld245|^Fld65."
could be used to match Fld245 along with any tag starting with "Fld65" followed by any other character.
There is also now support for attribute (and attribute+value) specifications using XPATH. For example:
% zset elementset XML_ELEMENT_Fld245/@AddEnty
would retrieve just the AddEnty attribute values for the Fld245 tag, and
% zset elementset XML_ELEMENT_Fld245/@AddEnty=No
would return just Fld245's that had the attribute AddEnty with the value "No".
However, the combination of full/partial paths with regular expressions will usually fail to work (due to the way the paths are turned into TAGSPECs), so the following will NOT work correctly...
% zset elementset "XML_ELEMENT_TITLES/Fld245|SUBJECTS/^Fld65."
This would be interpreted as the FTAG path...
<FTAG>TITLES</FTAG><s>Fld245|SUBJECTS</s><s>^Fld65.</s>
Which would not match any tags in a correctly constructed USMARC record.
Please also note that this is just a display format for records retrieved by searches and is not an additional search -- If no element specified in the XPATH specification is found in a retrieved record an empty record will be returned. These empty records look like:
<RESULT_DATA DOCID="74"></RESULT_DATA>
which implies that the record with DOCID 74 matched the query, but did not have any fields matching the XPATH specification.
The records created using this method now include the full XPATH for each item extracted. The new results (for an XML_ELEMENT_Fld650 elementset specification from a USMARC DTD database) look like:
...
<RESULT_DATA DOCID="2">
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[1]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Vector algebra.</a></Fld650>
</ITEM>
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[2]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Differential forms.</a></Fld650>
</ITEM>
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[3]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Singularities (Mathematics)</a></Fld650>
</ITEM>
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[4]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Differential equations.</a></Fld650>
</ITEM>
</RESULT_DATA>
...
There is now an ITEM element for each matching document element (which is included as a subelement of the ITEM element). The XPATH attribute of the ITEM element is the XPATH for the document element, including the sequence number of sibling elements when they have the same parent path.
In addition, the XPATH specification for the element wanted can ALSO include occurrence numbers, which are used to restrict the fields returned. For example, for XML_ELEMENT_Fld650[3] the same record as above would only return the third occurrence of the subject:
<RESULT_DATA DOCID="2">
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[3]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Singularities (Mathematics)</a></Fld650>
</ITEM>
</RESULT_DATA>
This capability gives fairly powerful control over the display elements extracted.
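On the client side, the XPATH attributes of the returned ITEM elements can be collected with a regular expression. The sketch below assumes records shaped like the RESULT_DATA examples above, with the attribute written exactly as XPATH="..."; the proc name item_xpaths is hypothetical, the sample is an abridged copy of the record shown above, and the "regexp -all -inline" form requires Tcl 8.3 or later.

```tcl
# Hypothetical helper: collect the XPATH attribute of every ITEM element
# in one RESULT_DATA record (simple pattern match, not an XML parser).
proc item_xpaths {record} {
    set paths {}
    foreach {whole path} [regexp -all -inline {<ITEM XPATH="([^"]*)">} $record] {
        lappend paths $path
    }
    return $paths
}

# Abridged copy of the example record shown above.
set record {<RESULT_DATA DOCID="2">
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[1]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Vector algebra.</a></Fld650>
</ITEM>
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[2]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Differential forms.</a></Fld650>
</ITEM>
</RESULT_DATA>}

foreach path [item_xpaths $record] {
    puts $path
}
```

An empty result list corresponds to the empty RESULT_DATA records described earlier, where the record matched the query but had no matching fields.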
STRING_SEGMENT_ elementset specifications
In addition to the above, there is another "display format" that does not require a specification in the config files. It is "STRING_SEGMENT_...", which is treated as an elementsetname in the same way as the XML_ELEMENT_ elementset specifications. Its purpose is to extract strings from the underlying SGML/XML data of a Cheshire database without having to parse the records. It works only when the record syntax requested is XML, SGML or SUTRS. The basic forms are:
zset elementsetname STRING_SEGMENT_400
(for Z connections) or
set CHESHIRE_ELEMENTSET STRING_SEGMENT_400
(for webcheshire local retrieval)
zset elementsetname STRING_SEGMENT_200_400
(for Z connections) or
set CHESHIRE_ELEMENTSET STRING_SEGMENT_200_400
(for webcheshire local retrieval)
where 400 is the END position of the part of the record that you want to get and 200 is the START position. The form with a single number assumes that the start position is the beginning of the record (char position 0). Unlike the XML_ELEMENT_... definitions above, this does NOT require a DISPLAYDEF (i.e. it should work for any records).
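When the positions are computed by a script, it can help to build the elementset name in one place. The sketch below is plain Tcl; the proc name segmentSetName is hypothetical, and the checks (integer positions, end greater than start) are assumptions about sensible input, not rules documented for Cheshire.

```tcl
# Hypothetical helper (not part of Cheshire): build a STRING_SEGMENT_
# elementset name from an end position and an optional start position.
proc segmentSetName {end {start 0}} {
    if {![string is integer -strict $start] || ![string is integer -strict $end]} {
        error "positions must be integers"
    }
    # Assumption: a useful segment has 0 <= start < end.
    if {$start < 0 || $end <= $start} {
        error "need 0 <= start < end"
    }
    if {$start == 0} {
        # Single-number form: segment starts at char position 0.
        return "STRING_SEGMENT_$end"
    }
    return "STRING_SEGMENT_${start}_$end"
}

puts [segmentSetName 400]
puts [segmentSetName 400 200]

# Inside a Cheshire client the result would then be used as, e.g.:
#   zset elementsetname [segmentSetName 400 200]
#   set CHESHIRE_ELEMENTSET [segmentSetName 400 200]
```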
Alternatively, the following form can be used to extract the FIRST matching SGML/XML tag in a record...
zset elementsetname STRING_SEGMENT_Fld245
(for Z connections) or
set CHESHIRE_ELEMENTSET STRING_SEGMENT_Fld245
(for webcheshire local retrieval)
This extracts the first occurrence of the tag Fld245. Note that ONLY a single tag and NOT an XPATH can be used with this form, since it does not parse the record but does simple string matching for the first occurrence of the tag. If a record doesn't have the tag anywhere, the string "*** NO MATCHING TAGS IN MATCHING RECORD ***" is returned in place of the tag. Note also that it is entirely possible to return string values that are not valid XML or SGML; it is up to the user/scripter to take appropriate action when using this type of retrieval. The primary advantages are 1) no parsing is done, so results are returned quickly, and 2) arbitrary pieces of the records can be extracted. Of course, even though you can send such an elementset name to any Z server, only Cheshire servers will be able to process it.
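Because a string segment may be the "no matching tags" message or a fragment that is not well-formed, a script may want to classify what it received before using it. The following plain Tcl sketch shows one way to do that. The proc name segmentStatus and the three status words are hypothetical, and the "complete" test is only a crude prefix/suffix check, not a real XML or SGML validation.

```tcl
# The marker text documented above for records that lack the tag.
set NO_TAG_MARKER "*** NO MATCHING TAGS IN MATCHING RECORD ***"

# Hypothetical helper (not part of Cheshire): classify a returned segment
# as "missing", "complete" or "partial" for the requested tag.
proc segmentStatus {segment tag} {
    global NO_TAG_MARKER
    if {[string first $NO_TAG_MARKER $segment] >= 0} {
        return missing
    }
    # Crude check: text begins with "<tag" and ends with "</tag>".
    # This would also accept a longer tag name with the same prefix.
    set s [string trim $segment]
    if {[string match "<$tag*" $s] && [string match "*</$tag>" $s]} {
        return complete
    }
    return partial
}

puts [segmentStatus "<Fld245><a>Title</a></Fld245>" Fld245]
puts [segmentStatus $NO_TAG_MARKER Fld245]
puts [segmentStatus "<Fld245><a>Tit" Fld245]
```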
set CHESHIRE_SEARCH_STAT_DUMP 1
set CHESHIRE_SEARCH_STAT_DUMP 0
Setting this variable to 1 in webcheshire or staffcheshire causes statistics about ranking to be collected and output for each ranked search query; setting it to 0 turns this off. The output is appended to the results returned as a set of lines, one for each matching document in the collection. NOTE that the stats output includes entries for ALL matching records, not just the number requested by the search (CHESHIRE_NUMREQUESTED). The variables returned in each line (each matching document) are:
| Variable | Description |
| docid | Cheshire internal document ID number |
| compid | Cheshire internal component ID number |
| doclen | Document length (in bytes) |
| qlen | Query length |
| nmterms | number of matching terms between document and query |
| ndocs | number of documents |
| distndoc | number of documents (for distributed apps) |
| min_cf | minimum collection frequency (over all terms in query) |
| max_cf | maximum collection frequency (over all terms in query) |
| min_tf | minimum document term frequency (for this query) |
| max_tf | maximum document term frequency |
| sum_entr | total doc/comp matches in index |
| min_entr | minimum doc/comp matches in index |
| max_entr | maximum doc/comp matches in index |
| X1 | PROB: X1 - Okapi: Sum of RSJ values for terms - CORI: Sum of I |
| X2 | PROB: X2 - Okapi: Sum of document term frequency - CORI: Sum of T |
| X3 | PROB: X3 - Okapi & CORI: average document length |
| X4 | PROB: X4 - Okapi: Constant k1 - CORI: 0 |
| X5 | PROB: X5 - Okapi: Constant k3 - CORI: 0 |
| X6 | PROB: X6 - Okapi: Constant b - CORI: 0 |
| logodds | PROB: logodds value - Okapi & CORI: 0.0 |
| docwt | RSV/Probability for this document |
| $compname | Component name (if component) |
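A script that post-processes the statistics may find it convenient to turn each line into a dict keyed by the variable names in the table. The sketch below is plain Tcl and rests on a loud assumption: that each stats line holds whitespace-separated values in the same order as the table, with the component name last when present. The exact line format is not described here, so check it against real output first. The proc name parseStatLine is hypothetical and the sample line uses invented values.

```tcl
# Field names in table order (the optional component name is handled below).
set STAT_FIELDS {docid compid doclen qlen nmterms ndocs distndoc min_cf max_cf
    min_tf max_tf sum_entr min_entr max_entr X1 X2 X3 X4 X5 X6 logodds docwt}

# Hypothetical helper (not part of Cheshire): map one stats line to a dict.
# ASSUMPTION: values are whitespace-separated and follow the table order.
proc parseStatLine {line} {
    global STAT_FIELDS
    set values [regexp -all -inline {\S+} $line]
    set nfields [llength $STAT_FIELDS]
    set stats [dict create]
    # Fields missing from a short line are stored as empty strings.
    foreach name $STAT_FIELDS value [lrange $values 0 [expr {$nfields - 1}]] {
        dict set stats $name $value
    }
    # Any extra trailing value is taken to be the component name.
    if {[llength $values] > $nfields} {
        dict set stats compname [lindex $values $nfields]
    }
    return $stats
}

# Invented sample values, for illustration only.
set sample "74 0 1532 3 2 5000 0 12 340 1 4 352 12 340 0.5 0.25 1.0 0 0 0 -2.5 0.076"
set stats [parseStatLine $sample]
puts "docid=[dict get $stats docid] docwt=[dict get $stats docwt]"
```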
None known -- but there may be undesirable features :-)
Ray R. Larson