Cheshire II Commands

COMMANDS

Zselect, Zfind, Zscan, Zdisplay, Zshow, Zset, Zclose, ZQL, Zformat, Zmakeformat, Zremoveformat, Zshowformat, Zdelete, ZSort, zhighlight, pTmpNam, pTranLog, sResults, Cheshire_Search, Cheshire_Fetch, TileBar_Search, Cheshire_Close, LCCBuild, LCCGet, LCCDestroy, CHESHIRE_SEARCH_STAT_DUMP

(See also the map widget commands for cheshire2 and staffcheshire)

- Special Tcl/Tk commands for manipulating files and searching in CheshireII

DESCRIPTION

This document describes the CheshireII-specific command language features added to Tcl/Tk for the Cheshire II client programs cheshire2, ztcl, webcheshire, and staffcheshire.

Cheshire2 is the primary client program of the CheshireII information retrieval system. The system incorporates a client-server architecture, an X Window interface and WWW support, Boolean and probabilistic retrieval methods, and a flexible scripting facility using the Tcl/Tk language. The ztcl client program is identical to the cheshire2 program, except that it does not include the Tk toolkit for X Window. The webcheshire client program includes all of the features of ztcl, with the addition of special commands for direct server-side search and retrieval from Cheshire databases without the need to establish a Z39.50 connection. Webcheshire is intended for use in creating cgi-bin scripts for WWW applications using the Cheshire system. The staffcheshire client (under development) includes all of the features of the webcheshire client, with the addition of commands to examine and modify system configuration files and data in the database, and it includes the Tk toolkit for building X Window based interfaces for maintaining the Cheshire system.

The Cheshire client programs are configured and run via Tcl/Tk scripts, and the server is driven by SGML-like configuration files that describe the database files and indexes for the system. The rest of this document describes the specific Tcl/Tk commands that facilitate the use of the CheshireII database or other Z39.50 compatible databases from within Tcl/Tk. They provide an interface from the Tcl scripting language to the C routines that make up the bulk of the system.


COMMAND Descriptions


Zselect servername [hostaddress databasename portnumber] [idauthentication]

ZSELECT servername [hostaddress databasename portnumber] [idauthentication]

zselect servername [hostaddress databasename portnumber] [idauthentication]

The zselect command is used to establish a connection to a particular Z39.50 server and specify the database you would like to search (when there is a choice available). There are a number of server/database combinations included in the Cheshire clients, and for these only the servername is needed. (All of these hosts are accessible to the client programs in the global Tcl array called "Z_HOSTS" and can be displayed using the "parray Z_HOSTS" command). To connect to any other server the first four parameters must be provided (after the initial connection, subsequent connections during the same client session may use the servername alone).

servername (OPTIONAL)

This specifies a name to use for the server/database combination. When used as the only parameter the servername indicates which server to make a connection to. (NOTE: The particular database to be searched on a given server can be set using the zset database command, without having to reconnect to the server under another servername)

hostaddress (OPTIONAL)

This specifies the internet name or address of the Z39.50 server.

databasename (OPTIONAL)

The name of the database to search. This must be a valid database name for the server otherwise the search commands will fail. A common database name used by Z39.50 servers is "cat" to mean the online catalog.

portnumber (OPTIONAL)

This specifies the port on which the Z39.50 client must connect. The "well-known port" for Z39.50 is 210, but different servers may choose to use different ports.

idauthentication (OPTIONAL)

This specifies the authentication string that the particular server requires for connection. This string is passed to the server in the "idAuthentication" field of the init PDU when attempting to connect.


Zfind indexname1[ATTR] [RELOP] search_string1 [[BOOLOP | PROXOP | FUZZYOP | RESTRICTOP | MERGEOP] indexname2 [ATTR] [RELOP] search_string2 [BOOLOP2 | PROXOP2 | FUZZYOP2 | RESTRICTOP2 | MERGEOP2]... [resultsetid id_string]

ZFIND indexname1[ATTR] [RELOP] search_string1 [[BOOLOP | PROXOP | FUZZYOP | RESTRICTOP | MERGEOP] indexname2 [ATTR] [RELOP] search_string2 [BOOLOP2 | PROXOP2 | FUZZYOP2 | RESTRICTOP2 | MERGEOP2]... [resultsetid id_string]

zfind indexname1[ATTR] [RELOP] search_string1 [[BOOLOP | PROXOP | FUZZYOP | RESTRICTOP | MERGEOP] indexname2 [ATTR] [RELOP] search_string2 [BOOLOP2 | PROXOP2 | FUZZYOP2 | RESTRICTOP2 | MERGEOP2]... [resultsetid id_string]

zfind id_string:itemid[,itemid,...,itemid-itemid][,find,regular_expression]

This command will search the current database specified on the current host/server established by the zselect command.

resultsetid id_string (OPTIONAL)

This specifies a server-side set name (id_string) which will be used for storing the results of the current search. This will not work unless the server supports named result sets.

indexname

This can be any name in the BIB1, GILS, GEO or EXP1 attribute sets. In addition there are many aliases assigned for various names. The list of supported index names is stored in the global Tcl array "ALL_INDEXES" and can be displayed using the "parray ALL_INDEXES" command (note that the attribute set USE value for each of these is the third number shown in the list of numbers included in the display, the following numbers are the other attributes: Relation, Position, Structure, Truncation, Completeness, in order. The first and second numbers are an internal id for the attribute, and a code indicating which attribute sets this item is used for). Here are some entries from ALL_INDEXES:


ALL_INDEXES(ABSTRACT) = ABSTRACT Abstract {116 17 62 0 3 6 0 1}
ALL_INDEXES(ANY) = ANY Any {148 17 1016 0 3 6 0 1}
ALL_INDEXES(ANYWHERE) = ANYWHERE Anywhere {176 17 1035 0 3 6 0 1}
ALL_INDEXES(AUTHOR) = AUTHOR Author_Personal_name {0 17 1 0 3 6 0 1}
ALL_INDEXES(AUTHOR-NAME_CONFERENCE_KEY) = AUTHOR-NAME_CONFERENCE_KEY Author-name_conference {130 17 1006 0 3 1 0 1}
ALL_INDEXES(AUTHOR-NAME_CORPORATION_KEY) = AUTHOR-NAME_CORPORATION_KEY Author-name_corporation {127 17 1005 0 3 1 0 1}
ALL_INDEXES(AUTHOR-NAME_PERSONAL_KEY) = AUTHOR-NAME_PERSONAL_KEY Author-name_personal {126 17 1004 0 3 1 0 1}
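The field order described above can be made concrete with a small helper that labels each number in an entry. The proc name index_attrs and its dict keys are hypothetical (they are not part of the Cheshire clients). The sample entry is copied from the listing above; inside a client, the number list is assumed to be the third element of an ALL_INDEXES value, as the parray display suggests.

```tcl
# Hypothetical helper: label the eight numbers of an ALL_INDEXES entry
# using the field order described above.
proc index_attrs {numbers} {
    set fields {internal_id set_code use relation position structure truncation completeness}
    if {[llength $numbers] != [llength $fields]} {
        error "expected [llength $fields] numbers, got [llength $numbers]"
    }
    set result [dict create]
    foreach field $fields value $numbers {
        dict set result $field $value
    }
    return $result
}

# Local copy of the ABSTRACT value shown above; in a Cheshire client this is
# assumed to be the same string as $ALL_INDEXES(ABSTRACT).
set entry {ABSTRACT Abstract {116 17 62 0 3 6 0 1}}
set abstract [index_attrs [lindex $entry 2]]
puts "USE=[dict get $abstract use] structure=[dict get $abstract structure]"
# prints: USE=62 structure=6
```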

[ATTR]

A set of attributes and values to be included with the query

This is a list of Z39.50 attribute numbers and their values, contained in square brackets and (optionally) separated by commas. These are the numerical values for the attributes from the Z39.50 standard (or attribute set definition), and they override the existing attributes associated with index names, except for the USE attribute: although USE attributes may be specified this way, both the USE attribute from the specified index name and the one specified in square brackets will be sent to the server. The attributes are expressed in the form "n=m", where "n" is the attribute type number and "m" is the value. For example, "[1=55 2=1 3=1 4=3]" would indicate a USE(1) attribute of 55 (CODE-Geographic area in BIB-1), a RELATION(2) attribute of 1 (less than), a POSITION(3) attribute of 1 (first in field), and a STRUCTURE(4) attribute of 3 (key). NOTE: In Tcl, square brackets indicate a command to be executed, so to use them in the Tcl clients you will need to put a backslash before the opening and closing square brackets.

To combine multiple attribute sets in the same query, you may specify the attribute set OID or symbolic name for the following attribute sets: "BIB-1", "EXP-1", "EXT-1", "CCL-1", "GILS", "STAS", "COLLECTIONS-1", "CIMI-1", "GEO-1", "ZBIG", "UTIL", "XD-1", "ZTHES", "FIN-1", "DAN-1" and "HOLDINGS". Either these symbolic names (and many variants) or the OIDs may be specified (OIDs of unlisted attribute sets can also be specified). The attribute set name or OID is placed before the attribute type and followed by a space. It applies only to the immediately following attribute specification (by default, the current ATTRIBUTESETNAME set by the ZSET command is used to interpret the attributes). For example...

zfind title \[GILS 1=1174, 1.2.840.10003.3.9 2=3\] stuff

would search for 'stuff' using GILS USE attribute 1173 (Supplemental_Information) with GEO (1.2.840.10003.3.9) relational attribute 3.

Note also that this form of specifying attributes can be used in place of an explicit index name, for example...

zfind \[GEO 1=55, BIB1 2=3\] stuff

or

zfind \[1=5\] stuff

will send these attributes to the server without any additional USE attributes.
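When queries are assembled inside a script, the brackets have to reach zfind as literal characters. The following is a minimal sketch that uses a hypothetical attr_spec proc (not a Cheshire command) to build a specification string like the ones shown above. Each group is an optional attribute set name or OID followed by a list of type=value pairs. Passing the result through eval performs another round of Tcl substitution, so the brackets would need to be protected again at that level.

```tcl
# Hypothetical helper: build an attribute specification such as
# "[GILS 1=1174, 1.2.840.10003.3.9 2=3]". Each group is {attrset pairs};
# an empty attrset means the current ATTRIBUTESETNAME applies.
proc attr_spec {groups} {
    set parts {}
    foreach group $groups {
        lassign $group attrset pairs
        set text [join $pairs " "]
        if {$attrset ne ""} {
            set text "$attrset $text"
        }
        lappend parts $text
    }
    # The backslashes keep Tcl from treating the brackets as a command.
    return "\[[join $parts {, }]\]"
}

set spec [attr_spec {{GILS {1=1174}} {1.2.840.10003.3.9 {2=3}}}]
puts $spec
# prints: [GILS 1=1174, 1.2.840.10003.3.9 2=3]

puts [attr_spec {{{} {1=55 2=1 3=1 4=3}}}]
# prints: [1=55 2=1 3=1 4=3]
```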

RELOP

The relational operation to be performed. These are:


(blank) | = : Equals. Search for values equal to the search string.

<= | LE | .LE. : Less Than or Equal. Search for values less than or equal to the search string.

< | LT | .LT. : Less Than. Search for values less than the search string.

> | GT | .GT. : Greater Than. Search for values greater than the search string.

>= | GE | .GE. : Greater Than or Equal. Search for values greater than or equal to the search string.

<> | != | NE | .NE. : Not Equal. Search for values NOT equal to the search string.

?? | PHON | .PHON. : Phonetic. Search for values phonetically similar to the search string.

% | STEM | .STEM. : Stem. Search for stemmed values equal to the stemmed search string.

@ | REL | .REL. : Relevant. Search for items relevant to the search string (uses the Berkeley TREC-3 algorithm).

<=> | WITHIN | .WITHIN. : Within. Search for items within a range, such as dates given as year-year (BIB-1 extension).

GEO Profile RELOPs


The GEO profile includes additional operators for geographic and time period searching. These can be specified (assuming the attribute set is GEO) as follows. (The relational operators <=, <, =, >, >=, and <> behave the same as in BIB-1.)

>=< | .OVERLAPS. | [GEO 2=7] : Overlaps. The access point region has a geometric area in common with the search term region. Given a search term region of S and access point region of T, the following algebra expresses the conditions required: {S(North) >= T(South)} and {S(South) <= T(North)} and {S(East) >= T(West)} and {S(West) <= T(East)}.

@>=< | .OVERLAPS_RANK. | [GEO 2=707] : Overlaps (ranked). The access point region has a geometric area in common with the search term region, under the same conditions as Overlaps. The resulting matches are ranked by amount of overlap and relative size of target and search areas.

>#< | .FULLY_ENCLOSED_WITHIN. | [GEO 2=8] : Fully Enclosed Within. The access point region is fully enclosed within the search term region.

@>#< | .FULLY_ENCLOSED_WITHIN_RANK. | [GEO 2=708] : Fully Enclosed Within (ranked). The access point region is fully enclosed within the search term region. The resulting matches are ranked by amount of overlap and relative size of target and search areas.

<#> | .ENCLOSES. | [GEO 2=9] : Encloses. The access point region fully encloses the search term region.

@<#> | .ENCLOSES_RANK. | [GEO 2=709] : Encloses (ranked). The access point region fully encloses the search term region. The resulting matches are ranked by amount of overlap and relative size of target and search areas.

<># | .OUTSIDE_OF. | [GEO 2=10] : Fully Outside Of. The access point region has no geometric area in common with the search term region.

+-+ | .NEAR. | [GEO 2=11] : Near. The access point region falls within a default distance of the search term region. The default distance is defined by the server.

.#. | .MEMBERS_CONTAIN. | [GEO 2=12] : Members Contain. The access point element or one of its subordinate elements is equal to the search term value (subject to possible qualification by the Truncation and Structure Attributes). (Not currently available on Cheshire servers.)

!.#. | .MEMBERS_NOT_CONTAIN. | [GEO 2=13] : Members Not Contain. The access point element and all of its subordinate elements are not equal to the search term value (subject to possible qualification by the Truncation and Structure Attributes). (Not currently available on Cheshire servers.)

:<: | .BEFORE. | [GEO 2=14] : Before. The access point date (or date range) is before the search term date (or date range).

:<=: | .BEFORE_OR_DURING. | [GEO 2=15] : Before or During. The access point date (or date range) is before or during the search term date (or date range).

:=: | .DURING. | [GEO 2=16] : During. The access point date (or date range) is during the search term date (or date range). (Same as WITHIN above on Cheshire.)

:>=: | .DURING_OR_AFTER. | [GEO 2=17] : During or After. The access point date (or date range) is during or after the search term date (or date range).

:>: | .AFTER. | [GEO 2=18] : After. The access point date (or date range) is after the search term date (or date range).
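The OVERLAPS algebra above, and the enclosure relations that are defined only in words, can be checked locally for simple bounding boxes. This is a sketch, not the server's implementation: it assumes each region is a dict with north, south, east and west coordinates and that no region crosses the 180th meridian. The enclosure tests are a direct reading of the prose definitions.

```tcl
# Sketch of the GEO region relations for simple bounding boxes. Each region
# is a dict with north, south, east and west keys; regions are assumed not to
# cross the 180th meridian. S is the search term region, T the access point
# region, as in the OVERLAPS algebra above.
proc geo_overlaps {S T} {
    expr {[dict get $S north] >= [dict get $T south] &&
          [dict get $S south] <= [dict get $T north] &&
          [dict get $S east]  >= [dict get $T west] &&
          [dict get $S west]  <= [dict get $T east]}
}

# FULLY_ENCLOSED_WITHIN: T lies entirely inside S.
proc geo_fully_enclosed_within {S T} {
    expr {[dict get $T north] <= [dict get $S north] &&
          [dict get $T south] >= [dict get $S south] &&
          [dict get $T east]  <= [dict get $S east] &&
          [dict get $T west]  >= [dict get $S west]}
}

# ENCLOSES: T fully encloses S, i.e. the previous test with the roles swapped.
proc geo_encloses {S T} {
    geo_fully_enclosed_within $T $S
}

# OUTSIDE_OF: no geometric area in common.
proc geo_outside_of {S T} {
    expr {![geo_overlaps $S $T]}
}

set search {north 40 south 35 east -115 west -125}
set county {north 38 south 37 east -121 west -123}
puts [geo_overlaps $search $county]               ;# prints 1
puts [geo_fully_enclosed_within $search $county]  ;# prints 1
puts [geo_encloses $search $county]               ;# prints 0
```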

Note that most systems do not support ??, %, @, or WITHIN, and many support only equality. The Cheshire server (or webcheshire) supports @ to indicate that a probabilistic ranked search should be performed. The Cheshire server (or webcheshire) also supports % to indicate that a non-probabilistic ranking of the results should be made. On Cheshire servers, % ranks documents by the number of terms they have in common with the query, subject to a minimum (half of the query terms, or 1 if there is only a single query term). It was originally set up for use with image retrieval. In many simple cases it behaves much like probabilistic ranking; in more complex cases it is closer to Boolean retrieval, but allows limited partial matching. Six additional ranking operators for Cheshire are listed below.

@@ | .TREC2. | [2=510] : Berkeley TREC-2 (Cheshire only). Use the Berkeley TREC-2 algorithm for results ranking.

@* | .TREC2FBK. | [2=510] : Berkeley TREC-2 with Blind Feedback (Cheshire only). Use the Berkeley TREC-2 algorithm for results ranking with blind relevance feedback. (NOTE: works only with VECTOR indexes; see index_vectors.)

@+ | .OKAPI. | [2=500] : Okapi BM-25 (Cheshire only). Use the Okapi BM-25 algorithm for results ranking.

@/ | .TFIDF. | [2=530] : TFIDF (Cheshire only). Use the vector space TFIDF algorithm for results ranking. (NOTE: works only with VECTOR indexes.)

@& | .LUCENE. | [2=540] : TFIDF_LUCENE (Cheshire only). Use the Lucene version of the vector space TFIDF algorithm for results ranking. (NOTE: works only with VECTOR indexes.)

@# | .CORI. | [2=501] : CORI (Cheshire only). Use the CORI algorithm for results ranking; primarily intended for distributed search.
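The % ranking described above (count the query terms a document shares, with a minimum of half the query terms, or 1 for a single-term query) can be approximated as follows. This is a rough sketch rather than the Cheshire code: documents are modeled as a dict of document id to term list, and rounding half of an odd number of query terms down is an assumption.

```tcl
# Rough sketch of "%" coordination ranking: score each document by the number
# of distinct query terms it contains and drop documents below the minimum
# (half of the query terms, at least 1). Documents are modeled as a dict of
# document id -> list of index terms; rounding half an odd count down is an
# assumption.
proc coord_rank {queryTerms docs} {
    set query [lsort -unique $queryTerms]
    set minimum [expr {max(1, [llength $query] / 2)}]
    set ranked {}
    dict for {docid terms} $docs {
        set common 0
        foreach term $query {
            if {$term in $terms} {
                incr common
            }
        }
        if {$common >= $minimum} {
            lappend ranked [list $docid $common]
        }
    }
    # Most terms in common first.
    return [lsort -integer -decreasing -index 1 $ranked]
}

set docs {
    d1 {image retrieval color}
    d2 {color histogram}
    d3 {text retrieval}
}
puts [coord_rank {image color retrieval texture} $docs]
# prints: {d1 3}
```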

BOOLOP

The Boolean operator to apply between results from different indexes. These are:

AND | .AND. | && : Boolean AND

OR | .OR. | || : Boolean OR

NOT | .NOT. | ANDNOT | .ANDNOT. | !! : Boolean NOT

Note that parentheses may be used to group Boolean sub-expressions, for example:

zfind title gone and (title wind or title fishing)

search_string

The term(s) to locate in the index. May include a truncation symbol (#) or a DO NOT TRUNCATE symbol ( ! ). NOTE: indexes defined as exactkeys default to implicit right-hand truncation for matching. To override this, specify "[5=100]" (the Z39.50 DO NOT TRUNCATE attribute setting) following the index name or, more simply, put " ! " at the end of the search string (be sure to leave a space between the last word and the !). If the index being searched supports proximity (as defined in the database configuration file), then phrases to be searched within the same index can be indicated by surrounding the phrase with dollar signs, e.g.:

zfind title \$gone with the wind\$

PROXOP

The proximity operators to apply between results from within the same indexes. These are:

!PROX | !ADJ | !NEAR | !FAR : Proximity operators -- not ordered

!OPROX | !OADJ | !BEFORE | !ONEAR | !OFAR : Proximity operators -- ordered

The "O" versions of the operators require the search items to appear in the order specified, while the non-O versions do not. In the !PROX, !ADJ, and !NEAR versions the search items must be within the specified or default distance. In the !FAR version the search items must be farther apart than the specified or default distance. The default distance for !PROX, !OPROX, !ADJ, !OADJ and !BEFORE is 2; for !NEAR, !ONEAR, !FAR and !OFAR it is 20.

Each proximity operator may be modified by the type of element to be used for determining proximity. The element type is appended to the operator (!OPER/ELEMENT). These are:

/C | /CHAR : Characters.

/W | /WORD : Words (the default).

/S | /SENT | /SENTENCE : Sentences.

/P | /PARA | /PARAGRAPH : Paragraphs.

/SECTION : Sections.

/CHAPTER : Chapters.

/DOCUMENT : Documents.

/ELEMENT : Elements.

/SUBELEMENT : Subelements.

/ELEMENTTYPE : Elementtypes

/BYTE : Bytes.

Each proximity operator can also be modified by a distance that overrides the default distances indicated above. This takes the form of a slash followed by an integer, appended to the operator.

Examples of queries using these operators are:

ZFIND TI cat !PROX/3 TI hat
Find the title word "cat" within three words of the word "hat".

ZFIND ANYWHERE information !ADJ/WORD/3 ANYWHERE retrieval
Find the word information within three words of the word retrieval (assuming the anywhere index includes all of the document)

zfind anywhere information !NEAR/SUBELEMENT anywhere retrieval
Find the word retrieval within 20 subelements of the place where the word information is found.
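The operator syntax in these examples (a name, then optional /ELEMENT and /distance modifiers) can be unpacked as in the sketch below, which fills in the default unit and distances given above. It mirrors the documented syntax only and is not the Cheshire parser; the proc name parse_proxop and the returned dict keys are hypothetical.

```tcl
# Sketch: split a proximity operator such as "!ADJ/WORD/3" into its parts,
# filling in the default unit (WORD) and default distances described above.
# This mirrors the documented syntax only; it is not the Cheshire parser.
proc parse_proxop {op} {
    set parts [split [string toupper [string trimleft $op !]] /]
    set name [lindex $parts 0]
    switch -- $name {
        PROX - OPROX - ADJ - OADJ - BEFORE { set distance 2 }
        NEAR - ONEAR - FAR - OFAR          { set distance 20 }
        default { error "unknown proximity operator: $op" }
    }
    # The "O" forms (and !BEFORE) require the terms in the order given.
    set ordered [expr {$name in {OPROX OADJ BEFORE ONEAR OFAR}}]
    # Abbreviated unit names mapped to their long forms.
    set aliases {C CHAR W WORD S SENTENCE SENT SENTENCE P PARAGRAPH PARA PARAGRAPH}
    set unit WORD
    foreach modifier [lrange $parts 1 end] {
        if {[regexp {^[0-9]+$} $modifier]} {
            set distance [scan $modifier %d]
        } elseif {[dict exists $aliases $modifier]} {
            set unit [dict get $aliases $modifier]
        } else {
            set unit $modifier
        }
    }
    return [dict create name $name ordered $ordered unit $unit distance $distance]
}

puts [parse_proxop "!ADJ/WORD/3"]
# prints: name ADJ ordered 0 unit WORD distance 3
puts [parse_proxop "!NEAR/SUBELEMENT"]
# prints: name NEAR ordered 0 unit SUBELEMENT distance 20
```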

As a shorthand for exact phrase matching using a proximity index on a Cheshire server, the phrase can be enclosed in dollar signs. For example:

zfind TI {$Gone with the Wind$}

Assuming TI is a proximity index on the Cheshire server, this query would find the phrase with the words in the correct order (stopwords are ignored in the matching). Note that this works for Cheshire servers ONLY and will NOT work on other servers (they will simply receive the query with dollar signs in it).

Note that not all servers (including Cheshire) support all of the elements, and many (or most) do not support proximity searching at all. If a server doesn't support proximity searching, an error message will be returned as the result of the search.

FUZZYOP

A fuzzy Boolean operator to apply between results from different indexes. These are:

!FUZZY_AND : Fuzzy AND

!FUZZY_OR : Fuzzy OR

!FUZZY_NOT: Fuzzy AND NOT

Fuzzy operators are versions of the Boolean operators that are less "strict" than the conventional Boolean operators, applied to weighted result lists. In place of Boolean AND, the "!FUZZY_AND" operator takes the smallest of the two weights in the result sets for the same record. The "!FUZZY_OR" takes the largest of the two weights for the same record. "!FUZZY_NOT" currently behaves the same way as strict Boolean "NOT". Otherwise these operators are used the same way as the strict Boolean operators.
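The min/max rules for the fuzzy operators can be illustrated on weighted result lists modeled as dicts of record id to weight. This is a sketch, not the server code: records present in only one list are assumed to be dropped by FUZZY_AND and kept with their own weight by FUZZY_OR, mirroring the strict Boolean AND and OR.

```tcl
# Sketch of the fuzzy operators on weighted result lists, modeled as dicts
# of record id -> weight. Records in only one list are assumed to be dropped
# by FUZZY_AND and kept with their own weight by FUZZY_OR.
proc fuzzy_and {a b} {
    set out [dict create]
    dict for {id w} $a {
        if {[dict exists $b $id]} {
            # Smallest of the two weights for the same record.
            dict set out $id [expr {min($w, [dict get $b $id])}]
        }
    }
    return $out
}

proc fuzzy_or {a b} {
    set out $b
    dict for {id w} $a {
        if {[dict exists $b $id]} {
            # Largest of the two weights for the same record.
            dict set out $id [expr {max($w, [dict get $b $id])}]
        } else {
            dict set out $id $w
        }
    }
    return $out
}

set a {r1 0.8 r2 0.3}
set b {r2 0.6 r3 0.5}
puts [fuzzy_and $a $b]   ;# prints: r2 0.3
puts [fuzzy_or $a $b]    ;# prints: r2 0.6 r3 0.5 r1 0.8
```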

RESTRICTOP

A restriction operation to apply between results. These are:

!RESTRICT_FROM : See below

!RESTRICT_TO : See below

The "!RESTRICT_TO" and "!RESTRICT_FROM" operators take either a component result and a document result, or two component results (where one component contains the other). In the case of component and document results, the component list is restricted to components that are in the document result: only the matching components are returned, retaining their weights from the original component result. When two nested component results are used with these operators, the result is the larger components that include one or more of the smaller components. Note that with component and document results, !RESTRICT_TO and !RESTRICT_FROM may be used interchangeably, and the type of operation to be performed is determined by the nature of the resultsets; but with two component results, Parent and Child, the following order should be used...
Parent !RESTRICT_FROM Child
Child !RESTRICT_TO Parent

Naturally, Parent and Child can be any sub-queries that result in the appropriate kind of component.
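The component-and-document case can be sketched as a filter: keep each component whose parent document is in the document result, with its original weight. The data layout here (component id mapped to a parent document id and weight) is hypothetical and chosen only for illustration; the nested component-component case is not modeled.

```tcl
# Sketch of restricting a component result by a document result: keep only
# the components whose parent document appears in the document result, each
# retaining its original weight. The data layout (component id ->
# {parent document id, weight}; document id -> weight) is hypothetical.
proc restrict_components {components documents} {
    set out [dict create]
    dict for {compid info} $components {
        lassign $info docid weight
        if {[dict exists $documents $docid]} {
            dict set out $compid $weight
        }
    }
    return $out
}

set comps {c1 {d1 0.9} c2 {d2 0.7} c3 {d1 0.4}}
set docs {d1 1.0}
puts [restrict_components $comps $docs]
# prints: c1 0.9 c3 0.4
```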

MERGEOP

Ranking score merger operations to apply between results from different indexes. These are based on "data fusion" methods, and operate on pairs of intermediate results returned from searches (or results of other mergers). The merge operators are:

!MERGE_SUM : SUM of Scores

!MERGE_MEAN : Mean Scores

!MERGE_NORM : Normalized Mean Scores

!MERGE_NSUM:  SUM of Normalized Scores

!MERGE_NPRV: Normalize and Sum Scores with enhanced AND matches

!MERGE_CMBZ:  Augmented Normalized Scores for high-ranked documents and AND matches

!MERGE_PIVOT: Pivoted Normalization

The !MERGE_SUM operator combines the two resultsets (like a Boolean OR) but adds the weights (actually the resulting raw ranking adds 1 + a probabilistic result and 1.5 for boolean results with matching document or component ids in both lists, and the original value for items found only in a single result.)

The !MERGE_MEAN operator combines the two resultsets (like a Boolean OR) but takes the mean of the weights from items in both lists and half of the weight of items in only a single list.

The !MERGE_NORM operator combines the two resultsets (like a Boolean OR) but takes the MEAN (or average) of the MIN_MAX normalized weights from items in both lists and half of the MIN_MAX normalized weight of items in only a single list. MIN_MAX normalization scales all of the weights in a resultset based on the maximum and minimum weights in the resultset. The resulting weights lie in the range from 0 to 1. This is particularly useful when one partial resultset uses a different ranking algorithm from the other (such as merging normal probabilistic and Okapi BM-25 results). This is the (currently) recommended operator for merging probabilistic resultsets.
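The MIN_MAX normalization and the mean-based merges can be written out for result sets modeled as dicts of record id to weight. This is a sketch, not the server code. Giving an item found in only one list half of its weight is the same as averaging it with a weight of 0 from the other list, which is how merge_mean handles it below. Mapping every weight to 1.0 when all weights in a set are equal is an assumption, since that case is not described here.

```tcl
# Sketch of MERGE_MEAN and MERGE_NORM for result sets modeled as dicts of
# record id -> weight. An item in only one list contributes half its weight,
# i.e. it is averaged with 0 for the missing list.

proc minmax_normalize {weights} {
    set values [dict values $weights]
    set lo [tcl::mathfunc::min {*}$values]
    set hi [tcl::mathfunc::max {*}$values]
    set out [dict create]
    dict for {id w} $weights {
        if {$hi == $lo} {
            # Zero range: map everything to 1.0 (an assumption for this sketch).
            dict set out $id 1.0
        } else {
            # Scale into the range 0..1 using the set's min and max.
            dict set out $id [expr {($w - $lo) / double($hi - $lo)}]
        }
    }
    return $out
}

proc merge_mean {a b} {
    set out [dict create]
    foreach id [lsort -unique [concat [dict keys $a] [dict keys $b]]] {
        set wa [expr {[dict exists $a $id] ? [dict get $a $id] : 0.0}]
        set wb [expr {[dict exists $b $id] ? [dict get $b $id] : 0.0}]
        dict set out $id [expr {($wa + $wb) / 2.0}]
    }
    return $out
}

proc merge_norm {a b} {
    merge_mean [minmax_normalize $a] [minmax_normalize $b]
}

set a {r1 10 r2 5 r3 0}
set b {r1 2 r4 4}
puts [merge_norm $a $b]
# prints: r1 0.5 r2 0.25 r3 0.0 r4 0.5
```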

The !MERGE_NSUM operator normalizes the scores and takes the SUM of the normalized scores.

The !MERGE_CMBZ operator, like the previous one, normalizes the scores and takes the sum of the normalized scores, and then doubles the total for items found in both of the input resultsets.

The !MERGE_NPRV operator normalizes the scores and takes the doubled SUM of scores for items in both resultsets. It then further differentiates the results found in only a single list, retaining the original scores for items that occur in the top 100 of a ranked list (or the top half if less than one), and halving all remaining scores.

The !MERGE_PIVOT operator returns the adjusted scores for items in the left-hand resultset based on the scores for corresponding items (matched by internal document id) in the right-hand resultset. The original purpose was to adjust weights for document components based on weights for the entire document, but this method has also turned out to be beneficial in other situations (such as weighting one index result against another index result). The basic formula used is:
             final_score = (pivot_val * right_hand_score) + ((1-pivot_val) * left_hand_score);

The default value for pivot_val is 0.70. It can be changed by appending a number to the operator, for example...
             index1 XXX  !MERGE_PIVOT/90 index2 YYY
would set pivot_val to 0.90. (The number following the slash should be less than 100, since it is divided by 100 to set the pivot coefficient.)
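The pivot formula and the "/NN" suffix can be sketched for result sets modeled as dicts keyed by internal document id. This is an illustration under stated assumptions, not the server code. A left-hand item with no right-hand counterpart is given a right-hand score of 0 here, because that case is not described above.

```tcl
# Sketch of MERGE_PIVOT. pivot_value reads the optional "/NN" suffix
# (NN / 100, default 0.70). merge_pivot applies
#   final_score = pivot_val * right_hand_score + (1 - pivot_val) * left_hand_score
# to every item of the left-hand result (dicts keyed by document id). A
# left-hand item with no right-hand counterpart gets a right-hand score of 0;
# that is an assumption, not documented behavior.
proc pivot_value {op} {
    if {![regexp {^!MERGE_PIVOT(?:/([0-9]+))?$} [string toupper $op] -> pct]} {
        error "not a MERGE_PIVOT operator: $op"
    }
    if {$pct eq ""} {
        return 0.70
    }
    return [expr {[scan $pct %d] / 100.0}]
}

proc merge_pivot {left right pivot} {
    set out [dict create]
    dict for {id leftScore} $left {
        set rightScore [expr {[dict exists $right $id] ? [dict get $right $id] : 0.0}]
        dict set out $id [expr {$pivot * $rightScore + (1 - $pivot) * $leftScore}]
    }
    return $out
}

# Left-hand items d1 and d2; only d1 has a right-hand score.
set left  {d1 0.5 d2 1.0}
set right {d1 1.0}
puts [merge_pivot $left $right [pivot_value "!MERGE_PIVOT/90"]]
```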

RELEVANCE FEEDBACK

When searching Cheshire servers you can perform simple relevance feedback based on the first index entry in the Cheshire configuration file that contains the index mapping for "relevance" (i.e., a RELAT tag with value 102). If the database you are searching is set up this way, you can use a special form of the resultsetid to indicate that relevance feedback is to be performed using that index. (Yes... the setup is a bit convoluted, but it also permits relevance feedback commands to be transmitted over Z39.50.)

For the client side things are fairly simple: just use the resultsetid with the item numbers of the documents seen (this is order based, so the first item in a resultset is 1, the second 2, etc.). Assume that a search like "zfind topic xxx resultsetid newresult" has been performed, giving a resultset with 20 items. To indicate that you would like to do relevance feedback on items 1, 2 and 15 in that resultset, you simply submit the query:

zfind newresult:1,2,15

and those items will be used as the basis of the relevance feedback search. Ranges can be indicated using a hyphen; for example, to indicate items 3 through 5 and item 10 you would use:

zfind newresult:3-5,10

Any combination of individual items and ranges separated by commas may be used. The relevance feedback method used is a fairly simple one that takes the terms occurring in particular index elements (with the RELAT 102 tag) in the associated configuration file for each of the documents selected and
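The item-list syntax used in these resultset references (single item numbers and hyphenated ranges separated by commas) can be expanded as in the sketch below. The proc name parse_feedback_ref is hypothetical, and the sketch covers only the numeric list, not the ",find,..." pattern form described in the next section.

```tcl
# Sketch: expand a resultset reference such as "newresult:3-5,10" into the
# resultset name and the item numbers it selects. The proc name is
# hypothetical; the ",find,..." pattern form is not handled here.
proc parse_feedback_ref {ref} {
    set colon [string first : $ref]
    if {$colon < 1} {
        error "expected resultset:items, got $ref"
    }
    set name [string range $ref 0 [expr {$colon - 1}]]
    set items {}
    foreach part [split [string range $ref [expr {$colon + 1}] end] ,] {
        if {[regexp {^([0-9]+)-([0-9]+)$} $part -> from to]} {
            # A range such as 3-5 selects every item from 3 through 5.
            for {set i [scan $from %d]} {$i <= [scan $to %d]} {incr i} {
                lappend items $i
            }
        } elseif {[regexp {^[0-9]+$} $part]} {
            lappend items [scan $part %d]
        } else {
            error "bad item specification: $part"
        }
    }
    return [list $name $items]
}

puts [parse_feedback_ref "newresult:3-5,10"]
# prints: newresult {3 4 5 10}
```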

PATTERN MATCH RESTRICTIONS

Cheshire servers now also support a method for restricting resultsets based on regular expression matching on the entire record. To use this feature, a resultset from a conventional search is needed, and the Z39.50 syntax is based on the syntax for relevance feedback. (The following discussion addresses the Z39.50 version first, and then the similar feature for the webcheshire client.)

If you have done a search and the resultset is named "default", you can request a search for the word "stuff" in records 2 and 5 of that resultset by doing a search like:

zfind default:2,5,find,stuff

Note that in this simple form, each word to be searched must be separated by commas, and the word "find" must appear first after the list of numbers representing the records (in resultset sequence) to be searched. Matching is NOT case sensitive. Ranges of resultset numbers can also be used. For example...

zfind default:2,5,10-30,find,stuff

would do the search for "stuff" in records 2, 5 and 10 through 30. For more complex searches full regular expression matching is available, but the entire resultset string must be enclosed in double quotes, single quotes, or braces ({}), for example to search for cat or dog anywhere in record 5 you could use:

zfind {default:5,find,(cat)|(dog)}

to find the exact string "cat and dog" you could use:

zfind {default:5,find,(cat and dog)}
or simply:
zfind {default:5,find,cat and dog}

To match any of a set of regular expressions, just separate each by commas, for example:

zfind default:2,5,find,stuff,blotz,zap

would search for words "stuff", "blotz", or "zap" in resultset records 2 and 5. This could also be expressed as:

zfind {default:2,5,find,(stuff)|(blotz)|(zap)}

Note that these searches are not exactly the same: the first version searches for the three words surrounded by non-alphabetic characters (including blanks, newlines, punctuation, etc.), while the regular expression version searches for the strings regardless of any surrounding characters. Simple words like "stuff" in the first example are treated internally as the regular expression:

(^|[^a-zA-Z])(stuff)([^a-zA-Z]|$)

Note that digits are also treated as word separators by this pattern, since they are not in the ranges a-z or A-Z.
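The word wrapping above can be reproduced with Tcl's own regexp to see which records a simple-word search would keep. This is a sketch, not the server's matcher: treating a "simple word" as letters only, and passing anything else through unchanged as a regular expression, are assumptions. Matching ignores case, as described above, and a record passes if any of the patterns matches.

```tcl
# Sketch of the simple-word matching described above. A plain word is wrapped
# as (^|[^a-zA-Z])(word)([^a-zA-Z]|$); anything else is used as a regular
# expression unchanged. Matching ignores case, and a record passes if any
# pattern matches. Treating "plain word" as "letters only" is an assumption.
proc word_pattern {item} {
    if {[regexp {^[A-Za-z]+$} $item]} {
        return [format {(^|[^a-zA-Z])(%s)([^a-zA-Z]|$)} $item]
    }
    return $item
}

proc record_matches {record items} {
    foreach item $items {
        if {[regexp -nocase -- [word_pattern $item] $record]} {
            return 1
        }
    }
    return 0
}

puts [word_pattern stuff]
# prints: (^|[^a-zA-Z])(stuff)([^a-zA-Z]|$)
puts [record_matches "More Stuff, please" {stuff blotz zap}]   ;# prints 1
puts [record_matches "stuffing" {stuff}]                       ;# prints 0
```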

Remember that using "eval" in client Tcl processing may strip a layer of quotes or braces from search strings before they actually reach the search parser, so if you are getting syntax errors, you might need to double the braces or quotes. All of the above searches return a reduced resultset containing only those records that are both included in the list of records and match the regular expression(s). Note that this form of resultset name can be used anywhere in a query that a simple resultset name is used, so

zfind title gone with the wind AND {res1:1,find,(frankly my dear)}

would do a Boolean "AND" between the resultset returned by the title search and the resultset returned by regular expression matching on record 1 of the resultset "res1". Note that "res1" must exist before the query is submitted, or else an error will occur. Also note that, because of the parsing method used, the regular expression may not include embedded commas; commas may only be used to separate the list items.

The webcheshire client has a similar feature, but only a single regular expression pattern is matched, and it is applied to ALL of the items resulting from a normal search. The regular expression is simply set in a variable called "CHESHIRE_REGEX_FILTER", and it is applied to all of the results of subsequent searches until the variable is "unset" or set to a different value or the null string (""). The following example shows the usual sequence...

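# Select the database, configuration file and record syntax
set CHESHIRE_DATABASE bibfile
set CHESHIRE_CONFIGFILE "testconfig.new"
set CHESHIRE_RECSYNTAX SGML
# Start with the first record and request five records
set CHESHIRE_NUM_START 1
set CHESHIRE_NUMREQUESTED 5

set query "search topic geometry"
# Keep only results whose raw record matches this regular expression
set CHESHIRE_REGEX_FILTER {(mathematical statistics)}

set results [eval $query]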

The returned "results" are limited to those that match both the main query and the regular expression in CHESHIRE_REGEX_FILTER. Note that the single term matching mode in the Z39.50 version is NOT used, so if you want to match a term like "stuff" surrounded by blanks, etc. you should use a regular expression that does that, such as:

set CHESHIRE_REGEX_FILTER {(^|[^a-zA-Z])(stuff)([^a-zA-Z]|$)}

Note also that because these filtering operations take place on the raw SGML/XML records, it is possible to include structural elements of the records in the regular expressions for either the webcheshire or Z39.50 forms. For example, the following filter can match "geometry" in the "Fld650" tags for the Tcl script above...

set CHESHIRE_REGEX_FILTER {(<Fld650)(.*)(geometry)(.*)(</Fld650>)}

(Note also that if this is the sort of search you want to do on a regular basis, it would be better and faster to create an index for that tag.)


Zscan indexname[ATTR] search_string stepsize numreq position

ZSCAN indexname[ATTR] search_string stepsize numreq position

zscan indexname[ATTR] search_string stepsize numreq position

This command will retrieve a set of index terms for the specified index name from the current database and the current host/server, as established by the zselect command. For the webcheshire client, the command name "lscan" or "cheshire_scan" is used with the same arguments.

indexname

As with ZFIND this can be any name in the BIB1, GILS, GEO or EXP1 attribute sets. In addition there are many aliases assigned for various names. The list of supported index names is stored in the global Tcl array "ALL_INDEXES" and can be displayed using the "parray ALL_INDEXES" command (note that the attribute set USE value for each of these is the third number shown in the list of numbers included in the display, the following numbers are the other attributes: Relation, Position, Structure, Truncation, Completeness, in order. The first and second numbers are an internal id for the attribute, and a code indicating which attribute sets this item is used for).

search_string

This term specifies the place in the index to scan. It should be enclosed in quotes or curly braces if there is more than a single word in the string.

stepsize

This is the number of terms to skip between each of the terms returned. This allows refining of scans from a wide scan down to a more finely grained one. To return each term in the index, use 0.

numreq

This is the number of terms to be returned from the index.

position

This is the position in the list of returned terms where the search_string will be located (if it exists in the index).
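How stepsize, numreq and position interact can be modeled locally on a sorted term list. This is a sketch, not the server algorithm. It follows the descriptions above: skip stepsize terms between returned terms (0 returns every term), return numreq terms, and place the first term at or after search_string at the given 1-based position. Behavior at the edges of the index, or when search_string is absent, may differ on a real server.

```tcl
# Local model (not the server algorithm) of how the zscan arguments interact:
# sort the terms, find the first term at or after search_string, step through
# the list skipping `stepsize` terms each time, and place that term at
# `position` (1-based) in the returned window.
proc scan_window {terms search_string stepsize numreq position} {
    set sorted [lsort -unique $terms]
    set anchor [llength $sorted]
    for {set i 0} {$i < [llength $sorted]} {incr i} {
        if {[string compare [lindex $sorted $i] $search_string] >= 0} {
            set anchor $i
            break
        }
    }
    set step [expr {$stepsize + 1}]
    set first [expr {$anchor - ($position - 1) * $step}]
    set window {}
    for {set k 0} {$k < $numreq} {incr k} {
        set idx [expr {$first + $k * $step}]
        # Indices before the start or past the end of the index are skipped.
        if {$idx >= 0 && $idx < [llength $sorted]} {
            lappend window [lindex $sorted $idx]
        }
    }
    return $window
}

set terms {apple banana cherry date elder fig grape}
puts [scan_window $terms cherry 0 3 2]   ;# prints: banana cherry date
puts [scan_window $terms cherry 1 3 1]   ;# prints: cherry elder grape
```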


Zdisplay [resultsetid] [number_of_records] [start_records_num]
ZDISPLAY [resultsetid] [number_of_records] [start_records_num]
zdisplay [resultsetid] [number_of_records] [start_records_num]

Display number_of_records records resulting from a search having the specified resultsetid, or the last resultsetid used if none is specified. If a start_records_num is supplied, display the records starting from the indicated ordinal position in the result; otherwise, continue from the last record displayed. The number_of_records value defaults to the NumberOfRecordsRequested zset value. When number_of_records is supplied, it resets the NumberOfRecordsRequested value to that number for the current connection.

resultsetid (OPTIONAL)

This specifies the server result set name to use when displaying records. The set name will be used in subsequent display requests until it is changed or until the set is deleted on the server.

start_records_num (OPTIONAL)

The ordinal position in the result set of the first record to display.

number_of_records (OPTIONAL)

The number of records that the client will try to retrieve from the result set.
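The status information returned by zfind and zdisplay can be picked apart with ordinary list commands. This sketch assumes the return value has the shape shown in the XML_ELEMENT example later in this document: a list whose first element is a status list such as {OK {Status 1} {Hits 17} ...}, followed (for zdisplay) by one element per record. The proc status_field is hypothetical, and "Default" is simply the result set name that example reports.

```tcl
# Return the value(s) stored under KEY in the status list that is the
# first element of a zfind/zdisplay result, or "" if KEY is absent.
proc status_field {result key} {
    set status [lindex $result 0]
    foreach item [lrange $status 1 end] {
        if {[lindex $item 0] eq $key} {
            return [lrange $item 1 end]
        }
    }
    return ""
}

# Inside a connected client: page through a search ten records at a time.
# zdisplay continues from the last record displayed when no start is given.
if {[info commands zdisplay] ne ""} {
    set hits [status_field [zfind su mathematics] Hits]
    set next 1
    while {$hits ne "" && $next > 0 && $next <= $hits} {
        set page [zdisplay Default 10]
        foreach rec [lrange $page 1 end] { puts $rec }
        set next [status_field $page NextPosition]
        if {$next eq ""} break
    }
}
```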


Zshow parameter_name | ALL | SEARCH | PRESENT | SERVER | CLIENT
ZSHOW parameter_name | ALL | SEARCH | PRESENT | SERVER | CLIENT
zshow parameter_name | ALL | SEARCH | PRESENT | SERVER | CLIENT

This command returns a string containing information about the current session and connection.

ALL (OPTIONAL)

Show all of the values for all parameters.

SEARCH (OPTIONAL)

Show all of the values for all search-related parameters currently set.

PRESENT (OPTIONAL)

Show all of the values for all present (display)-related parameters.

SERVER (OPTIONAL)

Show all of the values for all server parameters (when connected).

CLIENT (OPTIONAL)

Show all of the values for all client-related parameters.

parameter_name (OPTIONAL)

Show the value(s) of a particular parameter item. The following items can be shown:

sDBNames | database : Show current database name.

hits | resultcount | numhits : Show hits from last search.

records | numrecords : Show the number of records retrieved in the latest search or present.

nextrec | nextResultSetPosition | nextrecordpos : Show the ordinal position of the next record to be retrieved by a present.

totalrecs | totalrecords | totalNumberRecordsReturned : Show the total number of records retrieved from the current search.

Hosts | Servers : Show the current connections for the session and whether or not each connection is active.

Host | Servername : Show the machine name or IP address of the currently active server.

Port : Show the currently active port number for the current server.

StatsFile | LogFile : Show the name of the current transaction log file.

QueryFormat | QueryType : Show the type code for the current query format.

PreferredMessageSize : Show the preferred message size.

iMaxRecSize | exceptionalRecordSize : Show the maximum record size permitted.

sSmallSetUpperBound | SmallSetUpperBound : Show the Z39.50 Search SmallSetUpperBound parameter.

sLargeSetLowerBound | LargeSetLowerBound : Show the Z39.50 Search LargeSetLowerBound parameter.

sMediumSetPresentNum | MediumSetPresentNumber : Show the Z39.50 Search MediumSetPresentNumber parameter.

ReplaceIndicator : Show whether or not resultsets can be replaced on reuse of a resultsetname.

ResultSetName | pResultSetid | sResultSetName : Show the current resultsetname.

sSmallSetElementSetNames : (this won't work right now)

sMediumSetElementSetNames : (this won't work right now)

PreferredRecordSyntax : Show the requested record syntax

Query : Show the latest query.

AttributeSet : Show the attribute set used for the latest query.

pResultSetStartPoint | nextResultSetPosition : Show the Z39.50 Search NextResultSetPosition parameter.

pNumRecsReq | numberOfRecordsRequested : Show the Z39.50 Present NumberOfRecordsRequested.

pElementSetNames | ElementSetNames : Show the Z39.50 Present ElementSetNames.
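Several zshow items can be collected into a Tcl dict for logging or display. A minimal sketch (Tcl 8.5 or later); the proc name zshow_snapshot is hypothetical, and it assumes zshow returns the value of a single named item as a string.

```tcl
# Collect the values of several zshow items into a dict keyed by item name.
proc zshow_snapshot {items} {
    set snapshot [dict create]
    foreach item $items {
        dict set snapshot $item [zshow $item]
    }
    return $snapshot
}

# Inside a connected client: print a few session values in two columns.
if {[info commands zshow] ne ""} {
    dict for {item value} [zshow_snapshot {database hits nextrec Host Port}] {
        puts [format "%-10s %s" $item $value]
    }
}
```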


Zset ParameterName {value}
ZSET ParameterName {value}
zset ParameterName {value}

Set parameters that affect the Z39.50 connection, search, and present/display. The parameters that may be set are described below:

session {connectionIDnumber} : Switch to another currently active connection. (Multiple zselect commands can be used to start multiple simultaneous Z39.50 connections, and this command can be used to switch between them.)

ResultSetName | sResultSetName | pResultSetid {resultsetname} : Set the resultset name to be used in search (zfind) and present (zdisplay) commands.

Database | Databasenames | Databasename {databasename} : Sets the name of the database to be searched in subsequent zfind commands.

ElementSet | ElementSetNames {name} : Set the elementsetname to be used for records in zdisplay commands (the default is F for Full records).
For Cheshire servers it is possible to set two special elementset name values, "XML_ELEMENT_..." and "STRING_SEGMENT_...".
The first form includes an XPATH (subset) specification of the XML/SGML tags to extract from a record; this may include limited regular expressions. See the section below on XML_ELEMENT specifications.
The second form includes a specification of a substring or a single SGML/XML tag, and only that segment of the record will be returned. See the section below on STRING_SEGMENT specifications.

PreferredRecordSyntax | pPreferredRecordSyntax | sPreferredRecordSyntax | RecordSyntax | pRecordSyntax | RecordFormat | RecFormat | sRecordSyntax | RecSyntax {syntaxname} : Set the (suggested) record syntax for the items to be returned from the server. Acceptable record syntaxes are MARC, SUTRS, SGML, XML, OPAC, EXPLAIN, SUMMARY, GSR0, GENERIC, or ES.

AttributeSet | Attributes {name or OID} : Set the attribute set to be used in searching. Acceptable attribute set names (mnemonics) are BIB1, EXPLAIN, EXT (extended services), CCL (Common Command Language), GILS (Government Information Locator Service), or STAS (Scientific and Technical Attribute Set). Attribute set OIDs can be used in place of names.

Host {servername} : Can be used in the same way as zselect to create a new connection to a host/server.

QueryFormat {format} : Can be used to set the query type to be processed. The format can be "RPN" or "1", "CCL" or "100", "ISO8777" or "2", "ERPN" or "101", "RANKED" or "102", "SQL" or "0", the default is RPN.

PreferredMessageSize | ipreferredMsgSize | ipreferredMessageSize | preferredMsgSize {size} : Set the preferred message size parameter for an INIT.

exceptionalRecordSize | imaxRecSize | maxRecSize | maxRecordSize {size} : Sets the maximum record size parameter for the INIT.

sSmallSetUpperBound | SmallSetUpperBound {number} : Sets the Z39.50 Search SmallSetUpperBound parameter.

sLargeSetLowerBound | LargeSetLowerBound {number} : Sets the Z39.50 Search LargeSetLowerBound parameter.

sMediumSetPresentNum | MediumSetPresentNumber {number} : Sets the Z39.50 Search MediumSetPresentNumber parameter.

ReplaceIndicator | sReplaceIndicator {0 or 1} : Sets whether or not resultsets can be replaced on reuse of a resultsetname.

ResultSetStartPoint | StartPosition {number} : Sets the position of the next item to be retrieved by a zdisplay (it is overridden by any parameters supplied to zdisplay).

NumrecsRequested | NumRequested | NumRecs | NumberOfRecordsRequested {number} : Sets the number of records to be retrieved by the next zdisplay (overridden by parameters to the zdisplay command).

pElementSetNames | ElementSetNames {elementsetname} : Sets elementset name to be used in the next present.

logging | log | logs {(on | 1) | (off | 0)} : Turns logging of transactions on or off.
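Because zset takes one parameter per call, a group of settings can be kept in a Tcl dict and applied together. This is a sketch (Tcl 8.5 or later); the proc zset_many and the chosen values are illustrative, while the parameter names are taken from the list above.

```tcl
# Apply each name/value pair in SETTINGS with one zset call,
# in the order the pairs appear in the dict.
proc zset_many {settings} {
    dict for {param value} $settings {
        zset $param $value
    }
}

# Illustrative settings: SUTRS records, full element set, 10 per zdisplay.
set present_settings {
    RecordSyntax             SUTRS
    ElementSetNames          F
    NumberOfRecordsRequested 10
    logging                  on
}

if {[info commands zset] ne ""} {
    zset_many $present_settings
}
```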


Zclose
ZCLOSE
zclose

Close the currently active connection to a Z39.50 server.


Zql sql_select_statement
ZQL sql_select_statement
zql sql_select_statement

This command submits the SQL statement provided as the argument(s) to the current server as a Z39.50 Type 0 query. This assumes that the server accepts type 0 queries and processes them against the currently connected database with server-side SQL parsing. The Cheshire server can handle such queries for an RDBMS type configuration file entry (by passing them through to the RDBMS itself). For the webcheshire client, the "SQL" or "LSQL" verb followed by an SQL statement functions in the same way for searching local DBMS databases. Note also that in the local webcheshire "SQL" version, ANY SQL statement may follow the command verb. This means that the underlying relational databases may be created and modified as well as queried using this command (assuming permissions on the DBMS itself permit these operations). The ZQL command proper, however, is only set up to permit SELECT operations and not database modification.
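Since ZQL proper accepts only SELECT statements, a script can check a statement before sending it. A minimal sketch; the proc zql_select and the table and column names are hypothetical, and it assumes zql accepts the whole statement as a single argument.

```tcl
# Send STATEMENT with zql only if it is a SELECT; raise an error otherwise.
proc zql_select {statement} {
    set trimmed [string trim $statement]
    if {![string match -nocase "select *" $trimmed]} {
        error "ZQL only accepts SELECT statements: $trimmed"
    }
    return [zql $trimmed]
}

if {[info commands zql] ne ""} {
    # Hypothetical table and column names.
    puts [zql_select {SELECT title, author FROM books WHERE pubyear > 1990}]
}
```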


Zformat formatname record rectype [recnumber] [max_line_length] [DTD_filename]
ZFORMAT formatname record rectype [recnumber] [max_line_length] [DTD_filename]
zformat formatname record rectype [recnumber] [max_line_length] [DTD_filename]

This command is used to provide special formatting on the client side for MARC and some SGML records.

formatname:

The format names for MARC are:
"FULL" or "LONG" or "TAGGED" for full records with tagged fields,
"BRIEF" or "SHORT" for abbreviated records with tagged fields,
"MARC" or "FULLMARC" for full MARC records tagged using MARC field numbers,
"REVIEW" or "EVAL" for very short records,
"LIST" or "TCLLIST" for records structured as a Tcl list,
"HTML" for full records tagged as HTML,
"SHORTHTML" for abbreviated records tagged as HTML,
"REVIEWHTML" for very short records tagged as HTML.

The format names for SGML are:
For items using the USMARC DTD
"REVIEW" for very short records,
"SHORT" for abbreviated records with tagged fields,
"LONG" for full records with tagged fields,
"MARC" for full MARC records tagged using MARC field numbers,
"HTMLREVIEW" for very short records tagged as HTML,
"HTMLSHORT" for abbreviated records tagged as HTML,
"HTMLLONG" for full records tagged as HTML,
"CSMP_HTMLREVIEW" for very short records tagged as HTML, with 860 URL fields set up as hyperlinks,
"CSMP_HTMLSHORT" for abbreviated records tagged as HTML, with 860 URL fields set up as hyperlinks,
"CSMP_HTMLLONG" for full records tagged as HTML, with 860 URL fields set up as hyperlinks,
"GLAS_HTMLREVIEW" for very short records tagged as HTML (with special tags),
"GLAS_HTMLSHORT" for abbreviated records tagged as HTML (with special tags),
"GLAS_HTMLLONG" for full records tagged as HTML.

For items using the standard "classcluster" DTD (lccclust) there is the "LCCSHORT" format.

For items using the TREC FT DTD there are three formats, "REVIEW", "SHORT", and "LONG".

record: The full record to be formatted

rectype: The type of the record, expressed either as a name or as a type number. The accepted values are: "marc" or "usmarc" or 1, "sgmlmarc" or 2, "opac" or 3, "text" or "sutrs" or 4, "sgml" or 5, "generic" or 6, and "explain" or 7.

recnumber: This is the sequence or id number to use for the formatted record.

max_line_length: This is the maximum number of characters permitted on a line.

DTD_filename: For SGML (rectype 5) this is the name of the file containing the DTD for this record type.
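A typical use is to format every record returned by a zdisplay. This sketch assumes the zdisplay result is a list whose first element is the status list and whose remaining elements are records (the shape shown in the XML_ELEMENT example later in this document); format_records is a hypothetical helper, and the argument order passed to zformat follows the syntax line above.

```tcl
# Format each record of a zdisplay-style RESULT with zformat, numbering
# them from FIRST, and return the formatted records as a list.
proc format_records {result formatname rectype {first 1} {width 80}} {
    set formatted {}
    set recnumber $first
    foreach record [lrange $result 1 end] {
        lappend formatted [zformat $formatname $record $rectype $recnumber $width]
        incr recnumber
    }
    return $formatted
}

# Inside a connected client: display the next batch as SHORT MARC records.
if {[info commands zdisplay] ne "" && [info commands zformat] ne ""} {
    foreach rec [format_records [zdisplay] SHORT marc] {
        puts $rec
    }
}
```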


ZMakeFormat formatname [DTDNAME] {{list_of_format_elements}}
ZMAKEFORMAT formatname [DTDNAME] {{list_of_format_elements}}
zmakeformat formatname [DTDNAME] {{list_of_format_elements}}

This command adds a new format type to the built-in set of formats (see ZFORMAT above). The DTDNAME parameter is required for SGML formats. The list of format elements is a list of Tcl lists with the following structure:

{{label[500]} {tags[200]} {subfields[200]} {beginpunct[200]} {subfsep[200]} {endpunct[200]} {newfield [TRUE|FALSE|-1]} {print_all [TRUE|FALSE]} {print_indicators [TRUE|FALSE]} {print_delimiters[TRUE|FALSE]} {repeatlabel[TRUE|FALSE]} {multisubstitute[TRUE|FALSE]} {indent[NUMBER]}}

The number in square brackets is the maximum size for the element, or the permitted values for the element. These elements are:

label: The label to place before the field in the formatted record.

tags: Which MARC or SGML tags this format element applies to (may use ? as a single-character wildcard for MARC). If "#" is used instead of a tag, then the record number supplied as a parameter to zformat will be used as the item to be formatted.

subfields: Which subfields of the tags the element applies to. (For MARC, empty means all subfields; otherwise only the listed subfields are formatted. For SGML, all subfields are indicated by "*", and subfield names must be separated by spaces.)

beginpunct: String to place before the field in the formatted record.

subfsep: String to place between subfields in the formatted record.

endpunct: String to place after the field in the formatted record.

newfield: Is this a new field (should always be true except for end of format records where it should be -1). (Ignored in SGML)

print_all: Print the entire field? (minimal formatting) (Ignored in SGML)

print_indicators: Print the MARC indicators before the field? (Ignored in SGML)

print_delimiters: Print the MARC delimiters? (Usually FALSE). (Ignored in SGML)

repeatlabel: Repeat the label string for each line or matching field? (Ignored in SGML)

multisubstitute: Substitute the field for '%' in the beginpunct string? (Ignored in SGML)

indent: Number of spaces to indent wrapped fields. (Ignored in SGML)

Here is an example for a MARC format:

zmakeformat SHORT {{{Record #} {#} {} {} { } {
} TRUE FALSE FALSE FALSE FALSE FALSE 0} {{Author:} {1??} {} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Title:} {245} {} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Publisher:} {260} {ab} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Date:} {260} {c} {} { } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Periodical:} {773} {} {} {} {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{Subjects:} {6[59]0} {} {} { -- } {.
} TRUE FALSE FALSE FALSE FALSE FALSE 15} {{LC Call No.:} {050} {} {} { } {
} TRUE FALSE FALSE FALSE FALSE FALSE 15}}

Which would produce a formatted MARC record that looks like this:

Record #1 
Author: Hatcher, William S.
Title: The logical foundations of mathematics / by William S. Hatcher.
Publisher: Oxford ; New York : Pergamon Press.
Date: 1968.
Subjects: Mathematics -- Philosophy.
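Writing the thirteen-element lists by hand is error-prone, so a small builder can fill in defaults. The proc format_element, its -fieldname options, its defaults (modelled on the typical fields in the SHORT example above) and the format name BRIEFTITLE are assumptions of this sketch, not part of Cheshire; only the final zmakeformat call uses the documented command.

```tcl
# Build one zmakeformat element list (13 fields in the documented order).
# Options are given as -fieldname value; unspecified fields use defaults
# modelled on the SHORT example above.
proc format_element {label tags args} {
    array set f [list \
        subfields {} beginpunct {} subfsep { } endpunct ".\n" \
        newfield TRUE print_all FALSE print_indicators FALSE \
        print_delimiters FALSE repeatlabel FALSE multisubstitute FALSE \
        indent 15]
    foreach {opt value} $args {
        set key [string trimleft $opt -]
        if {![info exists f($key)]} {
            error "unknown format field: $opt"
        }
        set f($key) $value
    }
    return [list $label $tags $f(subfields) $f(beginpunct) $f(subfsep) \
        $f(endpunct) $f(newfield) $f(print_all) $f(print_indicators) \
        $f(print_delimiters) $f(repeatlabel) $f(multisubstitute) $f(indent)]
}

set brief [list \
    [format_element {Title:} 245] \
    [format_element {Date:} 260 -subfields c] \
    [format_element {Subjects:} {6[59]0} -subfsep { -- }]]

if {[info commands zmakeformat] ne ""} {
    zmakeformat BRIEFTITLE $brief   ;# hypothetical format name
}
```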


ZRemoveFormat formatname
ZREMOVEFORMAT formatname
zremoveformat formatname

This command removes a format created with the zmakeformat command described above.


ZShowFormat formatname
ZSHOWFORMAT formatname
zshowformat formatname

This command returns a format element list for built-in formats or formats created with the zmakeformat command described above.


Zdelete ALL | ResultSetName1 [ResultSetName2 ...]
ZDELETE ALL | ResultSetName1 [ResultSetName2 ...]
zdelete ALL | ResultSetName1 [ResultSetName2 ...]

Deletes the named result set(s) stored on the Z39.50 server. The keyword "ALL" deletes all stored result sets.


pTmpNam directoryname

Creates a unique random file name for temporary file usage.


ZSort
ZSORT
zsort
local_sort
LS
cheshire_sort

Sort result sets. The "Z" forms of the command create a Z39.50 Sort request and send it to the target; the other forms are used in webcheshire for local sorting of resultsets. The parameters and flags discussed below apply to both versions.

The additional arguments to the ZSort command are:

-IN_RESULTS { list_of_input_resultsetnames}
These are the result sets (or a single set) to be merged and sorted. Note that merging will only work for resultsets from the same database. The braces are required if there is more than one input resultset; the names should be structured as a Tcl list. Duplicate entries are removed when multiple resultsets are merged. If this flag is NOT specified, then the current resultset from the most recent search is used.
-OUT_RESULTS Output_resultsetname
This is the name to be used for the output resultset. If this flag is NOT specified then the current resultset for the session is used (the sorted version will replace the current resultset).
-TAG //sgml/xml_tagpath
This specifies a tag (or attribute) in the SGML/XML documents as a sort key. The tags are specified using a path notation (based on the abbreviated XPath syntax, with some modifications). Basically a tag can be a single tag name, a regular expression (as in Cheshire configfile FTAG specifications), or a sequence of tag names separated by slashes. Attributes may be specified by preceding the attribute name with an at-sign '@'. For example,
chapter/para/sentence
asks for all sentence tags that are descendants of para elements which are descendants of chapter elements. There is no need to specify the FULL path as long as the nesting is sufficient to select the correct elements. A single tag name alone will find that tag anywhere in the document. Regular expressions can be used in tag names to specify combinations of tags; for example "^fld1[1234].*" would select any tag that begins with "fld1" followed by a 1, 2, 3, or 4, followed by any number of characters. Attributes may be specified in two ways. A path like:

/para/sentence{@runon}

would select as a sort key the runon attributes of the sentence tags that are descendants of para elements. The following:

/para/sentence/@runon

would do exactly the same selection. The values of attributes may be used as criteria for key selection as well. For example:

/para/sentence{@runon="not"}

would select all sentence elements where the attribute runon had the value "not" -- in this case the element contents instead of the attribute values are used for the sort keys.

-ATTRIBUTE attribute
An alternative way to specify sorting elements is to use the attributes of the database. These can be specified in the same way as the index elements of searches (see above). The parts of the database records that were used to create the indexes corresponding to the attribute will be used to extract the sort key. For example, "-attribute title" would use the same rules used in creating the title index of the database for extracting the sort key.
-ELEMENTSETNAME setname
There are currently only two elementsetnames defined for sorting, "RANK" and "SCORE" and they are synonyms. When they are specified the sort key will be based on the ranking values of the resultset for each document. For resultsets from probabilistic searches this is the probability of relevance.
-IGNORE_CASE
Case is ignored in sorting for this part of the sort key. This must follow a -TAG, -ATTRIBUTE or -ELEMENTSETNAME specification and applies to the keys defined by that specification. This is the default, and will be used if NOT specified.
-CASE_SENSITIVE
Case matters in sorting for this part of the sort key. This must follow a -TAG, -ATTRIBUTE or -ELEMENTSETNAME specification and applies to the keys defined by that specification.
-ASCENDING
This flag indicates that this part of the sort key should be sorted in ascending order (low to high). It must follow a -TAG, -ATTRIBUTE or -ELEMENTSETNAME specification and applies to the keys defined by that specification. This is the default, and will be used if NOT specified.
-DESCENDING
This flag indicates that this part of the sort key should be sorted in descending order (high to low).
-ASCENDING_FREQ
This flag indicates that this part of the sort key should be sorted in ascending order (low to high) by the frequency of the key value (I think). This is transmitted in Z39.50, but not supported on cheshire servers.
-DESCENDING_FREQ
This flag indicates that this part of the sort key should be sorted in descending order (high to low) by the frequency of the key value (I think). This is transmitted in Z39.50, but not supported on cheshire servers.
-MISSING_NULL
This flag indicates that missing values for this part of the sort key should be treated as NULL values. It must follow a -TAG, -ATTRIBUTE or -ELEMENTSETNAME specification and applies to the keys defined by that specification. This is the default, and will be used if NOT specified.
-MISSING_QUIT
This flag indicates that missing values for this part of the sort key should cancel the sort, with no results returned. It must follow a -TAG, -ATTRIBUTE or -ELEMENTSETNAME specification and applies to the keys defined by that specification.
-MISSING_VALUE "value"
This flag indicates that missing values for this part of the sort key should be replaced in the sorting by the supplied value. It must follow a -TAG, -ATTRIBUTE or -ELEMENTSETNAME specification and applies to the keys defined by that specification.
Defaults are, again, -IGNORE_CASE -ASCENDING -MISSING_NULL. Note that any of the flags may be abbreviated (as long as they are unique abbreviations). A sort specification can have up to 100 sort keys specified. For example:

zsort -attr title -attr author -missing_value ZZZZZZ

would sort the current resultset by title and author (with missing authors treated as "ZZZZZZ").

zsort -in old -out new -tag section/subject-heading -attr title

would sort contents of resultset "old" by the contents of the subject-heading tags within section elements, and the title attribute contents, with the result put into a new resultset "new".

zsort -attr date -desc -attr title -asc -case

would sort (the current resultset) from newest to oldest dates, and by title within each date with case sensitive sorting.

Note that sorting is highly dependent on the server, and not all servers will support all features (or even support sort at all). The cheshire server supports all of the types of sorts listed above, with the exception of frequency ordering. In the cheshire server, if a record has multiple elements that match the criteria (e.g., many subject headings in many sections in the example above), then only the first occurrence in the document is used for the sort.
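When sort keys are chosen at run time (for example from a web form), the flag list can be assembled as a Tcl list and expanded into the command with {*} (Tcl 8.5 or later). A sketch only: the proc sort_args and its {kind value ?order?} key format are hypothetical; the flags it emits are the ones documented above.

```tcl
# Build a zsort/local_sort argument list.
#   keys: list of {kind value ?order?}, e.g. {attribute date descending}
#   in:   input resultset name(s), as a Tcl list (optional)
#   out:  output resultset name (optional)
proc sort_args {keys {in {}} {out {}}} {
    if {[llength $keys] > 100} {
        error "at most 100 sort keys are allowed"
    }
    set opts {}
    if {[llength $in] > 0} { lappend opts -IN_RESULTS $in }
    if {$out ne ""}        { lappend opts -OUT_RESULTS $out }
    foreach key $keys {
        lassign $key kind value order
        lappend opts -[string toupper $kind] $value
        if {$order ne ""} { lappend opts -[string toupper $order] }
    }
    return $opts
}

# Expands to: zsort -IN_RESULTS old -OUT_RESULTS new
#                   -ATTRIBUTE date -DESCENDING -ATTRIBUTE title
set opts [sort_args {{attribute date descending} {attribute title}} old new]
if {[info commands zsort] ne ""} {
    zsort {*}$opts
}
```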


zhighlight
highlight
cheshirehighlight

Highlight search terms in data. The basic Tcl command syntax is:
highlight [-stem] "search word string" "data string" "pre-string" "post-string"

The "data string" is searched for each occurrence of words (or stems) in the "search word string". (The search string is assumed to follow the syntax of a cheshire search command, but it may also be just a string of words, in which case the first word is ignored.) For each word (or stem) found, the "pre-string" is inserted before it and the "post-string" is inserted after it (or after the stem if the "-stem" option is specified). Matching for either words or stems is not case-sensitive. Aliases for "highlight" are "zhighlight" and "cheshirehighlight". For example, suppose a search has been done for "zfind su mathematics" in a cheshire database; if the formatted record (rec) looks like:

Title:         Essays in statistical science : papers in honor of
                  P.A.P. Moran / edited by J. Gani and E.J. Hannan.
Publisher:     Sheffield, Eng. : Applied Probability Trust.
Date:          1982.
Pages:         434 p. : ill..
Notes:         "Journal of applied probability special volume ;
                  v.19A"
               Includes bibliographies and index
               Publications of P.A.P. Moran: p. 1-6
Subjects:      Mathematical statistics.
               Statistics.
               Stochastic processes.
Other Authors: Moran, P. A. P. (Patrick Alfred Pierce), 1917.
               Hannan, E. J. (Edward James), 1921.
               Gani, J. M. (Joseph Mark).

Then the Tcl statement:

set result [highlight -stem "zfind su mathematics" $rec "<START>" "<END>"]

would set result to:

Title:         Essays in statistical science : papers in honor of
                  P.A.P. Moran / edited by J. Gani and E.J. Hannan.
Publisher:     Sheffield, Eng. : Applied Probability Trust.
Date:          1982.
Pages:         434 p. : ill..
Notes:         "Journal of applied probability special volume ;
                  v.19A"
               Includes bibliographies and index
               Publications of P.A.P. Moran: p. 1-6
Subjects:      <START>Mathemat<END>ical statistics.
               Statistics.
               Stochastic processes.
Other Authors: Moran, P. A. P. (Patrick Alfred Pierce), 1917.
               Hannan, E. J. (Edward James), 1921.
               Gani, J. M. (Joseph Mark).

Note that this is a client-side operation, and the highlighted items will NOT be restricted to occurrences that were actually indexed by the index used in searching; any occurrence in the string will be highlighted. If the "-stem" option is not used, then the highlighted occurrences must match the entire word as specified in the search string. Note, however, that this works with ANY search string and ANY data string, so it can be used anywhere. For example...

% set query "zfind su THIS IS A TEST"
zfind su THIS IS A TEST
% set data "This is the data that we are testing on."
This is the data that we are testing on.
% set result [highlight $query $data "<B>" "</B>"]
<B>This</B> <B>is</B> the data that we are testing on.
% set result [highlight -stem $query $data "<B>" "</B>"]
<B>Thi</B>s <B>is</B> the data that we are <B>test</B>ing on.

Remember that syntactical parts of a query (like zfind, index names, and Boolean operators) are ignored in highlight processing, so that a query like:
% set query "search title data and author test"
search title data and author test

would give the following when applied to the same data as above:
% set result [highlight $query $data "<B>" "</B>"]
This is the <B>data</B> that we are testing on.
% set result [highlight -stem $query $data "<B>" "</B>"]
This is the <B>data</B> that we are <B>test</B>ing on.
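In a webcheshire CGI script the record text usually has to be HTML-escaped before highlight tags are added; otherwise characters such as "<" and "&" in the data would be read as markup. A sketch; the proc html_highlight is hypothetical, and escaping first could in rare cases change what matches (for instance, a search word "amp" would also match inside "&amp;").

```tcl
# Escape DATA for HTML, then wrap matching words (or stems) in <b>...</b>.
proc html_highlight {query data {stem 0}} {
    set escaped [string map {& &amp; < &lt; > &gt; \" &quot;} $data]
    if {$stem} {
        return [highlight -stem $query $escaped "<b>" "</b>"]
    }
    return [highlight $query $escaped "<b>" "</b>"]
}

if {[info commands highlight] ne ""} {
    puts [html_highlight "zfind su mathematics" "Mathematics & <logic>" 1]
}
```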


pTranLog transaction_code

Outputs a transaction log record (see cheshire2/pTranLog.c for details). Intended for use with the Tcl/Tk X window client and the opac script.



sResults

Collects and outputs results from a questionnaire (see cheshire2/sResults.c for details). Intended for use with the Tcl/Tk X window client and the questionnaire script.


cheshire_search indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]

search indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]

LFIND indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]

lfind indexname1 [RELOP] search_string1 [[BOOLOP] indexname2 [RELOP] search_string2 [BOOLOP2]... [resultsetid id_string]

This command functions the same way as ZFIND above; however, it is used only in the webcheshire or staffcheshire client programs to search the local database instead of using Z39.50 to search remote databases. Some Tcl variables must be set to enable use of this command. The variables are:

CHESHIRE_CONFIGFILE: This should be set to the full pathname of the configuration file for the database.

CHESHIRE_DATABASE: This should be set to the name of the database (file) to be searched.

CHESHIRE_NUMREQUESTED: This should be set to the maximum number of records to be returned by the search.

CHESHIRE_NUM_START: This should be set to the number of the record in the search resultset that should be the first record returned.

CHESHIRE_ELEMENTSET: This should be set to the elementset name (or OID) that should be used in constructing the results of this search.

CHESHIRE_RECSYNTAX: This should be set to the record syntax name (or OID) that should be used in constructing the results of this search (e.g., GRS1, SUTRS, XML, SGML).

CHESHIRE_ATTRIBUTESET: This should be set to the attributeset to be used in parsing the query.

CHESHIRE_LOGFILE: This should be set to the pathname of a file to be used for logging any errors in query processing; it defaults to STDERR.

CHESHIRE_RETURN_PAGEDOCS: When searching a "PAGED_DIRECTORY_REF" type of index, this forces the search to return a constructed document record with the page references attached instead of page records. (See the configuration file documentation for more information on PAGED_DIRECTORY_REF indexes.)
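Because these settings are global Tcl variables, a script can set them in one place and check the essential ones before searching. A sketch (Tcl 8.5 or later); the proc cheshire_setup, the file path, the index name and the result set name are hypothetical, and treating only CHESHIRE_CONFIGFILE and CHESHIRE_DATABASE as mandatory is an assumption of the example.

```tcl
# Set the CHESHIRE_* globals from a dict, insisting on the ones this
# sketch treats as mandatory.
proc cheshire_setup {settings} {
    foreach name {CHESHIRE_CONFIGFILE CHESHIRE_DATABASE} {
        if {![dict exists $settings $name]} {
            error "missing required setting $name"
        }
    }
    dict for {name value} $settings {
        set ::$name $value
    }
}

# Hypothetical paths and values.
cheshire_setup {
    CHESHIRE_CONFIGFILE   /usr/local/cheshire/bibfile/config.xml
    CHESHIRE_DATABASE     bibfile
    CHESHIRE_NUMREQUESTED 10
    CHESHIRE_NUM_START    1
    CHESHIRE_RECSYNTAX    SUTRS
    CHESHIRE_ELEMENTSET   F
}

if {[info commands lfind] ne ""} {
    # Store the result set so cheshire_fetch can read from it later.
    puts [lfind title mathematics resultsetid mathset]
}
```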


Cheshire_Fetch resultset_name [number_to_fetch start_position]

cheshire_fetch resultset_name [number_to_fetch start_position]

FETCH resultset_name [number_to_fetch start_position]

fetch resultset_name [number_to_fetch start_position]

fetch_result resultset_name [number_to_fetch start_position]

fetch_results resultset_name [number_to_fetch start_position]

The Cheshire_Fetch command is used to retrieve and format records from stored resultsets. It is only available in webcheshire, and is the analog of zdisplay for retrieving from local resultsets. Since resultsets are not stored by default in webcheshire, they must be explicitly created using the RESULTSETID option in search commands. The optional number_to_fetch and start_position arguments (both are required if either is used) should be numbers (again, as in zdisplay). If these two are not specified, the currently set values of CHESHIRE_NUMREQUESTED and CHESHIRE_NUM_START will be used. If those variables are not set and the number_to_fetch and start_position values are not specified, then an error will result. All of the other global variables discussed above in Cheshire_Search have the same effects in Cheshire_Fetch.
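To retrieve a whole stored result set in fixed-size chunks, the start position can be advanced explicitly. A sketch (Tcl 8.5 or later); fetch_all is hypothetical, and it assumes cheshire_fetch returns a list whose first element is status information and whose remaining elements are records, as with zdisplay.

```tcl
# Fetch HITS records from stored result set SETNAME, CHUNK at a time,
# and return all of the records as one list.
proc fetch_all {setname hits chunk} {
    set records {}
    for {set start 1} {$start <= $hits} {incr start $chunk} {
        set count [expr {min($chunk, $hits - $start + 1)}]
        set result [cheshire_fetch $setname $count $start]
        lappend records {*}[lrange $result 1 end]
    }
    return $records
}

if {[info commands cheshire_fetch] ne ""} {
    # "mathset" must have been created with the resultsetid option of a
    # search; 25 is an illustrative hit count.
    foreach rec [fetch_all mathset 25 10] { puts $rec }
}
```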


TileBar_Search index_name {concept1 terms} {concept2 terms} [elib_id]

tilebar_search index_name {concept1 terms} {concept2 terms} [elib_id]

bfind index_name {concept1 terms} {concept2 terms} [elib_id]

tbsearch index_name {concept1 terms} {concept2 terms} [elib_id]

tb_search index_name {concept1 terms} {concept2 terms} [elib_id]

TileBar_Search implements a search for two sets of concepts/terms from a "PAGED_DIRECTORY_REF" type of index. The results are returned as a Tcl list in a format for presentation via the DLIB TileBars Java client. This command requires the use of the same Tcl variables as the cheshire_search command above.


cheshire_close
cheshire_exit
close_cheshire
exit_cheshire

This command properly closes all open files named in the configuration files. To ensure that all updates to files and indexes are properly made this command must be executed before exiting from the client program. (needed for StaffCheshire and WebCheshire only)
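Since the close must happen before the program exits, a CGI script can guard its work with try/finally (Tcl 8.6 or later) so the files are closed even when the script fails. A sketch; with_cheshire is a hypothetical helper.

```tcl
# Run BODY in the caller's scope and always call cheshire_close afterwards,
# even if BODY raises an error (the error is still propagated).
proc with_cheshire {body} {
    try {
        uplevel 1 $body
    } finally {
        if {[info commands cheshire_close] ne ""} {
            cheshire_close
        }
    }
}
```

For example, wrapping the lfind and cheshire_fetch calls of a CGI script in a with_cheshire { ... } block closes the database files on every exit path.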


LCCBuild LCC_data_file_name
LCCBUILD LCC_data_file_name
lccbuild LCC_data_file_name

This routine takes a file of information about the Library of Congress Classification Scheme and builds an internal table to provide a hierarchical "description" of any LCC class number. The data file should contain lines that provide a heading for each level in the LCC hierarchy. The lines should contain either a single class or a range of LCC class values, followed by a colon, followed by the description of the class or range of classes. Each line should be indented using tab characters, with each tab representing a level in the LCC hierarchy. The following shows an example drawn from the lc_outline.text data file included in the "doc" directory of the cheshire distribution.

A-ZZZ: Library of Congress Classification Topic:
\t A-AZ: General works.
\t\t AC: Collections. Series. Collected Works.
\t\t AE: Encyclopaedias (General).
\t\t AG: Dictionaries and other general reference books.
\t\t AI: Indexes (General).

or

\t E-FZ: History: America.
\t\t E:
\t\t\t 11-29: (General)
\t\t\t 31-46: North America.
\t\t\t 51-99: Indians. Indians of North America.
\t\t\t 101-135: Discovery of America and early explorations
\t\t\t 151-9999: United States (General).
\t\t\t\t 184-185.98: Elements in the population.
\t\t\t\t\t 184.5-185.98: Afro-Americans.
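Because the indentation must use real tab characters, it can be safer to generate the data file from a script than to edit it by hand. A sketch (Tcl 8.5 or later); write_lcc_outline and the file name are hypothetical, and it reproduces the layout of the example above (one tab per level, then a space). Check the lc_outline.text file in the distribution for the exact whitespace your version expects.

```tcl
# Write an LCC outline file. Each entry is {level range description};
# level 0 is the root, and each level adds one tab, followed by a space
# as in the example above.
proc write_lcc_outline {path entries} {
    set chan [open $path w]
    foreach entry $entries {
        lassign $entry level range description
        set prefix [string repeat "\t" $level]
        if {$level > 0} { append prefix " " }
        puts $chan "$prefix$range: $description"
    }
    close $chan
}

write_lcc_outline lcc_outline_sample.txt {
    {0 A-ZZZ {Library of Congress Classification Topic:}}
    {1 A-AZ  {General works.}}
    {2 AC    {Collections. Series. Collected Works.}}
}

if {[info commands lccbuild] ne ""} {
    lccbuild lcc_outline_sample.txt
}
```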


LCCGet "alphapart numericpart"
LCCGET "alphapart numericpart"
lccget "alphapart numericpart"

This routine uses an internal table of information about the Library of Congress Classification Scheme (loaded using the LCCBuild command) to provide a hierarchical "description" of any class number indicated by the alphabetic main class, alphapart, and the numeric subclass, numericpart. The elements of the returned string are separated by asterisks (the string also begins with one), and each element represents a lower level of the hierarchy. These strings can be turned into Tcl lists using the Tcl "split" command on the asterisks. The first element of every string is the same, representing the root of the hierarchy. For example:

% lccget QA 76

*Library of Congress Classification Topic:*Science.*Mathematics*Computer Science. Electronic data processing.

% lccget z 699

*Library of Congress Classification Topic:*Bibliography*Libraries.*Library science. Information science.*The collections. The books.*Machine methods of information storage and retrieval. Mechanized bibliographic control.
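Because each returned string begins with an asterisk, splitting it on "*" yields an empty first element. A sketch of a helper (the name lcc_levels is hypothetical) that drops empty pieces and returns one list element per level:

```tcl
# Turn an lccget description into a Tcl list, one element per level,
# discarding the empty piece produced by the leading asterisk.
proc lcc_levels {description} {
    set levels {}
    foreach part [split $description *] {
        if {$part ne ""} { lappend levels $part }
    }
    return $levels
}

if {[info commands lccget] ne ""} {
    set levels [lcc_levels [lccget QA 76]]
    puts "depth [llength $levels]: [lindex $levels end]"
}
```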


LCCDestroy
LCCDESTROY
lccdestroy

This routine removes the internal table of information about the Library of Congress Classification Scheme loaded using the LCCBuild command.



XML_ELEMENT elementset specifications

XML or SGML elements may be extracted dynamically from the records in a database using the following format specification in the config file (assuming XML output is wanted):

<displaydef name="XML_ELEMENT_" OID="1.2.840.10003.5.109.10">
<convert function="XML_ELEMENT">
<clusmap>
<from>
<tagspec>
<ftag> SUBST_ELEMENT </ftag>
</tagspec>
</from>
<to>
<tagspec>
<ftag> SUBST_ELEMENT </ftag>
</tagspec>
</to>
</clusmap>
</convert>
</displaydef>

This format is used in queries by setting the elementset name to "XML_ELEMENT_xxx", where "xxx" identifies the wanted element in the records using a simplified XPATH string. Only direct paths (/a/b/c, etc.) are supported, not the XPATH keyword specifications for relative paths. With the displaydef above, if a single tag is wanted, only the tag name is needed.

For example, assuming the above displaydef is defined for the example bibfile database (see index/testconfig.new), sending the following commands to the client:

% zset recsyntax xml
% zset elementset "XML_ELEMENT_Fld245"
% zfind su mathematics
{OK {Status 1} {Hits 17} {Received 0} {Set Default} {RecordSyntax UNKNOWN}}
% zdisplay

produces output like the following (only the first three records are shown):

{OK {Status 0} {Received 10} {Position 1} {Set Default} {NextPosition 11} {RecordSyntax XML 1.2.840.10003.5.109.10}} {<RESULT_DATA DOCID="1">
<Fld245 AddEnty="No" NFChars="0"><a>Singularitâes áa Cargáese</a></Fld245>
</RESULT_DATA>
} {<RESULT_DATA DOCID="2">
<Fld245 AddEnty="Yes" NFChars="0"><a>Modáeles locaux de champs et de formes /</a><c>Robert Roussarie</c></Fld245>
</RESULT_DATA>
} {<RESULT_DATA DOCID="5">
<Fld245 AddEnty="No" NFChars="0"><a>Metody modelirovaniëiìa i obrabotka informaëtìsii /</a><c>otv. redaktory K.A. Bagrinovskiæi, E.L. Berlëiìand</c></Fld245>
</RESULT_DATA>

Notice that the extra tag <RESULT_DATA> has been added to each record (the DOCID attribute is the internal document ID for the source record).

Thus, any XML/SGML element can be requested from the records of any database that has this display format defined.

For more complete paths, e.g.:

% zset elementset "XML_ELEMENT_/USMARC/VarFlds/Titles/Fld245"

Note that the path need not be complete: as long as each subordinate path element is a descendant of the superordinate ones, the path can be matched. Also, if a set of elements is wanted from a record, they may be specified using the XPATH "|" notation, for example:

% zset elementset "XML_ELEMENT_Fld245|Fld650|Fld651"

would retrieve all Fld245, Fld650 and Fld651 tags from the record (so it isn't -really- single element extraction at all).

Note ALSO that because the XPATH notation is converted into TAGSPECs internally, all of the wildcard and pattern matching available in config file TAGSPECs is available in the XPATH specifications (this is NOT, however, guaranteed to be a real XPATH wildcard implementation).

For example, in place of the previous command,

% zset elementset "XML_ELEMENT_Fld245|^Fld65."

could be used to match Fld245 along with any tag starting with "Fld65" followed by any other character.

There is also now support for attribute (and attribute+value) specifications using XPATH. For example:

% zset elementset XML_ELEMENT_Fld245/@AddEnty

would retrieve just the AddEnty attribute values for the Fld245 tag, and

% zset elementset XML_ELEMENT_Fld245/@AddEnty=No

would return just Fld245's that had the attribute AddEnty with the value "No".

However, the combination of full/partial paths with regular expressions will usually fail to work (due to the way the paths are turned into TAGSPECs), so the following will NOT work correctly...

% zset elementset "XML_ELEMENT_TITLES/Fld245|SUBJECTS/^Fld65."

This would be interpreted as the FTAG path...

<FTAG>TITLES</FTAG><s>Fld245|SUBJECTS</s><s>^Fld65.</s>

Such a path would not match any tags in a correctly constructed USMARC record.
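When elementset names are built inside a script, the caution about combining paths and alternatives can be enforced before calling zset. The Tcl sketch below is hypothetical (xmlElementSet is not a Cheshire command). It joins tag names or patterns with "|" and, as a conservative reading of the limitation described above, refuses to join alternatives when any of them contains a "/" path separator; a single path on its own is passed through unchanged:

```tcl
# Hypothetical helper: build an XML_ELEMENT_ elementset name from a list
# of tag names, patterns, or (single) paths.
proc xmlElementSet {specs} {
    if {[llength $specs] > 1} {
        foreach spec $specs {
            # Paths are split into TAGSPEC steps on "/", so a "|" between
            # alternatives would end up inside a step; reject such mixes.
            if {[string first / $spec] >= 0} {
                error "cannot combine path \"$spec\" with other alternatives"
            }
        }
    }
    return "XML_ELEMENT_[join $specs |]"
}

# Usage in a client session:
#   zset elementset [xmlElementSet {Fld245 ^Fld65.}]
#   zset elementset [xmlElementSet {/USMARC/VarFlds/Titles/Fld245}]
```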

Please also note that this is just a display format for records retrieved by searches, not an additional search: if no element given in the XPATH specification is found in a retrieved record, an empty record is returned. These empty records look like:

<RESULT_DATA DOCID="74"></RESULT_DATA>

which implies that the record with DOCID 74 matched the query, but did not have any fields matching the XPATH specification.
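A script can detect these empty records before processing the results. The Tcl sketch below is hypothetical (resultDocs is not a Cheshire command). It assumes each record is a separate string of the RESULT_DATA form shown above, such as the elements that follow the leading status list in a zdisplay result, and returns pairs of DOCID and a flag that is 1 for an empty record:

```tcl
# Hypothetical helper: for each RESULT_DATA record, return its DOCID and
# 1 if no elements matched the XPATH specification (empty record), else 0.
proc resultDocs {records} {
    set docs {}
    foreach rec $records {
        if {[regexp {<RESULT_DATA DOCID="([0-9]+)">(.*)</RESULT_DATA>} $rec -> docid body]} {
            lappend docs $docid [expr {[string trim $body] eq ""}]
        }
    }
    return $docs
}

# Usage, assuming the zdisplay result layout shown earlier:
#   set result [zdisplay]
#   foreach {docid empty} [resultDocs [lrange $result 1 end]] {
#       if {$empty} { puts "DOCID $docid matched but had no selected elements" }
#   }
```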

The records created using this method now include the full XPATH for each item extracted. The new results (for an XML_ELEMENT_Fld650 elementset specification from a USMARC DTD database) look like:

...

<RESULT_DATA DOCID="2">
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[1]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Vector algebra.</a></Fld650>
</ITEM>
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[2]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Differential forms.</a></Fld650>
</ITEM>
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[3]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Singularities (Mathematics)</a></Fld650>
</ITEM>
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[4]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Differential equations.</a></Fld650>
</ITEM>
</RESULT_DATA>
...

There is now an ITEM element for each matching document element, and the matching element is included as a subelement of the ITEM. The XPATH attribute of each ITEM gives the full XPATH of that element, including the sequence number among sibling elements that share the same parent path.
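The XPATH attributes can be collected from such a record with a regular expression, for example to see where each extracted element sits in its source record. This is a hypothetical Tcl sketch (itemXPaths is not part of Cheshire) that assumes the attribute appears exactly as XPATH="..." in each ITEM start tag, as in the output above:

```tcl
# Hypothetical helper: list the XPATH attribute of every ITEM element
# in one RESULT_DATA record, in document order.
proc itemXPaths {record} {
    set paths {}
    # With one capture group, -all -inline returns {match xpath} pairs
    foreach {match xpath} [regexp -all -inline {<ITEM XPATH="([^"]*)">} $record] {
        lappend paths $xpath
    }
    return $paths
}
```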

In addition, the XPATH specification for the wanted element can ALSO include occurrence numbers, which restrict the fields returned. For example, with XML_ELEMENT_Fld650[3] the same record as above would return only the third occurrence of the subject:

<RESULT_DATA DOCID="2">
<ITEM XPATH="/USMARC[1]/VarFlds[1]/VarDFlds[1]/SubjAccs[1]/Fld650[3]">
<Fld650 SubjLvl="NoInfo" SubjSys="LCSH"><a>Singularities (Mathematics)</a></Fld650>
</ITEM>
</RESULT_DATA>

This capability gives fairly powerful control over the display elements extracted.


STRING_SEGMENT_ elementset specifications

In addition to the above, there is another "display format" that does not require a specification in the config files. "STRING_SEGMENT_..." is used as an elementset name, much like the XML_ELEMENT_ elementset specifications. Its purpose is to extract strings from the underlying SGML/XML data of a Cheshire database without parsing the records. This works only when the requested record syntax is XML, SGML or SUTRS. The basic forms are:

zset elementsetname STRING_SEGMENT_400
(for Z connections) or
set CHESHIRE_ELEMENTSET STRING_SEGMENT_400
(for webcheshire local retrieval)

zset elementsetname STRING_SEGMENT_200_400
(for Z connections) or
set CHESHIRE_ELEMENTSET STRING_SEGMENT_200_400
(for webcheshire local retrieval)

where 400 is the END position of the part of the record to be retrieved and 200 is the START position. The first form assumes that the start position is the beginning of the record (character position 0). Unlike the XML_ELEMENT_... specifications above, this does NOT require a DISPLAYDEF (i.e., it should work for any records).

Alternatively, the following form can be used to extract the FIRST matching SGML/XML tag in a record...

zset elementsetname STRING_SEGMENT_Fld245
(for Z connections) or
set CHESHIRE_ELEMENTSET STRING_SEGMENT_Fld245
(for webcheshire local retrieval)

This extracts the first occurrence of the tag Fld245. Note that ONLY a single tag, and NOT an XPATH, can be used with this form, since the record is not parsed; a simple string match finds the first occurrence of the tag. If a record doesn't contain the tag anywhere, the string "*** NO MATCHING TAGS IN MATCHING RECORD ***" is returned in place of the tag. Note also that it is entirely possible to return strings that are not valid XML or SGML; it is up to the user/scripter to take appropriate action when using this type of retrieval. The primary advantages are 1) no parsing is done, so results are returned quickly, and 2) arbitrary pieces of the records can be extracted. Of course, even though such an elementset name can be sent to any Z39.50 server, only Cheshire servers will be able to process it.

To aid in using the results returned by this method for IR evaluations, STRING_SEGMENT_ puts the following at the head of every string:
"docid 9999|rank 9999|relv 9999|rawrel 9999.9999|..."
where the "9999" is replaced by the appropriate values from the search. This can be easily processed using the Tcl split and list handling commands in a script.
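As a sketch of that processing, the hypothetical Tcl helper below (parseSegmentHeader is not a Cheshire command) assumes that exactly the four "name value" fields shown above come first, each ending with "|", and that everything after the fourth "|" is the extracted string segment, which may itself contain "|" characters:

```tcl
# Hypothetical helper: split a STRING_SEGMENT_ result into a dict with
# keys docid, rank, relv, rawrel and text (the extracted segment).
# Assumes exactly four "name value" header fields precede the segment.
proc parseSegmentHeader {segment} {
    set fields [split $segment |]
    set info [dict create]
    foreach field [lrange $fields 0 3] {
        lassign [split [string trim $field] " "] name value
        dict set info $name $value
    }
    # Rejoin the remainder in case the segment itself contains "|"
    dict set info text [join [lrange $fields 4 end] |]
    return $info
}

# Usage:
#   set info [parseSegmentHeader $rec]
#   puts "[dict get $info docid] [dict get $info rawrel]"
```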


set CHESHIRE_SEARCH_STAT_DUMP 1
set CHESHIRE_SEARCH_STAT_DUMP 0

Setting this variable to 1 in webcheshire or staffcheshire causes statistics about ranking to be collected and output for each ranked search query; setting it to 0 turns this off. The output is appended to the returned results as a set of lines, one for each matching document in the collection. NOTE that the stats output includes entries for ALL matching records, not just the number requested by the search (CHESHIRE_NUMREQUESTED). The variables returned in each line (one per matching document) are:

Variable    Description

docid       Cheshire internal document ID number
compid      Cheshire internal component ID number
doclen      Document length (in bytes)
qlen        Query length
nmterms     Number of matching terms between document and query
ndocs       Number of documents
distndoc    Number of documents (for distributed apps)
min_cf      Minimum collection frequency (over all terms in query)
max_cf      Maximum collection frequency (over all terms in query)
min_tf      Minimum document term frequency (for this query)
max_tf      Maximum document term frequency
sum_entr    Total doc/comp matches in index
min_entr    Minimum doc/comp matches in index
max_entr    Maximum doc/comp matches in index
X1          PROB: X1; Okapi: sum of RSJ values for terms; CORI: sum of I
X2          PROB: X2; Okapi: sum of document term frequency; CORI: sum of T
X3          PROB: X3; Okapi & CORI: average document length
X4          PROB: X4; Okapi: constant k1; CORI: 0
X5          PROB: X5; Okapi: constant k3; CORI: 0
X6          PROB: X6; Okapi: constant b; CORI: 0
logodds     PROB: logodds value; Okapi & CORI: 0.0
docwt       RSV/Probability for this document
$compname   Component name (if component)


BUGS

None known -- but there may be undesirable features :-)

SEE ALSO

Tcl(1), Tk(1)

AUTHOR

Ray R. Larson