zebrasrv - Zebra Server

NAME

       zebrasrv - Zebra Server

SYNOPSIS

       zebrasrv [-install] [-installa] [-remove] [-a file] [-v level]
                [-l file] [-u uid] [-c config] [-f vconfig] [-C fname]
                [-t minutes] [-k kilobytes] [-d daemon] [-w dir] [-p pidfile]
                [-ziDST1] [listener-spec...]

DESCRIPTION

       Zebra is a high-performance, general-purpose structured text indexing
       and retrieval engine. It reads structured records in a variety of input
       formats (e.g. email, XML, MARC) and allows access to them through exact
       boolean search expressions and relevance-ranked free-text queries.

       zebrasrv is the Z39.50 and SRU frontend server for the Zebra search
       engine and indexer.

       On Unix you can run the zebrasrv server from the command line and put
       it in the background. It may also operate under the inet daemon. On
       WIN32 you can run the server as a console application or as a WIN32
       Service.

OPTIONS

       The options for zebrasrv are the same as those for YAZ' yaz-ztest.
       Option -c specifies a Zebra configuration file - if omitted zebra.cfg
       is read.

       -a file
           Specify a file for dumping PDUs (for diagnostic purposes). The
           special name - (dash) sends output to stderr.

       -S
           Don't fork or make threads on connection requests. This is good for
           debugging, but not recommended for real operation: Although the
           server is asynchronous and non-blocking, it can be nice to keep a
           software malfunction (okay then, a crash) from affecting all
           current users. The server can only accept a single connection in
           this mode.

       -1
           Like -S but after one session the server exits. This mode is for
           debugging only.

       -T
           Operate the server in threaded mode. The server creates a thread
           for each connection rather than forking a process. Only available
           on UNIX systems that offer POSIX threads.

       -s
           Use the SR protocol (obsolete).

       -z
           Use the Z39.50 protocol (default). This option and -s complement
           each other. You can use both multiple times on the same command
           line, between listener-specifications (see below). This way, you
           can set up the server to listen for connections in both protocols
           concurrently, on different local ports.

       -l file
           Specify an output file for the diagnostic messages. The default is
           to write this information to stderr.

       -c config-file
           Read configuration information from config-file. The default
           configuration is ./zebra.cfg.

       -f vconfig
           This specifies an XML file that describes one or more YAZ frontend
           virtual servers. See section YAZ SERVER VIRTUAL HOSTS for details.

       -C fname
           Sets SSL certificate file name for server (PEM).

       -v level
           The log level. Use a comma-separated list of members of the set
           {fatal,debug,warn,log,malloc,all,none}.

       -u uid
           Set user ID. Sets the real UID of the server process to that of the
           given user. It's useful if you aren't comfortable with having the
           server run as root, but you need to start it as such to bind a
           privileged port.

       -w working-directory
           The server changes to this working directory before listening
           for incoming connections. This option is useful when the
           server is operating from the inetd daemon (see -i).

       -p pidfile
           Specifies that the server should write its Process ID to the file
           given by pidfile. A typical location would be /var/run/zebrasrv.pid.

       -i
           Use this to make the server run from the inetd server (UNIX
           only). Make sure you use the logfile option -l in conjunction with
           this mode and specify the -l option before any other options.

       -D
           Use this to make the server put itself in the background and run as
           a daemon. If neither -i nor -D is given, the server starts in the
           foreground.

       -install
           Use this to install the server as an NT service (Windows NT/2000/XP
           only). Control the server by going to the Services in the Control
           Panel.

       -installa
           Use this to install and activate the server as an NT service
           (Windows NT/2000/XP only). Control the server by going to the
           Services in the Control Panel.

       -remove
           Use this to remove the server from the NT services (Windows
           NT/2000/XP only).

       -t minutes
           Idle session timeout, in minutes. Default is 60 minutes.

       -k size
           Maximum record size/message size, in kilobytes. Default is 1024 KB
           (1 MB).

       -d daemon
           Set name of daemon to be used in hosts access file. See
           hosts_access(5) and tcpd(8).

       A listener specification consists of an optional transport mode
       followed by a colon (:) followed by a listener address. The transport
       mode is one of unix (a file system socket), ssl (an SSL TCP/IP socket)
       or tcp (a plain TCP/IP socket, the default).

       For TCP, an address has the form

               hostname | IP-number [: portnumber]

       The port number defaults to 210 (standard Z39.50 port) for privileged
       users (root), and 9999 for normal users. The special hostname "@" is
       mapped to the address INADDR_ANY, which causes the server to listen on
       any local interface.

       The default behavior for zebrasrv - if started as non-privileged user -
       is to establish a single TCP/IP listener, for the Z39.50 protocol, on
       port 9999.

               zebrasrv @
               zebrasrv tcp:some.server.name.org:1234
               zebrasrv ssl:@:3000

       To start the server listening on the registered port for Z39.50, or on
       a filesystem socket, and to drop root privileges once the ports are
       bound, execute the server like this from a root shell:

               zebrasrv -u daemon @
               zebrasrv -u daemon tcp:@:210
               zebrasrv -u daemon unix:/some/file/system/socket

       Here daemon is an existing user account, and the unix socket
       /some/file/system/socket is readable and writable for the daemon
       account.
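
       The listener specification rules above can be illustrated with a
       short Python sketch. This is not the parser that zebrasrv itself
       uses; the names Listener and parse_listener_spec are made up for
       illustration, and IPv6 literals are deliberately not handled.

```python
from typing import NamedTuple, Optional


class Listener(NamedTuple):
    transport: str          # "tcp", "ssl" or "unix"
    address: str            # host name, IP number, "@" or socket path
    port: Optional[int]     # None for unix sockets


def parse_listener_spec(spec: str, privileged: bool = False) -> Listener:
    """Split a listener specification as described in the text above.

    Illustrative only: hypothetical helper, not part of Zebra or YAZ.
    """
    transport, address = "tcp", spec          # tcp is the default transport
    head, sep, tail = spec.partition(":")
    if sep and head in ("tcp", "ssl", "unix"):
        transport, address = head, tail
    if transport == "unix":
        return Listener(transport, address, None)
    host, sep, port = address.partition(":")
    # Port 210 for privileged users, 9999 otherwise, as stated in the text.
    default_port = 210 if privileged else 9999
    return Listener(transport, host, int(port) if sep else default_port)
```

       Under these assumptions, parse_listener_spec("ssl:@:3000") yields
       the ssl transport, the "@" wildcard address and port 3000.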

Z39.50 PROTOCOL SUPPORT AND BEHAVIOR

   Z39.50 Initialization
       During initialization, the server will negotiate to version 3 of the
       Z39.50 protocol, and the option bits for Search, Present, Scan,
       NamedResultSets, and concurrentOperations will be set, if requested by
       the client. The maximum PDU size is negotiated down to a maximum of 1
       MB by default.

   Z39.50 Search
       The supported query types are 1 and 101. All operators are currently
       supported with the restriction that only proximity units of type "word"
       are supported for the proximity operator. Queries can be arbitrarily
       complex. Named result sets are supported, and result sets can be used
       as operands without limitations. Searches may span multiple databases.

       The server has full support for piggy-backed retrieval (see also the
       following section).

   Z39.50 Present
       The present facility is supported in a standard fashion. The requested
       record syntax is matched against the ones supported by the profile of
       each record retrieved. If no record syntax is given, SUTRS is the
       default. The requested element set name, again, is matched against any
       provided by the relevant record profiles.

   Z39.50 Scan
       The attribute combinations provided with the termListAndStartPoint are
       processed in the same way as operands in a query (see above).
       Currently, only the term and the globalOccurrences are returned with
       the termInfo structure.

   Z39.50 Sort
       Z39.50 specifies three different types of sort criteria. Of these,
       Zebra supports the attribute specification type, in which the use
       attribute specifies the "Sort register". Sort registers are created for
       those fields that are of type "sort" in the default.idx file. The
       corresponding character mapping file in default.idx specifies the
       ordinal of each character used in the actual sort.

       Z39.50 allows the client to specify sorting on one or more input result
       sets and one output result set. Zebra supports sorting on one result
       set only, which may or may not be the same as the output result set.

   Z39.50 Close
       If a Close PDU is received, the server will respond with a Close PDU
       with reason=FINISHED, no matter which protocol version was negotiated
       during initialization. If the protocol version is 3 or more, the server
       will generate a Close PDU under certain circumstances, including a
       session timeout (60 minutes by default), and certain kinds of protocol
       errors. Once a Close PDU has been sent, the protocol association is
       considered broken, and the transport connection will be closed
       immediately upon receipt of further data, or following a short timeout.

   Z39.50 Explain
       Zebra maintains a "classic" Z39.50 Explain[1] database on the side.
       This database is called IR-Explain-1 and can be searched using the
       attribute set exp-1.

       The records in the explain database are of type grs.sgml. The root
       element for the Explain grs.sgml records is explain, thus explain.abs
       is used for indexing.

           Note
           Zebra must be able to locate explain.abs in order to index the
           Explain records properly. Zebra will work without it but the
           information will not be searchable.

THE SRU SERVER

       In addition to Z39.50, Zebra supports the more recent and web-friendly
       IR protocol SRU[2].  SRU can be carried over SOAP or a REST-like
       protocol that uses HTTP GET or POST to request search responses. The
       request itself is made of parameters such as query, startRecord,
       maximumRecords and recordSchema; the response is an XML document
       containing hit-count, result-set records, diagnostics, etc.  SRU can be
       thought of as a re-casting of Z39.50 semantics in web-friendly terms;
       or as a standardisation of the ad-hoc query parameters used by search
       engines such as Google and AltaVista; or as a superset of A9's
       OpenSearch (which it predates).

       Zebra supports Z39.50, SRU GET, SRU POST, SRU SOAP (SRW) - on the same
       port, recognising what protocol is used by each incoming request and
       handling it accordingly. This is achieved through the use of Deep
       Magic; civilians are warned not to stand too close.

   Running zebrasrv as an SRU Server
       Because Zebra supports all protocols on one port, it would seem to
       follow that the SRU server is run in the same way as the Z39.50 server,
       as described above. This is true, but only in an uninterestingly
       vacuous way: a Zebra server run in this manner will indeed recognise
       and accept SRU requests; but since it doesn't know how to handle the
       CQL queries that these protocols use, all it can do is send failure
       responses.

           Note
           It is possible to cheat, by having SRU search Zebra with a PQF
           query instead of CQL, using the x-pquery parameter instead of
           query. This is a non-standard extension of SRU, and a very naughty
           thing to do, but it does give you a way to see Zebra serving SRU
           "right out of the box". If you start your favourite Zebra server
           in the usual way, on port 9999, then you can send your web browser
           to:

                   http://localhost:9999/Default?version=1.1
                    &operation=searchRetrieve
                    &x-pquery=mineral
                    &startRecord=1
                    &maximumRecords=1

           This will display the XML-formatted SRU response that includes the
           first record in the result-set found by the query mineral. (For
           clarity, the SRU URL is shown here broken across lines, but the
           lines should be joined together to make a single-line URL for the
           browser to submit.)

       In order to turn on Zebra's support for CQL queries, it's necessary to
       have the YAZ generic front-end (which Zebra uses) translate them into
       the Z39.50 Type-1 query format that is used internally. And to do this,
       the generic front-end's own configuration file must be used. See the
       section called “YAZ SERVER VIRTUAL HOSTS”; the salient point for SRU
       support is that zebrasrv must be started with the -f frontendConfigFile
       option rather than the -c zebraConfigFile option, and that the
       front-end configuration file must include both a reference to the Zebra
       configuration file and the CQL-to-PQF translator configuration file.

       A minimal front-end configuration file that does this would read as
       follows:

                <yazgfs>
                  <server>
                    <config>zebra.cfg</config>
                    <cql2rpn>../../tab/pqf.properties</cql2rpn>
                  </server>
                </yazgfs>

       The <config> element contains the name of the Zebra configuration file
       that was previously specified by the -c command-line argument, and the
       <cql2rpn> element contains the name of the CQL properties file
       specifying how various CQL indexes, relations, etc. are translated into
       Type-1 queries.

       A Zebra server running with such a configuration can then be queried
       using proper, conformant SRU URLs with CQL queries:

              http://localhost:9999/Default?version=1.1
               &operation=searchRetrieve
               &query=title=utah and description=epicent*
               &startRecord=1
               &maximumRecords=1
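
       A browser will percent-encode the spaces in such a URL on its own,
       but a script has to do so itself. The following Python sketch builds
       the URL shown above with the standard library; the helper name
       sru_search_url is hypothetical, and the base URL and database name
       Default are simply taken from the example.

```python
from urllib.parse import urlencode, quote


def sru_search_url(base, query, start=1, maximum=1, pqf=False):
    """Build a searchRetrieve URL (hypothetical helper, for illustration).

    With pqf=True the non-standard x-pquery parameter replaces query.
    """
    params = [
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("x-pquery" if pqf else "query", query),
        ("startRecord", str(start)),
        ("maximumRecords", str(maximum)),
    ]
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base + "?" + urlencode(params, quote_via=quote)


url = sru_search_url("http://localhost:9999/Default",
                     "title=utah and description=epicent*")
print(url)
```

       Passing pqf=True with a PQF query such as "mineral" produces the
       x-pquery form described in the note above.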

SRU PROTOCOL SUPPORT AND BEHAVIOR

       Zebra running as an SRU server supports SRU version 1.1, including CQL
       version 1.1. In particular, it provides support for the following
       elements of the protocol.

   SRU Search and Retrieval
       Zebra supports the SRU searchRetrieve[3] operation.

       One of the great strengths of SRU is that it mandates a standard query
       language, CQL, and that all conforming implementations can therefore be
       trusted to correctly interpret the same queries. It is with some shame,
       then, that we admit that Zebra also supports an additional query
       language, our own Prefix Query Format (PQF[4]). A PQF query is
       submitted by using the extension parameter x-pquery, in which case the
       query parameter must be omitted. Please feel free to use this facility
       within your own applications; but be aware that such a request is not
       only non-standard SRU but not even syntactically valid, since it omits
       the mandatory query parameter.

   SRU Scan
       Zebra supports the SRU scan[5] operation. Scanning using CQL syntax is
       the default, where the standard scanClause parameter is used.

       In addition, a mutant form of SRU scan is supported, using the
       non-standard x-pScanClause parameter in place of the standard
       scanClause to scan on a PQF query clause.

   SRU Explain
       Zebra supports SRU explain[6].

       The ZeeRex record explaining a database may be requested either with a
       fully fledged SRU request (with operation=explain and version-number
       specified) or with a simple HTTP GET at the server's basename. The
       ZeeRex record returned in response is the one embedded in the YAZ
       Frontend Server configuration file that is described in the section
       called “YAZ SERVER VIRTUAL HOSTS”.

       Unfortunately, the data found in the CQL-to-PQF text file must be added
       by hand to the explain section of the YAZ Frontend Server configuration
       file in order to provide a suitable explain record. Too bad, but this is
       all extremely new alpha stuff, and a lot of work has yet to be done.

       There is no linkage whatsoever between the Z39.50 explain model and the
       SRU explain response (at least, none is implemented in Zebra). Zebra
       does not provide a means of obtaining the ZeeRex record using Z39.50.

   Other SRU operations
       In the Z39.50 protocol, Initialization, Present, Sort and Close are
       separate operations. In SRU, however, these operations do not exist.

       ·   SRU has no explicit initialization handshake phase, but commences
           immediately with searching, scanning and explain operations.

       ·   Neither does SRU have a close operation, since the protocol is
           stateless and each request is self-contained. (It is true that
           multiple SRU request/response pairs may be implemented as multiple
           HTTP request/response pairs over a single persistent TCP/IP
           connection; but the closure of that connection is not a
           protocol-level operation.)

       ·   Retrieval in SRU is part of the searchRetrieve operation, in which
           a search is submitted and the response includes a subset of the
           records in the result set. There is no direct analogue of Z39.50's
           Present operation which requests records from an established result
           set. In SRU, this is achieved by sending a subsequent
           searchRetrieve request with the query cql.resultSetId=id where id
           is the identifier of the previously generated result-set.

       ·   Sorting in SRU is done within the searchRetrieve operation - in
           v1.1, by an explicit sort parameter, but the forthcoming v1.2 or
           v2.0 will most likely use an extension of the query language, CQL
           sorting[7].

       It can be seen, then, that while Zebra operating as an SRU server does
       not provide the same set of operations as when operating as a Z39.50
       server, it does provide equivalent functionality.

SRU EXAMPLES

       Surf into http://localhost:9999 to get an explain response, or use

                http://localhost:9999/?version=1.1&operation=explain

       See number of hits for a query

                http://localhost:9999/?version=1.1&operation=searchRetrieve
                &query=text=(plant%20and%20soil)
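
       To pick the hit count out of the XML response from a script, a
       sketch along the following lines may help. It assumes the SRU 1.1
       response namespace http://www.loc.gov/zing/srw/ and its
       numberOfRecords element; the SAMPLE document is a hand-written
       stand-in, not captured Zebra output, and hit_count is a hypothetical
       helper name.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for SRU 1.1 responses.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}

# Hand-written stand-in for a server response, not real Zebra output.
SAMPLE = """<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>42</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""


def hit_count(response_xml):
    """Return the number of hits, or None if the element is absent."""
    root = ET.fromstring(response_xml)
    node = root.find("srw:numberOfRecords", SRW_NS)
    return int(node.text) if node is not None else None


print(hit_count(SAMPLE))
```

       Against a live server, the body returned by
       urllib.request.urlopen(url).read() could be passed to hit_count in
       place of SAMPLE.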

       Fetch records 5-7 in Dublin Core format

                http://localhost:9999/?version=1.1&operation=searchRetrieve
                &query=text=(plant%20and%20soil)
                &startRecord=5&maximumRecords=3&recordSchema=dc

       Even search using PQF queries using the extended naughty parameter
       x-pquery

                http://localhost:9999/?version=1.1&operation=searchRetrieve
                &x-pquery=@attr%201=text%20@and%20plant%20soil

       Or scan indexes using the extended extremely naughty parameter
       x-pScanClause

                http://localhost:9999/?version=1.1&operation=scan
                &x-pScanClause=@attr%201=text%20something

       Don't do this in production code!  But it's a great fast debugging aid.

YAZ SERVER VIRTUAL HOSTS

       The Virtual hosts mechanism allows a YAZ frontend server to support
       multiple backends. A backend is selected on the basis of the TCP/IP
       binding (port+listening address) and/or the virtual host.

       A backend can be configured to execute in a particular working
       directory. Or the YAZ frontend may perform CQL[8] to RPN conversion,
       thus allowing traditional Z39.50 backends to be offered as a SRU[2]
       service.  SRU Explain information for a particular backend may also be
       specified.

       For the HTTP protocol, the virtual host is specified in the Host
       header. For the Z39.50 protocol, the virtual host is specified in the
       OtherInfo of the Initialize Request, under OID
       1.2.840.10003.10.1000.81.1.

           Note
           Not all Z39.50 clients allow the VHOST information to be set. For
           those the selection of the backend must rely on the TCP/IP
           information alone (port and address).

       The YAZ frontend server uses XML to describe the backend
       configurations. Command-line option -f specifies the filename of the
       XML configuration.

       The configuration uses the root element yazgfs. This element includes a
       list of listen elements, followed by one or more server elements.

       The listen element describes a listener (transport end point), such as
       a TCP/IP, Unix file socket or SSL server. Content for a listener:

       CDATA (required)
           The CDATA for the listen element holds the listener string, such as
           tcp:@:210, tcp:server1:2100, etc.

       attribute id (optional)
           Identifier for this listener. This may be referred to from server
           sections.

           Note
           We expect more information to be added for the listen section in a
           future version, such as CERT file for SSL servers.

       The server element describes a server and the parameters for this
       server type. Content for a server:

       attribute id (optional)
           Identifier for this server. Currently not used for anything, but it
           might be for logging purposes.

       attribute listenref (optional)
           Specifies listener for this server. If this attribute is not given,
           the server is accessible from all listeners. In order for the server
           to be used for real, however, the virtual host must match (if
           specified in the configuration).

       element config (optional)
           Specifies the server configuration. This is equivalent to the
           config specified using command line option -c.

       element directory (optional)
           Specifies a working directory for this backend server. If
           specified, the YAZ frontend changes current working directory to
           this directory whenever a backend of this type is started (backend
           handler bend_start), stopped (backend handler bend_stop) and
           initialized (bend_init).

       element host (optional)
           Specifies the virtual host for this server. If this is specified a
           client must specify this host string in order to use this backend.

       element cql2rpn (optional)
           Specifies a filename that includes CQL[8] to RPN conversion for
           this backend server. See the CQL[8] section in the YAZ manual. If
           given, the backend server will only "see" a Type-1/RPN query.

       element explain (optional)
           Specifies SRU[2] ZeeRex content for this server - copied verbatim
           to the client. As things are now, some of the Explain content seems
           redundant because host information, etc. is also stored elsewhere.

           The format of the Explain record is described in detail, with
           examples, at the ZeeRex[9] web-site.

       The XML below configures a server that accepts connections on two
       listeners, TCP/IP port 9900 and a local UNIX file socket. We name the
       TCP/IP listener public and the other listener internal.

            <yazgfs>
             <listen id="public">tcp:@:9900</listen>
             <listen id="internal">unix:/var/tmp/socket</listen>
             <server id="server1">
               <host>server1.mydomain</host>
               <directory>/var/www/s1</directory>
               <config>config.cfg</config>
             </server>
             <server id="server2">
               <host>server2.mydomain</host>
               <directory>/var/www/s2</directory>
               <config>config.cfg</config>
               <cql2rpn>../etc/pqf.properties</cql2rpn>
               <explain xmlns="http://explain.z3950.org/dtd/2.0/">
                 <serverInfo>
                   <host>server2.mydomain</host>
                   <port>9900</port>
                   <database>a</database>
                 </serverInfo>
               </explain>
             </server>
             <server id="server3" listenref="internal">
               <directory>/var/www/s3</directory>
               <config>config.cfg</config>
             </server>
            </yazgfs>

       There are three configured backend servers. The first two servers,
       "server1" and "server2", can be reached by both listener addresses -
       since no listenref attribute is specified. In order to distinguish
       between the two, a virtual host has been specified for each server in
       its host element.
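
       The selection rules just described (listenref restricts a server to
       one listener, host restricts it to one virtual host) can be sketched
       in Python. This is only a reading of the documentation, not the
       logic of the YAZ frontend itself; the function name candidates is
       hypothetical, and CONFIG is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example configuration (only routing-related parts).
CONFIG = """<yazgfs>
 <listen id="public">tcp:@:9900</listen>
 <listen id="internal">unix:/var/tmp/socket</listen>
 <server id="server1">
   <host>server1.mydomain</host>
 </server>
 <server id="server2">
   <host>server2.mydomain</host>
 </server>
 <server id="server3" listenref="internal"/>
</yazgfs>"""


def candidates(config_xml, listener_id, vhost=None):
    """Return ids of servers that could serve a request (assumed rules)."""
    root = ET.fromstring(config_xml)
    matches = []
    for server in root.findall("server"):
        ref = server.get("listenref")
        if ref is not None and ref != listener_id:
            continue  # bound to a different listener
        host = server.findtext("host")
        if host is not None and host != vhost:
            continue  # a virtual host is configured but was not matched
        matches.append(server.get("id"))
    return matches


print(candidates(CONFIG, "public", "server1.mydomain"))
print(candidates(CONFIG, "internal"))
```

       Which of several candidates the frontend actually picks is not
       stated here, so the sketch returns all of them.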

       For "server2", CQL[8] to RPN conversion is supported and explain
       information has been added (a short one here to keep the example
       small).

       The third server, "server3", can only be reached via listener
       "internal".

NOTES

        1. Z39.50 Explain
           http://www.loc.gov/z3950/agency/markup/.html

        2. SRU
           http://www.loc.gov/standards/sru/

        3. SRU searchRetrieve
           http://www.loc.gov/standards/sru/specs/search-retrieve.html

        4. PQF
           http://www.indexdata.com/yaz/doc/tools.html#PQF

        5. SRU scan
           http://www.loc.gov/standards/sru/specs/scan.html/

        6. SRU explain
           http://www.loc.gov/standards/sru/specs/explain.html

        7. CQL sorting
           http://zing.z3950.org/cql/sorting.html

        8. CQL
           http://www.loc.gov/standards/sru/specs/cql.html

        9. ZeeRex
           http://explain.z3950.org/
