NAME

       dbzinit,  dbzfresh,  dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore,
       dbzsync, dbzsize, dbzgetoptions,  dbzsetoptions,  dbzdebug  -  database
       routines

SYNOPSIS

       #include <inn/dbz.h>

       bool dbzinit(const char *base)

       bool dbzclose(void)

       bool dbzfresh(const char *base, long size)

       bool dbzagain(const char *base, const char *oldbase)

       bool dbzexists(const HASH key)

       off_t dbzfetch(const HASH key)
       bool dbzfetch(const HASH key, void *ivalue)

       DBZSTORE_RESULT dbzstore(const HASH key, off_t offset)
       DBZSTORE_RESULT dbzstore(const HASH key, void *ivalue)

       bool dbzsync(void)

       long dbzsize(long nentries)

       void dbzgetoptions(dbzoptions *opt)

       void dbzsetoptions(const dbzoptions opt)

DESCRIPTION

       These functions provide an indexing system for rapid random access to a
       text file (the base file).

       Dbz stores offsets into the base text file for  rapid  retrieval.   All
       retrievals  are  keyed  on  a  hash  value  that  is  generated  by the
       HashMessageID() function.

       Dbzinit opens a database, an index into the base file base, consisting
       of the files base.dir, base.index, and base.hash, which must already
       exist.  (If the database is new, they should be zero-length files.)
       Subsequent accesses go to that database until dbzclose is called to
       close the database.

       Dbzfetch searches the database for the specified key.  If the library
       was configured with --enable-tagged-hash, it returns the corresponding
       value, if any.  Otherwise, it returns true on success and sets the
       content of ivalue.  Dbzstore stores the key-value pair in the database
       if the library was configured with --enable-tagged-hash; otherwise, it
       stores the content of ivalue.  Dbzstore will fail unless the database
       files are writable.  Dbzexists checks whether the given hash exists in
       the database.  Dbz is optimized for this operation, and it may be
       significantly faster than dbzfetch.

       Dbzfresh  is a variant of dbzinit for creating a new database with more
       control over details.

       Dbzfresh’s size parameter specifies the size of the  first  hash  table
       within  the  database, in key-value pairs.  Performance will be best if
       the number of key-value pairs stored in the database  does  not  exceed
       about 2/3 of size.  (The dbzsize function, given the expected number of
       key-value  pairs,  will  suggest  a  database  size  that  meets  these
       criteria.)   Assuming  that an fseek offset is 4 bytes, the .index file
       will be 4 * size bytes.  The .hash file will be  DBZ_INTERNAL_HASH_SIZE
       * size bytes (the .dir file is tiny and roughly constant in size) until
       the number of key-value pairs exceeds  about  80%  of  size.   (Nothing
       awful  will  happen  if  the  database  grows  beyond 100% of size, but
       accesses will slow down quite a bit and the .index and .hash files will
       grow somewhat.)
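
       The sizing arithmetic above can be sketched in C.  This is only an
       illustration of the 2/3 rule and the file-size formulas; the helper
       names are hypothetical, the real dbzsize may round differently, and
       the number of hash bytes per entry is passed in as a parameter
       because the value of DBZ_INTERNAL_HASH_SIZE is not given here.

```c
/* Bytes per .index entry, following the text's 4-byte offset assumption. */
#define SKETCH_OFFSET_BYTES 4

/*
 * Hypothetical helper (not dbzsize itself): smallest table size such that
 * nentries is at most 2/3 of the table, i.e. ceil(nentries * 3 / 2).
 */
long sketch_table_size(long nentries)
{
    if (nentries <= 0)
        return 0;
    return (nentries * 3 + 1) / 2;
}

/* Estimated .index file size: one offset per table slot. */
long sketch_index_bytes(long size)
{
    return SKETCH_OFFSET_BYTES * size;
}

/*
 * Estimated .hash file size.  hash_bytes_per_entry stands in for
 * DBZ_INTERNAL_HASH_SIZE, whose real value comes from <inn/dbz.h>.
 */
long sketch_hash_bytes(long size, long hash_bytes_per_entry)
{
    return hash_bytes_per_entry * size;
}
```

       Under these assumptions, an expected 1,000,000 entries gives a table
       size of 1,500,000 and a 6,000,000-byte .index file; with a
       hypothetical 6 stored hash bytes per entry, the .hash file would be
       9,000,000 bytes.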

       Dbz  stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id’s hash
       in the .hash file to confirm a hit.  This eliminates the need  to  read
       the  base file to handle collisions.  This replaces the tagmask feature
       in previous dbz releases.
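
       The idea of confirming a hit from stored hash bytes can be sketched
       as follows.  This is not the dbz on-disk layout: the slot structure,
       the table size, the 16-byte key length, the 6-byte stored prefix, and
       the linear probing are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_FULL_BYTES   16  /* assumed length of the full key hash */
#define SKETCH_STORED_BYTES 6   /* stand-in for DBZ_INTERNAL_HASH_SIZE */
#define SKETCH_SLOTS        8   /* tiny table, for illustration only */

struct sketch_slot {
    bool used;
    unsigned char prefix[SKETCH_STORED_BYTES]; /* leading bytes of the hash */
    long offset;                               /* offset into the base file */
};

/* Pick a starting slot from the last byte of the hash (arbitrary choice). */
static unsigned sketch_start(const unsigned char key[SKETCH_FULL_BYTES])
{
    return key[SKETCH_FULL_BYTES - 1] % SKETCH_SLOTS;
}

/* Store in the first free slot found by linear probing; false if full. */
bool sketch_store(struct sketch_slot table[SKETCH_SLOTS],
                  const unsigned char key[SKETCH_FULL_BYTES], long offset)
{
    unsigned start = sketch_start(key);
    for (unsigned i = 0; i < SKETCH_SLOTS; i++) {
        struct sketch_slot *s = &table[(start + i) % SKETCH_SLOTS];
        if (!s->used) {
            s->used = true;
            memcpy(s->prefix, key, SKETCH_STORED_BYTES);
            s->offset = offset;
            return true;
        }
    }
    return false;
}

/*
 * Look a key up.  A hit is confirmed by comparing the stored prefix with
 * the key's leading bytes, so the base file is never read.  Returns the
 * stored offset, or -1 if the key is absent.
 */
long sketch_fetch(const struct sketch_slot table[SKETCH_SLOTS],
                  const unsigned char key[SKETCH_FULL_BYTES])
{
    unsigned start = sketch_start(key);
    for (unsigned i = 0; i < SKETCH_SLOTS; i++) {
        const struct sketch_slot *s = &table[(start + i) % SKETCH_SLOTS];
        if (!s->used)
            return -1;
        if (memcmp(s->prefix, key, SKETCH_STORED_BYTES) == 0)
            return s->offset;
    }
    return -1;
}
```

       Two keys that land on the same starting slot are told apart by their
       stored prefixes alone, which is the property the paragraph above
       describes.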

       A size of ‘‘0’’ given to dbzfresh is synonymous with the local default;
       the normal default is suitable for tables of 5,000,000 key-value pairs.
       Calling dbzinit(name) when the database files are empty is equivalent
       to calling dbzfresh(name, 0).

       When databases are regenerated periodically, as in news, it is simplest
       to pick the parameters for a new database based on the old  one.   This
       also  permits  some memory of past sizes of the old database, so that a
       new database  size  can  be  chosen  to  cover  expected  fluctuations.
       Dbzagain  is  a variant of dbzinit for creating a new database as a new
       generation of an old database.  The database  files  for  oldbase  must
       exist.  Dbzagain is equivalent to calling dbzfresh with a size equal to
       the result of applying dbzsize to the largest number of entries in  the
       oldbase database and its previous 10 generations.
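
       The size selection described for dbzagain can be sketched as below.
       The helper names and the array of per-generation entry counts are
       hypothetical; how dbz actually records past usage is not described
       here.  The sizing function is passed in as a pointer so that real
       code could supply dbzsize, and a simple 2/3-rule stand-in is
       included so the sketch is self-contained.

```c
/* The old database plus its previous 10 generations. */
#define SKETCH_GENERATIONS 11

/*
 * Largest entry count among the recorded generations.  used[0] is taken
 * to be the old database and used[1..] its predecessors; anything beyond
 * SKETCH_GENERATIONS entries is ignored.
 */
long sketch_largest_usage(const long used[], int n)
{
    long largest = 0;
    for (int i = 0; i < n && i < SKETCH_GENERATIONS; i++) {
        if (used[i] > largest)
            largest = used[i];
    }
    return largest;
}

/* Stand-in for dbzsize: smallest size keeping nentries within 2/3 of it. */
long sketch_two_thirds_size(long nentries)
{
    if (nentries <= 0)
        return 0;
    return (nentries * 3 + 1) / 2;
}

/* Size for the new generation: apply the sizer to the largest past usage. */
long sketch_next_size(const long used[], int n, long (*sizer)(long))
{
    return sizer(sketch_largest_usage(used, n));
}
```

       With recorded counts of 400, 900, and 650 entries, the largest is
       900, and the stand-in sizer suggests a new table size of 1350.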

       When many accesses are being done by the same program, dbz is massively
       faster if its first hash table is in  memory.   If  the  ‘‘pag_incore’’
       flag is set to INCORE_MEM, an attempt is made to read the table in when
       the database is opened, and dbzclose writes it out to disk again (if it
       was read successfully and has been modified).  Dbzsetoptions can be
       used to set the pag_incore and exists_incore flags to new values, which
       should be ‘‘INCORE_NO’’, ‘‘INCORE_MEM’’, or ‘‘INCORE_MMAP’’ for the
       .hash and .index files separately; this does not affect the status of a
       database that has already been opened.  The default is ‘‘INCORE_NO’’
       for the .index file and ‘‘INCORE_MMAP’’ for the .hash file.  The
       attempt to read the table in may fail due to memory shortage; in this
       case dbz fails with an error.  Stores to an in-memory database are not
       (in general) written out to the file until dbzclose or dbzsync, so if
       robustness in the presence of crashes or concurrent accesses is
       crucial, in-memory databases should probably be avoided or the
       writethrough option should be set to ‘‘true’’.

       If the nonblock option is ‘‘true’’, then writes to the .hash and .index
       files  will  be done using non-blocking I/O.  This can be significantly
       faster if your platform supports non-blocking I/O with files.

       Dbzsync causes all buffers etc. to be flushed out to the files.  It  is
       typically  used  as a precaution against crashes or concurrent accesses
       when a dbz-using process will be running for a  long  time.   It  is  a
       somewhat expensive operation, especially for an in-memory database.

       Concurrent  reading  of  databases  is  fairly  safe,  but  there is no
       (inter)locking, so concurrent updating is not.

       An open database occupies three stdio streams and two file descriptors.
       Memory consumption is negligible, apart from stdio buffers, except for
       in-memory databases.

SEE ALSO

       dbm(3), history(5), libinn(3)

DIAGNOSTICS

       Functions returning bool values return ‘‘true’’ for success, ‘‘false’’
       for failure.  Functions returning off_t values return -1 for failure.
       Dbzinit attempts to have errno set plausibly on return, but otherwise
       this is not guaranteed.  An errno of EDOM from dbzinit indicates that
       the database did not appear to be in dbz format.

       If DBZTEST is defined at compile time, then a main() function will be
       included.  This will run performance and integrity tests.

HISTORY

       The original dbz was written by Jon Zeeff
       (zeeff@b-tech.ann-arbor.mi.us).  Later contributions by David Butler
       and Mark Moraes.  Extensive reworking, including this documentation, by
       Henry Spencer (henry@zoo.toronto.edu) as part of the C News project.
       MD5 code borrowed from RSA.  Extensive reworking to remove backwards
       compatibility and to add hashes into dbz files by Clayton O’Neill
       (coneill@oneill.net).

BUGS

       Unlike  dbm,  dbz  will  refuse  to  dbzstore with a key already in the
       database.  The user is responsible for avoiding this.

       The RFC5322 case mapper implements only a first  approximation  to  the
       hideously-complex RFC5322 case rules.

       Dbz no longer tries to be call-compatible with dbm in any way.

                                  6 Sep 1997                            DBZ(3)