estcmd - command line interface of the core API

NAME

       estcmd - command line interface of the core API

SYNOPSIS

       estcmd  create  [-tr] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa]
       [-attr name type] db

       estcmd  put  [-tr]  [-cl]  [-ws]  [-apn|-acc]  [-xs|-xl|-xh|-xh2|-xh3]
       [-sv|-si|-sa] db [file]

       estcmd out [-cl] [-pc enc] db expr

       estcmd edit [-pc enc] db expr name [value]

       estcmd get [-nl|-nb] [-pidx path] [-pc enc] db expr [attr]

       estcmd list [-nl|-nb] [-lp] db

       estcmd uriid [-nl|-nb] [-pidx path] [-pc enc] db expr

       estcmd meta db [name [value]]

       estcmd inform [-nl|-nb] db

       estcmd optimize [-onp] [-ond] db

       estcmd merge [-cl] db target

       estcmd repair [-rst|-rsh] db

       estcmd      search     [-nl|-nb]     [-pidx     path]     [-ic     enc]
       [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-sn wnum hnum anum] [-kn num] [-um] [-ec
       rn]  [-gs|-gf|-ga]  [-cd] [-ni] [-sf|-sfr|-sfu|-sfi] [-hs] [-attr expr]
       [-ord expr] [-max num] [-sk num] [-aux num] [-dis name]  [-sim  id]  db
       [phrase]

       estcmd  gather [-tr] [-cl] [-ws] [-no] [-fe|-ft|-fh|-fm] [-fx sufs cmd]
       [-fz] [-fo] [-rm sufs] [-ic enc] [-il lang] [-bc] [-lt num]  [-lf  num]
       [-pc     enc]    [-px    name]    [-aa    name    value]    [-apn|-acc]
       [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-ss name] [-sd] [-cm] [-cs  num]
       [-ncm] [-kn num] [-um] db [file|dir]

       estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix]

       estcmd  extkeys  [-no]  [-fc] [-dfdb file] [-ncm] [-ni] [-kn num] [-um]
       [-attr expr] db [prefix]

       estcmd words [-nl|-nb] [-dfdb file] [-kw|-kt] db

       estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc]  [-lt  num]  [-kn
       num] [-um] [file]

       estcmd break [-ic enc] [-il lang] [-apn|-acc] [-wt] [file]

       estcmd iconv [-ic enc] [-il lang] [-oc enc] [file]

       estcmd regex [-inv] [-repl str] expr [file]

       estcmd scandir [-tf|-td] [-pa|-pu] [dir]

       estcmd  multi  [-db  db]  [-nl|-nb] [-ic enc] [-gs|-gf|-ga] [-cd] [-ni]
       [-sf|-sfr|-sfu|-sfi] [-hs] [-hu] [-attr expr] [-ord  expr]  [-max  num]
       [-sk num] [-aux num] [-dis name] [phrase]

       estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum

       estcmd wicked db dnum

       estcmd regression db

       estcmd version

DESCRIPTION

       estcmd is an aggregation of sub commands.  The name of a sub command is
       specified by the first argument.  Other arguments are parsed  according
       to each sub command.  The argument db specifies the path of an index.

       estcmd  create  [-tr] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa]
       [-attr name type] db
              Create an index.
              If -tr is specified, a new index is created even if one already
              exists.
              If -apn is  specified,  N-gram  analysis  is  performed  against
              European text also.
              If  -acc  is specified, character category analysis is performed
              instead of N-gram analysis.
              If -xs is specified, the index is tuned to  register  less  than
              50000 documents.
              If  -xl  is  specified, the index is tuned to register more than
              300000 documents.
              If -xh is specified, the index is tuned to  register  more  than
              1000000 documents.
              If  -xh2  is specified, the index is tuned to register more than
              5000000 documents.
              If -xh3 is specified, the index is tuned to register  more  than
              10000000 documents.
              If -sv is specified, scores are stored as void.
              If -si is specified, scores are stored as 32-bit integer.
              If -sa is specified, scores are stored as-is and marked not to
              be tuned at search time.
              -attr specifies an attribute index  and  its  data  type.   This
              option can be specified multiple times.
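
              As an illustration of the options above, the following sketch
              wraps one possible create invocation in a shell function.  The
              index path "casket" and the attribute name "@title" are
              placeholders assumed for the example, not values required by
              estcmd.

```shell
# Create a fresh index tuned for a small collection, with a string-type
# attribute index.  "casket" and "@title" are placeholder names.
create_small_index() {
  db=${1:-casket}
  # -tr: create a new index even if one exists
  # -xs: tune for fewer than 50000 documents
  # -attr NAME TYPE: declare an attribute index ("seq", "str", or "num")
  estcmd create -tr -xs -attr "@title" str "$db"
}
```

              Called with no argument, create_small_index would create the
              index at the relative path "casket".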

       estcmd  put  [-tr]  [-cl]  [-ws]  [-apn|-acc]  [-xs|-xl|-xh|-xh2|-xh3]
       [-sv|-si|-sa] db [file]
              Register a document in document draft format into an index.
              file  specifies  a  target file.  If it is omitted, the standard
              input is read.
              If -tr is specified, a new index is created even if one already
              exists.
              If -cl is specified, regions of an overwritten document are
              cleaned up.
              If -ws is specified, scores are weighted statically  with  score
              weighting attribute.
              If  -apn  is  specified,  N-gram  analysis  is performed against
              European text also.
              If -acc is specified, character category analysis  is  performed
              instead of N-gram analysis.
              If  -xs  is  specified, the index is tuned to register less than
              50000 documents.
              If -xl is specified, the index is tuned to  register  more  than
              300000 documents.
              If  -xh  is  specified, the index is tuned to register more than
              1000000 documents.
              If -xh2 is specified, the index is tuned to register  more  than
              5000000 documents.
              If  -xh3  is specified, the index is tuned to register more than
              10000000 documents.
              If -sv is specified, scores are stored as void.
              If -si is specified, scores are stored as 32-bit integer.
              If -sa is specified, scores are stored as-is and marked not to
              be tuned at search time.

       estcmd out [-pc enc] [-cl] db expr
              Remove information of a document from an index.
              expr  specifies  the  ID number, the URI, or the local path of a
              document.
              If -cl is specified, regions of the document are cleaned up.
              -pc specifies the encoding of file paths.   By  default,  it  is
              ISO-8859-1.

       estcmd edit [-pc enc] db expr name [value]
              Edit an attribute of a document in an index.
              expr  specifies  the  ID number, the URI, or the local path of a
              document.
              name specifies the name of an attribute.
              value specifies the value of the attribute.  If it  is  omitted,
              the attribute is removed.
              -pc  specifies  the  encoding of the file path and the attribute
              value.  By default, it is ISO-8859-1.

       estcmd get [-nl|-nb] [-pidx path] [-pc enc] db expr [attr]
              Output document draft of a document in an index.
              expr specifies the ID number, the URI, or the local  path  of  a
              document.
              If attr is specified, only the value of the attribute is output.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -pidx  specifies the path of a pseudo index.  This option can be
              specified multiple times.
              -pc specifies the encoding of file paths.   By  default,  it  is
              ISO-8859-1.

       estcmd list [-nl|-nb] [-lp] db
              Output a list of all documents in an index.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              If -lp is specified, the local path equivalent to each
              "file://" URL is output.

       estcmd uriid [-nl|-nb] [-pidx path] [-pc enc] db expr
              Output the ID number of a document specified by URI.
              expr specifies the URI or the local path of a document.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -pidx specifies the path of a pseudo index.  This option can  be
              specified multiple times.
              -pc  specifies  the  encoding  of file paths.  By default, it is
              ISO-8859-1.

       estcmd meta db [name [value]]
              Handle meta data.
              name specifies the name of a piece  of  meta  data.   If  it  is
              omitted, a list of all names is output.
              value  specifies  the value of the meta data to be recorded.  If
              it is omitted, the current value is output.  If it is  an  empty
              string, the meta data is removed.
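
              The three forms of the meta sub command can be sketched as
              follows.  The meta data name "maintainer" and its value are
              placeholders assumed for the example.

```shell
# Record, read back, and remove a piece of meta data.
# The name "maintainer" and the value "docs-team" are placeholders.
meta_roundtrip() {
  db=$1
  estcmd meta "$db" maintainer "docs-team"   # record a value
  estcmd meta "$db" maintainer               # output the current value
  estcmd meta "$db" maintainer ""            # an empty string removes it
  estcmd meta "$db"                          # list the remaining names
}
```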

       estcmd inform [-nl|-nb] db
              Output the number of documents and the number of unique words in
              an index.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.

       estcmd optimize [-onp] [-ond] db
              Optimize an index and clean up dispensable regions.
              If -onp is specified, cleaning up dispensable regions is
              omitted.
              If -ond is specified, optimizing the database files is omitted.

       estcmd merge [-cl] db target
              Merge another index.
              target specifies the path of another index.
              If  -cl  is  specified,  regions  of  overwritten  documents are
              cleaned up.

       estcmd repair [-rst|-rsh] db
              Repair a broken index.
              If -rst is specified, strict consistency check is performed.
              If -rsh is specified, consistency check is omitted.

       estcmd     search     [-nl|-nb]     [-pidx     path]     [-ic      enc]
       [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-sn wnum hnum anum] [-kn num] [-um] [-ec
       rn] [-gs|-gf|-ga] [-cd] [-ni] [-sf|-sfr|-sfu|-sfi] [-hs]  [-attr  expr]
       [-ord  expr]  [-max  num] [-sk num] [-aux num] [-dis name] [-sim id] db
       [phrase]
              Search an index for documents.
              phrase specifies the search phrase.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -pidx  specifies the path of a pseudo index.  This option can be
              specified multiple times.
              -ic specifies the input encoding.  By default, it is UTF-8.
              If -vu is specified, TSV of ID numbers and URIs is output.
              If -va is specified, multipart format  including  attributes  is
              output.
              If  -vf  is specified, multipart format including document draft
              is output.
              If -vs is specified, multipart format including  attributes  and
              snippets is output.
              If  -vh is specified, human readable format including attributes
              and snippets is output.
              If -vx is specified, XML including attributes and snippets is
              output.
              If -dd is specified, document draft data are dumped and saved
              into separate files.
              -sn specifies the whole width of a snippet, the width of the
              string picked up from the beginning of the text, and the width
              of the strings picked up around each highlighted word.
              -kn specifies the  number  of  keywords  to  be  extracted.   By
              default, keyword extraction is not performed.
              If  -um  is  specified,  morphological  analyzers  are  used for
              keyword extraction.
              -ec specifies the lower limit of similarity eclipse.
              If -gs is specified, every key of N-gram is checked.  By
              default, keys are checked alternately.
              If -gf is specified, keys of N-gram are checked every three.
              If -ga is specified, keys of N-gram are checked every four.
              If -cd is specified, each document is checked to confirm that it
              definitely matches the search phrase.
              If -ni is specified, TF-IDF tuning is omitted.
              If -sf is specified, the phrase is treated as a simplified form.
              If -sfr is specified, the phrase is treated as a rough form.
              If -sfu is specified, the phrase is treated as a union form.
              If  -sfi  is specified, the phrase is treated as an intersection
              form.
              If  -hs  is  specified,  score  information  is  output  as   an
              attribute.
              -attr  specifies an attribute search condition.  This option can
              be specified multiple times.
              -ord  specifies  the  order  expression.   By  default,  it   is
              descending by score.
              -max  specifies the maximum number of shown documents.  Negative
              means unlimited.  By default, it is 10.
              -sk specifies  the  number  of  documents  to  be  skipped.   By
              default, it is 0.
              -aux  specifies  permission  to  adopt  result  of the auxiliary
              index.  If it is not more than 0, the  auxiliary  index  is  not
              used.  By default, it is 32.
              -dis specifies the name of the distinct attribute.
              -sim specifies the ID number of the seed document for similarity
              search.
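
              Since -max limits the number of shown documents and -sk skips
              leading ones, the two can be combined for paging.  The sketch
              below assumes a hypothetical helper named search_page; only the
              estcmd options themselves come from the description above.

```shell
# Print one page of search results in human-readable form.
# Usage: search_page DB PHRASE PAGE [PER_PAGE]
search_page() {
  db=$1
  phrase=$2
  page=$3
  per_page=${4:-10}
  # Page 1 skips nothing, page 2 skips one page worth of documents, etc.
  skip=$(( (page - 1) * per_page ))
  estcmd search -vh -max "$per_page" -sk "$skip" "$db" "$phrase"
}
```

              For example, search_page casket rainbow 3 would request
              documents 21 to 30 of the result.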

       estcmd gather [-tr] [-cl] [-ws] [-no] [-fe|-ft|-fh|-fm] [-fx sufs  cmd]
       [-fz]  [-fo]  [-rm sufs] [-ic enc] [-il lang] [-bc] [-lt num] [-lf num]
       [-pc    enc]    [-px    name]    [-aa    name    value]     [-apn|-acc]
       [-xs|-xl|-xh|-xh2|-xh3]  [-sv|-si|-sa] [-ss name] [-sd] [-cm] [-cs num]
       [-ncm] [-kn num] [-um] db [file|dir]
              Scan the local file system and register documents into an index.
              If the third argument is the name of a file, a list of paths of
              target documents is read from it.  If it is "-", the list is
              read from the standard input.
              If the third argument is the name of a directory, all files
              under the directory are treated as target documents.
              If -tr is specified, a new index is created even if one already
              exists.
              If  -cl  is  specified,  regions  of  overwritten  documents are
              cleaned up.
              If -ws is specified, scores are weighted statically  with  score
              weighting attribute.
              If -no is specified, operations are printed but not actually
              executed.
              If -fe is specified, target files are treated as document draft.
              By  default,  the  format  is  detected  by  the  suffix of each
              document.
              If -ft is specified, target files are treated as plain text.
              If -fh is specified, target files are treated as HTML.
              If -fm is specified, target files are treated as MIME.
              If -fx is specified, target files with the specified suffixes
              are processed by the specified outer command.  "*" matches any
              file.  If the command is prefixed with "T@", the output of the
              command is treated as plain text.  If it is prefixed with "H@",
              the output is treated as HTML.  If it is prefixed with "M@", the
              output is treated as MIME.  Otherwise, the output is treated as
              document draft.  This option can be specified multiple times.
              If -fz is specified, documents which do not match any condition
              of -fx are ignored.
              If -fo is specified, target files are not read.  This is useful
              for making processing by the outer command efficient.
              If  -rm  is  specified, target files with the specified suffixes
              are  removed.   "*"  matches  any  file.   This  option  can  be
              specified multiple times.
              -ic  specifies  the  input encoding.  By default, it is detected
              automatically.
              -il specifies the preferred input language.  By default, English
              is preferred.
              If -bc is specified, binary files are detected and ignored.
              -lt specifies the text size limit in kilobytes.  By default, it
              is 128KB.  If it is negative, the size is unlimited.
              -lf specifies the file size limit in megabytes.  By default, it
              is 32MB.  If it is negative, the size is unlimited.
              -pc  specifies  the  encoding  of file paths.  By default, it is
              ISO-8859-1.
              -px specifies the name of an attribute read from the list of
              paths.  The list of paths can be in TSV format: the first field
              is treated as the path of a target document, and the second and
              following fields are attribute values.  Each -px names the
              attribute for one of those fields, in order.  This option can be
              specified multiple times.
              -aa specifies the name and the value of an additional attribute.
              This option can be specified multiple times.
              If -apn is  specified,  N-gram  analysis  is  performed  against
              European text also.
              If  -acc  is specified, character category analysis is performed
              instead of N-gram analysis.
              If -xs is specified, the index is tuned to  register  less  than
              50000 documents.
              If  -xl  is  specified, the index is tuned to register more than
              300000 documents.
              If -xh is specified, the index is tuned to  register  more  than
              1000000 documents.
              If  -xh2  is specified, the index is tuned to register more than
              5000000 documents.
              If -xh3 is specified, the index is tuned to register  more  than
              10000000 documents.
              If -sv is specified, scores are stored as void.
              If -si is specified, scores are stored as 32-bit integer.
              If -sa is specified, scores are stored as-is and marked not to
              be tuned at search time.
              -ss specifies the name of an attribute for substitute score.
              If -sd is specified, the  modification  date  of  each  file  is
              recorded as an attribute.
              If  -cm  is specified, documents whose modification date has not
              changed are ignored.
              -cs specifies the size of cache memory in megabytes.  By
              default, it is 64MB.
              If  -ncm  is  specified,  checking  availability  of the virtual
              memory is omitted.
              -kn specifies the  number  of  keywords  to  be  extracted.   By
              default, keyword extraction is not performed.
              If  -um  is  specified,  morphological  analyzers  are  used for
              keyword extraction.
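
              The path list and the -px option can be combined as in the
              following sketch.  The document paths, the attribute name
              "category", and the helper names are placeholders assumed for
              the example.

```shell
# Write a TSV path list: field 1 is the document path, field 2 is the
# value of one attribute.  All paths and values here are placeholders.
write_path_list() {
  printf '%s\t%s\n' \
    "docs/intro.txt" "manual" \
    "docs/faq.html" "support" > "$1"
}

# Register the listed documents, naming the second TSV field "category".
# Usage: gather_from_list LIST [DB]
gather_from_list() {
  list=$1
  db=${2:-casket}
  # -cl: clean up regions of overwritten documents
  # -sd: record each file's modification date as an attribute
  # -cm: ignore documents whose modification date has not changed
  estcmd gather -cl -sd -cm -px category "$db" "$list"
}
```

              Adding -no to the gather invocation would print the operations
              without executing them, which may help when checking a new
              path list.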

       estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix]
              Purge information of documents which do not exist on the file
              system.
              If prefix is specified, only documents whose URIs begin with it
              are targeted.  It can also be given as the local path of a
              directory.
              If -cl is specified, regions of the deleted documents are
              cleaned up.
              If -no is specified, operations are printed but not actually
              executed.
              If -fc is specified, information of all target documents is
              deleted.
              -pc specifies the encoding of file paths.   By  default,  it  is
              ISO-8859-1.
              -attr  specifies an attribute search condition.  This option can
              be specified multiple times.

       estcmd extkeys [-no] [-fc] [-dfdb file] [-ncm] [-ni]  [-kn  num]  [-um]
       [-attr expr] db [prefix]
              Create a database of keywords extracted from documents.
              If prefix is specified, only documents whose URIs begin with it
              are processed.
              If -no is specified, operations are printed but not actually
              executed.
              If -fc is specified, all target documents are processed whether
              or not they have existing records.
              -dfdb specifies an outer database of document frequency.  By
              default, document frequency is calculated dynamically according
              to the index.
              If  -ncm  is  specified,  checking  availability  of the virtual
              memory is omitted.
              If -ni is specified, TF-IDF tuning is omitted.
              -kn specifies the  number  of  keywords  to  be  extracted.   By
              default, it is 32.
              If  -um  is  specified,  morphological  analyzers  are  used for
              keyword extraction.
              -attr specifies an attribute search condition.  This option  can
              be specified multiple times.

       estcmd words [-nl|-nb] [-dfdb file] [-kw|-kt] db
              Output a list of all unique words and the record size of each,
              which is treated as document frequency.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -dfdb  specifies  an  outer database where the result is stored.
              By default, the result is output to the standard output as  TSV.
              If  the  outer database already exists, the value of each record
              is incremented.
              If -kw is  specified,  keywords  and  numbers  of  corresponding
              documents are output.
              If  -kt  is  specified,  keywords  and  their  related terms are
              output.

       estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc]  [-lt  num]  [-kn
       num] [-um] [file]
              For testing and debugging.

       estcmd break [-ic enc] [-il lang] [-apn|-acc] [-wt] [file]
              For testing and debugging.

       estcmd iconv [-ic enc] [-il lang] [-oc enc] [file]
              For testing and debugging.

       estcmd regex [-inv] [-repl str] expr [file]
              For testing and debugging.

       estcmd scandir [-tf|-td] [-pa|-pu] [dir]
              For testing and debugging.

       estcmd multi [-db db] [-nl|-nb] [-ic  enc]  [-gs|-gf|-ga]  [-cd]  [-ni]
       [-sf|-sfr|-sfu|-sfi]  [-hs]  [-hu]  [-attr expr] [-ord expr] [-max num]
       [-sk num] [-aux num] [-dis name] [phrase]
              For testing and debugging.

       estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum
              For testing and debugging.

       estcmd wicked db dnum
              For testing and debugging.

       estcmd regression db
              For testing and debugging.

       estcmd version
              Show the version information.

       All sub commands return 0 if the operation succeeds, and 1 otherwise.
       The put, out, gather, purge, randput, wicked, and regression sub
       commands close the database before exiting when they catch signal 1
       (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), 13 (SIGPIPE), or 15 (SIGTERM).
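
       Because success and failure are reported through the exit status, a
       calling script can branch on it directly.  The wrapper below is a
       minimal sketch; the name run_estcmd is a placeholder assumed for the
       example.

```shell
# Run an estcmd sub command and report a failure on standard error.
# Usage: run_estcmd SUBCOMMAND [ARGS...]
run_estcmd() {
  if estcmd "$@"; then
    return 0
  else
    # estcmd returned a non-zero status, so the operation failed
    echo "estcmd $1 failed" >&2
    return 1
  fi
}
```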

       The data type of an attribute index specified by the -attr option of
       the create sub command should be "seq" for sequential type, "str" for
       string type, or "num" for number type.

       Each pseudo index specified by the -pidx option of search and other
       sub commands is a directory containing files of document draft.  If
       you search a main index together with pseudo indexes, a meta search
       of the main index and the pseudo indexes is performed.

       The encoding name specified by the -ic option should be a name
       registered with the IETF, such as UTF-8 or ISO-8859-1.  The language
       name specified by the -il option should be one of "en" (English), "ja"
       (Japanese), "zh" (Chinese), or "ko" (Korean).

       The outer command specified by the -fx option of gather receives the
       path of the target document as the first argument and the path for
       output as the second argument.  The original path of the target
       document is given as the value of the environment variable
       ‘ESTORIGFILE’.
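
       The calling convention for an outer command can be sketched as a
       shell function.  The function name est_filter_text and the choice of
       stripping carriage returns are assumptions made for illustration; a
       real filter would do whatever conversion the file type needs.

```shell
# Hypothetical filter body for "estcmd gather -fx".
# $1 = path of the target document, $2 = path where output must be written.
est_filter_text() {
  infile=$1
  outfile=$2
  # ESTORIGFILE holds the original path; fall back to $1 when run by hand.
  echo "filtering ${ESTORIGFILE:-$infile}" >&2
  # Emit plain text: copy the input, dropping carriage returns.
  tr -d '\r' < "$infile" > "$outfile"
}
```

       If this function were placed in an executable script that ends with
       est_filter_text "$@", the script could be registered with a "T@"
       prefix so that its output is treated as plain text.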

       Note that similarity search is very slow, by default.  To  improve  the
       performance  of  similarity search, running "estcmd extkeys" beforehand
       is strongly recommended.
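
       The recommended order of operations can be sketched as two shell
       functions.  The function names are placeholders assumed for the
       example, and the choice of -vu output is arbitrary.

```shell
# Build the keyword database once, before running similarity searches.
prepare_similarity() {
  estcmd extkeys "$1"
}

# List documents similar to the seed document with the given ID number.
# Usage: similar_to DB ID
similar_to() {
  db=$1
  id=$2
  # -vu prints TSV of ID number and URI; -sim names the seed document
  estcmd search -vu -sim "$id" "$db"
}
```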
