NAME
estwaver - command line interface of the web crawler
SYNOPSIS
estwaver init rootdir
estwaver crawl [-restart|-revisit|-revcont] rootdir
estwaver unittest rootdir
estwaver fetch [-proxy host port] [-tout num] [-il lang] url
DESCRIPTION
estwaver is a collection of sub commands. The first argument names
the sub command, and the remaining arguments are parsed according to
that sub command. The argument rootdir specifies the crawler root
directory, which contains the configuration file and other crawler data.
estwaver init rootdir
Create the crawler root directory.
estwaver crawl [-restart|-revisit|-revcont] rootdir
Start crawling.
If -restart is specified, crawling is restarted from the seed
documents.
If -revisit is specified, collected documents are revisited.
If -revcont is specified, collected documents are revisited and
then crawling is continued.
estwaver unittest rootdir
Perform unit tests.
estwaver fetch [-proxy host port] [-tout num] [-il lang] url
Fetch a document.
url specifies the URL of a document.
-proxy specifies the host name and the port number of the proxy
server.
-tout specifies the timeout in seconds.
-il specifies the preferred language. By default, it is
English.
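As a rough illustration of how the sub commands above might be driven
from a script, the following shell sketch wraps crawl and fetch. The
names start_crawl, fetch_doc, and ESTWAVER are assumptions made for this
example and are not part of estwaver; only the sub commands and options
listed above are used.

```shell
# Illustrative sketch: ESTWAVER, start_crawl and fetch_doc are
# hypothetical names, not part of estwaver itself.
ESTWAVER=${ESTWAVER:-estwaver}

# Run "estwaver crawl" on a crawler root directory.
# $1 = rootdir, $2 = optional mode: restart, revisit or revcont.
start_crawl() {
  rootdir=$1
  mode=${2:-}
  case "$mode" in
    "")                      "$ESTWAVER" crawl "$rootdir" ;;
    restart|revisit|revcont) "$ESTWAVER" crawl "-$mode" "$rootdir" ;;
    *) echo "unknown crawl mode: $mode" >&2; return 2 ;;
  esac
}

# Fetch one document, optionally with a timeout in seconds.
# $1 = url, $2 = optional timeout passed to -tout.
fetch_doc() {
  url=$1
  timeout=${2:-}
  if [ -n "$timeout" ]; then
    "$ESTWAVER" fetch -tout "$timeout" "$url"
  else
    "$ESTWAVER" fetch "$url"
  fi
}
```

With these definitions, "start_crawl /path/to/rootdir revcont" would
revisit collected documents and then continue crawling, and
"fetch_doc http://example.com/ 30" would fetch a page with a timeout of
30 seconds. The path and URL are placeholders.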
All sub commands return 0 if the operation succeeds, and 1 otherwise.
A running crawler closes the database and exits when it catches
signal 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), or 15 (SIGTERM).
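The signal handling described above suggests one way to stop a crawl
cleanly from a script. The sketch below is an assumption about usage,
not something estwaver provides: stop_crawler is a hypothetical helper
that sends SIGTERM to a crawler started in the background by the same
shell and then waits for it to exit. SIGKILL (9) is deliberately
avoided, since it cannot be caught and is not among the signals listed
above.

```shell
# Hypothetical helper: ask a crawler process to stop and wait for it.
# $1 = process ID of a crawler started in the background by this shell.
# Returns 1 if the signal could not be sent, otherwise the exit status
# that wait reports for the process.
stop_crawler() {
  pid=$1
  # SIGTERM is signal 15, one of the signals the crawler catches.
  kill -TERM "$pid" 2>/dev/null || return 1
  status=0
  wait "$pid" || status=$?
  return "$status"
}

# Example use (the rootdir path is a placeholder):
#   estwaver crawl /path/to/rootdir &
#   stop_crawler "$!"
```

Note that wait only works for child processes of the calling shell, so
a crawler started elsewhere would need a different way of detecting
that it has exited.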
When crawling finishes, the crawler root directory contains a directory
named _index. It is an index that can be used with estcmd and other tools.
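Once a crawl has finished, the _index directory can be handed to estcmd
like any other index. The following sketch assumes estcmd is installed
and uses its search sub command with the -vu option; see estcmd(1) for
the authoritative option list. The names search_crawl, index_path, and
ESTCMD are hypothetical and introduced only for this example.

```shell
# Illustrative sketch: ESTCMD, index_path and search_crawl are
# hypothetical names, not part of estwaver or estcmd.
ESTCMD=${ESTCMD:-estcmd}

# Print the path of the index under a crawler root directory.
# A single trailing slash on the root directory is dropped.
index_path() {
  printf '%s/_index\n' "${1%/}"
}

# Search the index produced by a finished crawl.
# $1 = rootdir, $2 = search phrase.
search_crawl() {
  idx=$(index_path "$1")
  phrase=$2
  if [ ! -d "$idx" ]; then
    echo "no index at $idx; has a crawl finished?" >&2
    return 1
  fi
  "$ESTCMD" search -vu "$idx" "$phrase"
}
```

For instance, "search_crawl /path/to/rootdir crawler" would search the
collected documents for the word "crawler"; the path is a placeholder.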
SEE ALSO
estconfig(1), estcmd(1), estmaster(1), estcall(1), estraier(3),
estnode(3)