NAME
ra-retrieve - retrieve files that match a query for use with
remembrance agent software
SYNOPSIS
ra-retrieve [--version] [-v] [-d] <base-dir> [--docnum <docnum>]
DESCRIPTION
ra-index and ra-retrieve make up the Savant search engine, an
information retrieval engine designed as a back-end for the Remembrance
Agent (RA). Given a collection of the user’s accumulated email, usenet
news articles, papers, saved HTML files and other text notes, the RA
attempts to find those documents which are most relevant to the user’s
current context. That is, it searches this collection of text for the
documents which bear the highest word-for-word similarity to the text
the user is currently editing, in the hope that they will also bear
high conceptual similarity and thus be useful to the user’s current
work. With the Emacs front-end, these suggestions are continuously
displayed in a small buffer at the bottom of the user’s window. If a
suggestion looks useful, the full text can be retrieved with a single
command.
The Remembrance Agent works in two stages. First, the user’s
collection of text documents is indexed into a database saved in a
vector format. After the database is created, the other stage of the
Remembrance Agent is run from emacs, where it periodically takes a
sample of text from the working buffer and finds those documents from
the collection that are most similar. It summarizes the top documents
in a small emacs window and allows you to retrieve the entire text of
any one with a keystroke. See the README file for information on using
the Emacs front-end.
The RA is primarily designed as a proactive information provider that
continually gives you information that might be relevant to your
current environment. In this mode, ra-retrieve is run by a front end
and its output is parsed into a more human-readable format. However,
Savant can also be used as a standard text and information retrieval
search engine.
USAGE
The one argument to ra-retrieve is <base-dir>, which is the directory
containing the index files created by ra-index. This starts an
interactive process that handles queries and returns documents that
most match that query. When running with the -v argument, a menu and
other information is printed. Without the -v option it is assumed that
ra-retrieve has been run from a front-end, and only minimal information
is printed. The following commands are available:
query <num-lines>
Find the <num-lines> most relevant documents to a query.
Default is 5. Enter the text of the query, followed by a ^D
(ASCII 04). If the query matches a predefined template then
fields are parsed and separately. For example, in emacs mail
mode the from, subject, date and body of the message are all
individually parsed. If no template matches, the query is
assumed to be plain text.
A query will output up to n summary lines followed by a period
on a line by itself. Each summary line will contain a line
number, relevance number, document number, and a series of
fields describing the document. The final field is a comma-
separated list of words from the query that most contributed to
this document being chosen.
retrieve <document number>
Retrieve and print the document with the given document number.
Document number is the third field outputed by the query. The
full text of the document is displayed. If the document is a
part of a larger file, such as in an email in an archive file,
only that one document is shown.
loc-retrieve <document number>
Retrieve and print the location of the document with the given
document number. Three values are displayed, each on their own
separate line. The first is the character offset to where the
beginning of the document is found. The second is the character
offset for the end of the document. The third line contains the
fully expanded filename for the document itself. This is
primarily so front-ends can load the document and display them
with their own formatting.
info Display version and database info.
quit Quit.
? Display menu.
OPTIONS
-v Verbose mode. Print a menu and other info for running without a
front-end.
-d Debug mode. Print not-so-useful information.
[--docnum <docnum>]
Print the contents of the specified document number and exit.
This option doesn’t use as much memory as interactive mode, and
is useful for scripts that call this program.
SEE ALSO
ra-index(1)
AUTHOR
Bradley Rhodes, MIT Media Lab. Please send comments and questions to
ra-bugs@media.mit.edu. New versions and updates can be found at
http://www.media.mit.edu/~rhodes/RA/
COPYRIGHT
All code included in versions up to and including 2.09:
Copyright (C) 2001 Massachusetts Institute of Technology.
All modifications subsequent to version 2.09 are copyright Bradley
Rhodes or their respective authors.
Developed by Bradley Rhodes at the Media Laboratory, MIT, Cambridge,
Massachusetts, with support from British Telecom and Merrill Lynch.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version. For commercial licensing under other terms,
please consult the MIT Technology Licensing Office.
This program may be subject to the following US and/or foreign patents
(pending): "Method and Apparatus for Automated, Context-Dependent
Retrieval of Information," MIT Case No. 7870TS. If any of these patents
are granted, royalty-free license to use this and derivative programs
under the GNU General Public License are hereby granted.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
BUGS
Dates are not currently indexed, so anything trying to do a date query
gets no suggestion back.
Requires GNU make to compile.
The template structure isn’t documented.