NAME
libextractor - meta-information extraction library 0.5.11
SYNOPSIS
#include <extractor.h>
typedef struct EXTRACTOR_Keywords {
char * keyword;
EXTRACTOR_KeywordType keywordType;
struct EXTRACTOR_Keywords * next;
} EXTRACTOR_KeywordList;
EXTRACTOR_ExtractorList * EXTRACTOR_loadDefaultLibraries ();
const char * EXTRACTOR_getKeywordTypeAsString (const
EXTRACTOR_KeywordType type);
EXTRACTOR_ExtractorList * EXTRACTOR_loadConfigLibraries
(EXTRACTOR_ExtractorList * prev, const char * config);
EXTRACTOR_ExtractorList * EXTRACTOR_addLibrary
(EXTRACTOR_ExtractorList * prev, const char * library);
EXTRACTOR_ExtractorList * EXTRACTOR_addLibraryLast
(EXTRACTOR_ExtractorList * prev, const char * library);
EXTRACTOR_ExtractorList * EXTRACTOR_removeLibrary
(EXTRACTOR_ExtractorList * prev, const char * library);
void EXTRACTOR_removeAll (EXTRACTOR_ExtractorList * prev);
EXTRACTOR_KeywordList * EXTRACTOR_getKeywords (EXTRACTOR_ExtractorList
* extractor, const char * filename);
EXTRACTOR_KeywordList * EXTRACTOR_getKeywords2 (EXTRACTOR_ExtractorList
* extractor, const char * data, size_t size);
EXTRACTOR_KeywordList * EXTRACTOR_removeEmptyKeywords
(EXTRACTOR_KeywordList * list);
EXTRACTOR_KeywordList * EXTRACTOR_removeDuplicateKeywords
(EXTRACTOR_KeywordList * list, const unsigned int options);
void EXTRACTOR_printKeywords (FILE * handle, EXTRACTOR_KeywordList *
keywords);
void EXTRACTOR_freeKeywords (EXTRACTOR_KeywordList * keywords);
const char * EXTRACTOR_extractLast (const EXTRACTOR_KeywordType *
type, EXTRACTOR_KeywordList * keywords);
const char * EXTRACTOR_extractLastByString (const char * type,
EXTRACTOR_KeywordList * keywords);
unsigned int EXTRACTOR_countKeywords (EXTRACTOR_KeywordList *
keywords);
EXTRACTOR_DEFAULT_LIBRARIES
EXTRACTOR_VERSION
DESCRIPTION
libextractor is a simple library for keyword extraction. libextractor
does not support all formats but provides a simple plugin mechanism
so that you can quickly add extractors for additional formats, even
without recompiling libextractor. libextractor typically ships with
one or more helper libraries that can be used to obtain keywords from
common file types. If you want to write your own extractor for some
file type, all you need to do is write a small library that implements
a single function with this signature:
EXTRACTOR_KeywordList * LIBRARYNAME_extract(const char * filename,
char * data,
size_t size,
EXTRACTOR_KeywordList *
prev);
The filename is the name of the file, data is a pointer to the contents
of the file and size is the size of the file in bytes. The extract
function must prepend the keywords that it finds to the linked list
’prev’ and return the new head. The library must allocate (malloc) each
entry in the keyword list and the memory for the keyword string, since
both will be freed by libextractor once the application calls
EXTRACTOR_freeKeywords. An example implementation can be found in
mp3extractor.c. The application extract gives an example of how to use
libextractor.
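The contract described above can be illustrated with a minimal sketch.
It is not taken from the libextractor sources: the library name "demo",
the helper prepend_keyword and the enumerator names are assumptions made
for illustration, and the type declarations are stand-ins so that the
fragment compiles on its own. A real plugin would include <extractor.h>
instead.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in declarations (assumed); a real plugin includes <extractor.h>. */
typedef enum { EXTRACTOR_UNKNOWN = 0, EXTRACTOR_MIMETYPE = 2 }
  EXTRACTOR_KeywordType;

typedef struct EXTRACTOR_Keywords {
  char * keyword;
  EXTRACTOR_KeywordType keywordType;
  struct EXTRACTOR_Keywords * next;
} EXTRACTOR_KeywordList;

/* Hypothetical helper: prepend a malloc'ed copy of text to prev.
 * On allocation failure the list is returned unchanged. */
static EXTRACTOR_KeywordList *
prepend_keyword (EXTRACTOR_KeywordType type, const char * text,
                 EXTRACTOR_KeywordList * prev)
{
  EXTRACTOR_KeywordList * entry = malloc (sizeof (EXTRACTOR_KeywordList));
  if (entry == NULL)
    return prev;
  entry->keyword = malloc (strlen (text) + 1);
  if (entry->keyword == NULL)
    {
      free (entry);
      return prev;
    }
  strcpy (entry->keyword, text);
  entry->keywordType = type;
  entry->next = prev;
  return entry;
}

/* Extract function of a hypothetical library "demo": if the data starts
 * with the 8-byte PNG signature, prepend a mime-type keyword. */
EXTRACTOR_KeywordList *
demo_extract (const char * filename, char * data, size_t size,
              EXTRACTOR_KeywordList * prev)
{
  static const char magic[] = "\x89PNG\r\n\x1a\n";
  (void) filename;              /* unused in this sketch */
  if (size < 8 || memcmp (data, magic, 8) != 0)
    return prev;                /* not recognized: list stays as it was */
  return prepend_keyword (EXTRACTOR_MIMETYPE, "image/png", prev);
}
```

Both the list entry and the keyword string come from malloc, so the
caller can release them with free, which is what the text above says
libextractor does when the keyword list is freed.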
The basic use of libextractor is to load the plugins (for example with
EXTRACTOR_loadDefaultLibraries), extract the keyword list using
EXTRACTOR_getKeywords, process the list (using application-specific
code and possibly some of the postprocessing convenience functions like
EXTRACTOR_removeDuplicateKeywords), free the keyword list (using
EXTRACTOR_freeKeywords) and finally unload the plugins (with
EXTRACTOR_removeAll).
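The application-specific processing step could look like the following
sketch. The helper count_keywords_of_type and the enumerator names are
hypothetical, and the type declarations are stand-ins that mirror the
SYNOPSIS so the fragment compiles without the library. The comment at
the top only restates the call order described above, using the
prototypes from the SYNOPSIS; the file name and the options value are
placeholders.

```c
#include <stddef.h>

/* Assumed call order in a real application (see the text above):
 *
 *   plugins  = EXTRACTOR_loadDefaultLibraries ();
 *   keywords = EXTRACTOR_getKeywords (plugins, "some-file");
 *   keywords = EXTRACTOR_removeDuplicateKeywords (keywords, options);
 *   ... application-specific processing, e.g. the helper below ...
 *   EXTRACTOR_freeKeywords (keywords);
 *   EXTRACTOR_removeAll (plugins);
 */

/* Stand-in declarations (assumed); an application includes <extractor.h>. */
typedef enum { EXTRACTOR_UNKNOWN = 0, EXTRACTOR_MIMETYPE = 2,
               EXTRACTOR_TITLE = 3 } EXTRACTOR_KeywordType;

typedef struct EXTRACTOR_Keywords {
  char * keyword;
  EXTRACTOR_KeywordType keywordType;
  struct EXTRACTOR_Keywords * next;
} EXTRACTOR_KeywordList;

/* Hypothetical helper: count the list entries that have the given type
 * by following the next pointers until the end of the list. */
unsigned int
count_keywords_of_type (const EXTRACTOR_KeywordList * list,
                        EXTRACTOR_KeywordType type)
{
  unsigned int count = 0;
  for (; list != NULL; list = list->next)
    if (list->keywordType == type)
      count++;
  return count;
}
```

The helper only reads the list, so it can run at any point between
obtaining the keywords and freeing them.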
The keywords obtained from libextractor are supposed to be UTF-8
encoded. The EXTRACTOR_printKeywords function converts the UTF-8
keywords to the character set from the current locale before printing
them. Plugins are supposed to convert meta-data to UTF-8 if necessary.
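As an illustration of the conversion a plugin may have to perform, the
following sketch turns ISO-8859-1 (Latin-1) bytes into UTF-8. The
function latin1_to_utf8 is hypothetical and not part of libextractor,
and the assumption that the source meta-data is Latin-1 is made only
for this example; other source encodings need a different conversion.

```c
#include <stdlib.h>

/* Hypothetical helper: convert len bytes of ISO-8859-1 text to a newly
 * malloc'ed, NUL-terminated UTF-8 string. Returns NULL if malloc fails.
 * Each Latin-1 byte becomes at most two UTF-8 bytes. */
char *
latin1_to_utf8 (const char * in, size_t len)
{
  char * out = malloc (2 * len + 1);
  size_t i;
  size_t o = 0;

  if (out == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) in[i];
      if (c < 0x80)
        {
          out[o++] = (char) c;  /* ASCII is unchanged in UTF-8 */
        }
      else
        {
          /* U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx */
          out[o++] = (char) (0xC0 | (c >> 6));
          out[o++] = (char) (0x80 | (c & 0x3F));
        }
    }
  out[o] = '\0';
  return out;
}
```

Because the result is allocated with malloc, it can be stored directly
as the keyword string of a list entry.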
SEE ALSO
extract(1)
LEGAL NOTICE
libextractor is released under the GPL and is a GNU project
(http://www.gnu.org/).
BUGS
A couple of file-formats (on the order of 10^3) are not recognized...
AUTHORS
extract was originally written by Christian Grothoff
<christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use
<libextractor@gnu.org> to contact the current maintainer(s).
AVAILABILITY
You can obtain the original author’s latest version from
http://gnunet.org/libextractor/.
Jul 14, 2005