NAME

       libextractor - meta-information extraction library 0.5.11

SYNOPSIS

       #include <extractor.h>

        typedef struct EXTRACTOR_Keywords {
          char * keyword;
          EXTRACTOR_KeywordType keywordType;
          struct EXTRACTOR_Keywords * next;
        } EXTRACTOR_KeywordList;

        EXTRACTOR_ExtractorList * EXTRACTOR_loadDefaultLibraries ();

        const char * EXTRACTOR_getKeywordTypeAsString
       (const EXTRACTOR_KeywordType type);

        EXTRACTOR_ExtractorList * EXTRACTOR_loadConfigLibraries
       (EXTRACTOR_ExtractorList * prev, const char * config);

        EXTRACTOR_ExtractorList * EXTRACTOR_addLibrary
       (EXTRACTOR_ExtractorList * prev, const char * library);

        EXTRACTOR_ExtractorList * EXTRACTOR_addLibraryLast
       (EXTRACTOR_ExtractorList * prev, const char * library);

        EXTRACTOR_ExtractorList * EXTRACTOR_removeLibrary
       (EXTRACTOR_ExtractorList * prev, const char * library);

        void EXTRACTOR_removeAll (EXTRACTOR_ExtractorList * prev);

        EXTRACTOR_KeywordList * EXTRACTOR_getKeywords (EXTRACTOR_ExtractorList
       * extractor, const char * filename);

        EXTRACTOR_KeywordList * EXTRACTOR_getKeywords2 (EXTRACTOR_ExtractorList
       * extractor, const char * data, size_t size);

        EXTRACTOR_KeywordList * EXTRACTOR_removeEmptyKeywords
       (EXTRACTOR_KeywordList * list);

        EXTRACTOR_KeywordList * EXTRACTOR_removeDuplicateKeywords
       (EXTRACTOR_KeywordList * list, const unsigned int options);

        void EXTRACTOR_printKeywords (FILE * handle,
       EXTRACTOR_KeywordList * keywords);

        void EXTRACTOR_freeKeywords (EXTRACTOR_KeywordList * keywords);

        const char * EXTRACTOR_extractLast
       (const EXTRACTOR_KeywordType * type, EXTRACTOR_KeywordList * keywords);

        const char * EXTRACTOR_extractLastByString
       (const char * type, EXTRACTOR_KeywordList * keywords);

        unsigned int EXTRACTOR_countKeywords (EXTRACTOR_KeywordList * keywords);

        EXTRACTOR_DEFAULT_LIBRARIES

        EXTRACTOR_VERSION

DESCRIPTION

       libextractor is a simple library for keyword extraction.  libextractor
       does not support all formats but provides a simple plugin mechanism
       so that you can quickly add extractors for additional formats, even
       without recompiling libextractor.  libextractor typically ships with
       one or more helper libraries that can be used to obtain keywords from
       common file types.  If you want to write your own extractor for some
       file type, all you need to do is write a small library that implements
       a single function with this signature:

        EXTRACTOR_KeywordList * LIBRARYNAME_extract(const char * filename,
                                                    char * data,
                                                    size_t size,
                                                    EXTRACTOR_KeywordList * prev);

       Here filename is the name of the file, data is a pointer to the
       contents of the file and size is the size of the file.  The extract
       function must prepend the keywords that it finds to the linked list
       prev and return the new head.  The library must allocate (with malloc)
       both the entry in the keyword list and the memory for the keyword,
       since both will be freed by libextractor once the application calls
       EXTRACTOR_freeKeywords.  An example implementation can be found in
       mp3extractor.c.  The extract application gives an example of how to
       use libextractor.
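
       The plugin contract described above can be sketched as follows.  This
       is a minimal illustration, not code from libextractor: the library
       name "libextractor_demo", the "DEMO" file signature, the helper
       add_keyword and the enum constants are all hypothetical, and the type
       definitions are local stand-ins copied from the SYNOPSIS so that the
       sketch compiles on its own.  A real plugin would include
       <extractor.h> instead.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the declarations in <extractor.h> (assumption: a real
   plugin includes that header instead).  The enum constants are
   placeholders, not the library's actual keyword types. */
typedef enum { DEMO_TYPE_UNKNOWN = 0, DEMO_TYPE_COMMENT = 1 } EXTRACTOR_KeywordType;

typedef struct EXTRACTOR_Keywords {
  char * keyword;
  EXTRACTOR_KeywordType keywordType;
  struct EXTRACTOR_Keywords * next;
} EXTRACTOR_KeywordList;

/* Hypothetical helper: prepend a malloc'ed copy of 'keyword' to 'prev'
   and return the new head.  On allocation failure, return 'prev'
   unchanged so the caller's list stays valid. */
static EXTRACTOR_KeywordList *
add_keyword (EXTRACTOR_KeywordType type, const char * keyword,
             EXTRACTOR_KeywordList * prev)
{
  EXTRACTOR_KeywordList * entry = malloc (sizeof (EXTRACTOR_KeywordList));
  if (entry == NULL)
    return prev;
  entry->keyword = malloc (strlen (keyword) + 1);
  if (entry->keyword == NULL)
    {
      free (entry);
      return prev;
    }
  strcpy (entry->keyword, keyword);
  entry->keywordType = type;
  entry->next = prev;
  return entry;
}

/* Extract function for a made-up format whose files start with the
   four bytes "DEMO".  The name follows the LIBRARYNAME_extract pattern. */
EXTRACTOR_KeywordList *
libextractor_demo_extract (const char * filename, char * data, size_t size,
                           EXTRACTOR_KeywordList * prev)
{
  (void) filename;              /* not needed for this format */
  if (size < 4 || memcmp (data, "DEMO", 4) != 0)
    return prev;                /* not our format: leave the list untouched */
  return add_keyword (DEMO_TYPE_COMMENT, "demo file", prev);
}
```

       Returning prev unchanged for unrecognized input lets several plugins
       run in sequence over the same file without disturbing each other's
       results.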

       The basic use of libextractor is to load the plugins (for example with
       EXTRACTOR_loadDefaultLibraries), extract the keyword list using
       EXTRACTOR_getKeywords, process the list (using application-specific
       code and possibly some of the postprocessing convenience functions
       such as EXTRACTOR_removeDuplicateKeywords), free the keyword list
       (using EXTRACTOR_freeKeywords) and finally unload the plugins (with
       EXTRACTOR_removeAll).
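
       The application-specific processing step amounts to walking the
       linked list.  The following is a minimal sketch of such a traversal,
       assuming the list layout shown in the SYNOPSIS.  The function names
       and enum constants are hypothetical, and the type definitions are
       local stand-ins so that the sketch compiles without the library.  In
       an application the list would come from EXTRACTOR_getKeywords and be
       released with EXTRACTOR_freeKeywords.

```c
#include <stddef.h>

/* Stand-ins for the declarations in <extractor.h> (assumption: an
   application includes that header instead).  The enum constants are
   placeholders, not the library's actual keyword types. */
typedef enum { DEMO_TYPE_TITLE = 1, DEMO_TYPE_AUTHOR = 2 } EXTRACTOR_KeywordType;

typedef struct EXTRACTOR_Keywords {
  char * keyword;
  EXTRACTOR_KeywordType keywordType;
  struct EXTRACTOR_Keywords * next;
} EXTRACTOR_KeywordList;

/* Count the entries of the given type in the list. */
size_t
count_keywords_of_type (const EXTRACTOR_KeywordList * list,
                        EXTRACTOR_KeywordType type)
{
  size_t n = 0;
  for (; list != NULL; list = list->next)
    if (list->keywordType == type)
      n++;
  return n;
}

/* Return the first keyword of the given type, or NULL if there is none.
   The returned string is owned by the list and must not be freed. */
const char *
first_keyword_of_type (const EXTRACTOR_KeywordList * list,
                       EXTRACTOR_KeywordType type)
{
  for (; list != NULL; list = list->next)
    if (list->keywordType == type)
      return list->keyword;
  return NULL;
}
```

       Because plugins prepend to the list, the first match found by such a
       traversal is the keyword that was added most recently.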

       The keywords obtained from libextractor are expected to be UTF-8
       encoded.  The EXTRACTOR_printKeywords function converts the UTF-8
       keywords to the character set of the current locale before printing
       them.  Plugins are expected to convert meta-data to UTF-8 if necessary.
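
       As an illustration of the conversion a plugin may need to perform,
       here is a minimal sketch that converts a string to UTF-8 under the
       assumption that the source meta-data is ISO-8859-1 (Latin-1).  The
       function name latin1_to_utf8 is hypothetical and not part of
       libextractor.  Other source encodings need a different conversion.

```c
#include <stdlib.h>
#include <string.h>

/* Convert a NUL-terminated ISO-8859-1 string to a newly malloc'ed,
   NUL-terminated UTF-8 string.  The caller owns the result.  Returns
   NULL if the allocation fails. */
char *
latin1_to_utf8 (const char * in)
{
  size_t len = strlen (in);
  /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
  char * out = malloc (2 * len + 1);
  char * p = out;
  size_t i;

  if (out == NULL)
    return NULL;
  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) in[i];
      if (c < 0x80)
        {
          *p++ = (char) c;                       /* ASCII: copy as-is */
        }
      else
        {
          *p++ = (char) (0xC0 | (c >> 6));       /* lead byte */
          *p++ = (char) (0x80 | (c & 0x3F));     /* continuation byte */
        }
    }
  *p = '\0';
  return out;
}
```

       Since the result is allocated with malloc, it can be stored directly
       as the keyword of a new list entry.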

SEE ALSO

       extract(1)

LEGAL NOTICE

       libextractor is released under the GPL and is a GNU project
       (http://www.gnu.org/).

BUGS

       A couple of file-formats (on the order of 10^3) are not recognized...

AUTHORS

       extract    was    originally    written    by    Christian     Grothoff
       <christian@grothoff.org>  and  Vidyut  Samanta  <vids@cs.ucla.edu>. Use
       <libextractor@gnu.org> to contact the current maintainer(s).

AVAILABILITY

       You  can   obtain   the   original   author’s   latest   version   from
       http://gnunet.org/libextractor/.

                                 Jul 14, 2005