hspell - Hebrew spellchecker (C API)

NAME

       hspell - Hebrew spellchecker (C API)

SYNOPSIS

       #include <hspell.h>

       int hspell_init(struct dict_radix **dictp, int flags);

       void hspell_uninit(struct dict_radix *dictp);

       int  hspell_check_word(struct  dict_radix  *dict, const char *word, int
       *preflen);

       void  hspell_trycorrect(struct  dict_radix  *dict,  const  char  *word,
       struct corlist *cl);

       int corlist_init(struct corlist *cl);

       int corlist_free(struct corlist *cl);

       int corlist_n(struct corlist *cl);

       char *corlist_str(struct corlist *cl, int i);

       int hspell_is_canonic_gimatria(const char *word);

       typedef  int  hspell_word_split_callback_func(const  char  *word, const
       char *baseword, int preflen, int prefspec);

       int  hspell_enum_splits(struct  dict_radix  *dict,  const  char  *word,
       hspell_word_split_callback_func *enumf);

       void hspell_set_dictionary_path(const char *path);

       const char *hspell_get_dictionary_path(void);

DESCRIPTION

       This  manual  describes  the  C  API of the Hspell Hebrew spellchecker.
       Please refer to hspell(1) for a description of the Hspell project,  its
       spelling standard, and how it works.

       The hspell_init() function must be called first to initialize the
       Hspell library. It sets up some global structures (see the CAVEATS
       section) and then reads the necessary dictionary files (whose
       locations are fixed when the library is built, unless overridden with
       hspell_set_dictionary_path()). The dictp parameter is a pointer to a
       struct dict_radix* object, which is modified to point to a newly
       allocated dictionary. A typical hspell_init() call therefore looks
       like:

          struct dict_radix *dict;
          hspell_init(&dict, flags);

       Note  that  the  (struct  dict_radix*)  type is an opaque pointer - the
       library user has no access to the separate fields in this structure.

       The flags parameter is a bitwise OR of flags that modify Hspell’s
       default behavior. Turning on HSPELL_OPT_HE_SHEELA allows Hspell to
       recognize the interrogative He prefix (he ha-she’ela).
       HSPELL_OPT_DEFAULT is a synonym for turning on no special flag, i.e.,
       it evaluates to 0.

       hspell_init() returns 0 on success,  or  negative  numbers  on  errors.
       Currently, the only error is -1, meaning the dictionary files could not
       be read.

       The hspell_uninit()  function  undoes  the  effects  of  hspell_init(),
       freeing any memory that was allocated during initialization.

       The hspell_check_word() function checks whether a given word is a
       correct Hebrew word (possibly with prefix particles attached in a
       syntactically correct manner). It returns 1 if the word is correct, or
       0 if it is incorrect.

       The word parameter should be a single Hebrew word, in the iso8859-8
       encoding, possibly containing the ASCII quote or double-quote
       characters (signifying the geresh and gershayim used in Hebrew for
       abbreviations, acronyms, and a few foreign sounds). If the calling
       program works with other encodings, it must convert the word to
       iso8859-8 first. In particular, the cp1255 (MS-Windows Hebrew
       encoding) extensions to iso8859-8, such as niqqud characters, geresh,
       or gershayim, are currently not recognized and must be removed from
       the word prior to calling hspell_check_word().
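
       For a caller that holds its text in UTF-8, the conversion step might
       look like the sketch below. It uses the standard iconv(3) interface,
       not anything provided by Hspell; utf8_to_iso8859_8() is a
       hypothetical helper name, and the encoding names "UTF-8" and
       "ISO-8859-8" are assumed to be accepted by the platform's iconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Hspell): convert the NUL-terminated
 * UTF-8 string `in` to ISO-8859-8, writing at most outsz bytes (including
 * the terminating NUL) to `out`. Returns 0 on success, -1 on failure. */
static int utf8_to_iso8859_8(const char *in, char *out, size_t outsz)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    size_t rc;

    if (outsz == 0)
        return -1;
    outleft = outsz - 1;      /* reserve room for the terminating NUL */

    cd = iconv_open("ISO-8859-8", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;            /* invalid input, unmappable char, or no room */

    *outp = '\0';
    return 0;
}
```

       Characters with no iso8859-8 equivalent (niqqud, or the Unicode
       geresh and gershayim) are outside the target encoding, so a caller
       would still need to strip or replace them before converting.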

       The function writes into the preflen parameter the number of
       characters it recognized as a prefix particle; the rest of the ’word’
       is a stand-alone word. Because Hebrew words can typically be read in
       several different ways, this feature (getting just one prefix from
       one possible reading) is usually not very useful, and it is likely to
       be removed in a future version.

       The hspell_enum_splits() function provides a way to get all possible
       splits of the given word into an optional prefix particle and a
       stand-alone word. For each possible (and legal, as some words cannot
       accept certain prefixes) split, a user-defined callback function is
       called. This callback function is given the whole word (word), the
       stand-alone word (baseword), the length of the prefix (preflen), and a
       bitfield (prefspec) which describes what types of words this prefix
       can attach to. Note that in some cases, a word beginning with the
       letter waw gets this waw doubled before a prefix, so sometimes
       strlen(word)!=strlen(baseword)+preflen.

       The hspell_trycorrect() function tries to find a list of possible
       corrections for an incorrect word. Because in Hebrew the word density
       is high (a random string of letters, especially if short, has a high
       probability of being a correct word), this function tries corrections
       based on the assumption of a spelling error (replacement of letters
       that sound alike, missing or spurious immot qri’a), not of a typo
       (slipped finger on the keyboard, etc.) - see also CAVEATS.

       hspell_trycorrect() stores the correction list in a structure of type
       struct corlist. This structure must first be initialized with a call
       to corlist_init() and subsequently freed with corlist_free(). The
       corlist_n() macro returns the number of words held in an initialized
       corlist, and corlist_str() returns the i’th word. Accordingly, here is
       an example usage of hspell_trycorrect():

          struct corlist cl;
          int i;

          printf ("Found misspelled word %s. Possible corrections:\n", w);
          corlist_init (&cl);
          hspell_trycorrect (dict, w, &cl);
          for (i=0; i<corlist_n(&cl); i++) {
              printf ("%s\n", corlist_str(&cl, i));
          }
          corlist_free (&cl);   /* release the list when done */

       The hspell_is_canonic_gimatria() function checks whether the given word
       is a canonic gimatria - i.e., the proper way to write in gimatria the
       number it represents. The caller might want to accept canonic gimatria
       as proper Hebrew words, even if hspell_check_word() previously reported
       such a word to be non-existent. hspell_is_canonic_gimatria() returns
       the number represented as gimatria in ’word’ if it is indeed proper
       gimatria (in canonic form), or 0 otherwise.

       hspell_init() normally reads the dictionary files from a path compiled
       into the library. This makes sense when the library’s code and the
       dictionaries are distributed together, but in some scenarios the
       library user might want to use Hspell dictionaries that are already
       present on the system at an arbitrary path. The function
       hspell_set_dictionary_path() can be used to set this path, and should
       be called before hspell_init(). The given path is that of the word
       list; the other input files have that path with a suffix appended.
       hspell_get_dictionary_path() can be used to find the current path. On
       many installations, this defaults to
       "/usr/local/share/hspell/hebrew.wgz".

LINKING

       On most systems, the Hspell library is compiled to use the Zlib library
       for  reading  the compressed dictionaries. Therefore, a program linking
       with the Hspell library must also  be  linked  with  the  Zlib  library
       (usually, by adding "-lz" to the compilation line).

       Programs that use autoconf to search for the Hspell library should
       remember to tell AC_CHECK_LIB to also link with the -lz library when
       checking for -lhspell.

CAVEATS

       While the API described here has been stable for years, it may change
       in the future. Users are encouraged to compare the values of the
       integer macros HSPELL_VERSION_MAJOR and HSPELL_VERSION_MINOR to those
       expected by the writer of the program. A third macro,
       HSPELL_VERSION_EXTRA, contains a string which can describe subrelease
       modifications (e.g., beta versions).

       The current Hspell C API is very low-level, in the sense that it leaves
       the user to implement many features that some users take for granted
       in a spell-checker. For example, it doesn’t provide any facilities for
       a user-defined personal dictionary. It also has separate functions for
       checking valid Hebrew words and valid gimatria, and no function to do
       both. It is assumed that the caller (for example, a bigger
       spell-checking library or a word processor) will already have these
       facilities. If not, you may wish to look at the sources of hspell(1)
       for an example implementation.

       Currently there is no concept  of  separate  Hspell  "contexts"  in  an
       application.   Some  of  the  context  is  now  global  for  the entire
       application: currently, a single  list  of  legal  prefix-particles  is
       kept,  and the dictionary read by hspell_init() is always read from the
       global default place. This may be solved in a later version,  e.g.,  by
       switching to an API like:

          context = hspell_new_context();
          hspell_set_dictionary_path(context, "/some/path/hebrew.wgz");
          hspell_init(context, flags);
          ...
          hspell_check_word(context, word, preflenp);

       Note that despite the global context mentioned above, after
       initialization all functions described here are thread-safe, because
       they only read the dictionary data and never write to it.

       hspell_trycorrect()  is  not  as  powerful  as it could have been, with
       typos  or  certain  kinds  of  spelling  mistakes  not  giving   useful
       correction   suggestions.   Along   with  more  types  of  corrections,
       hspell_trycorrect() needs a better way to order the likelihood  of  the
       corrections,  as  an unordered list of 100 corrections would be just as
       useful (or rather, useless) as none.

       In some cases of errors during hspell_init(), warning messages are
       printed to the standard error stream. This is a bad thing for a
       library to do.

       There are too many CAVEATS in this manual.

VERSION

       The version of hspell described by this manual page is 1.1 (December
       31, 2009).

COPYRIGHT

       Copyright (C) 2000-2009, Nadav Har’El <nyh@math.technion.ac.il> and Dan
       Kenigsberg <danken@cs.technion.ac.il>.

       Hspell is free software, released under the GNU General Public  License
       (GPL).   Note  that not only the programs in the distribution, but also
       the dictionary files and the generated word lists, are  licensed  under
       the GPL.  There is no warranty of any kind.

       See  the LICENSE file for more information and the exact license terms.

       The latest version of this software can be found at
       http://hspell.ivrix.org.il/
