NAME
hspell - Hebrew spellchecker
SYNOPSIS
hspell [ -acDhHilnsvV ] [file...]
DESCRIPTION
hspell tries to find incorrectly spelled Hebrew words in its input
files.
Like the traditional Unix spell(1), hspell outputs the sorted list of
incorrect words, and does not (yet) have a more friendly interface for
making corrections for you. However, unlike spell(1), hspell can
suggest possible corrections for some spelling errors - such
suggestions are enabled with the -c (correct) and -n (notes) options.
Hspell currently expects ISO-8859-8-encoded input files. Non-Hebrew
characters in the input files are ignored, allowing the easy
spellchecking of Hebrew-English texts, as well as HTML or TeX files.
If files using a different encoding (e.g., UTF-8) are to be checked,
they must be converted first to ISO-8859-8 (e.g., see iconv(1),
recode(1)).
The output will also be in ISO-8859-8 encoding, in so-called "logical
order", so it is normally useful to pipe it to bidiv(1) before viewing,
as in:
hspell -c filename | bidiv | less
If no input file is given, hspell reads from its standard input.
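As an illustration of the conversion steps described above, the
following shell function is a minimal sketch for checking a UTF-8 file.
It converts the input to ISO-8859-8 with iconv(1), passes it to hspell
on standard input, and converts the report back to UTF-8. The function
name hspell_utf8 is hypothetical and not part of hspell, and the sketch
assumes an iconv that supports both encodings.

```shell
# Sketch: spellcheck a UTF-8 file with an hspell that only reads ISO-8859-8.
# hspell_utf8 is a hypothetical helper name, not something hspell provides.
hspell_utf8() {
    # 1. Convert the UTF-8 input file to ISO-8859-8.
    # 2. Let hspell read the converted text from its standard input.
    # 3. Convert hspell's ISO-8859-8 output back to UTF-8.
    iconv -f UTF-8 -t ISO-8859-8 < "$1" |
        hspell -c |
        iconv -f ISO-8859-8 -t UTF-8
}
```

It could be used as "hspell_utf8 notes.txt | less". Note that iconv
normally reports an error for characters that have no ISO-8859-8
equivalent (niqqud, for example), so such input may need cleaning first.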
OPTIONS
-v If the -v option is given, hspell prints emacs-oriented version
information and exits.
-vv Repetition of the -v option causes hspell to also show some
information on which optional features were enabled at compile
time.
-V With the -V option, hspell prints true and human-oriented
version information and exits.
-c If the -c option is given, hspell will suggest corrections for
misspelled words, whenever it can find such corrections. The
correction mechanism in this release is especially good at
finding corrections for incorrect niqqud-less spellings, with
missing or extra ’immot-qri’a.
-n The -n option will give some longer "notes" about certain
spelling errors, explaining why these are indeed errors (or in
what cases using this word is in fact correct). It is recommended
to combine the two options, as -cn, for maximal correction help
from hspell.
-l The -l (linguistic information) option will explain for each
correct word why it was recognized (show the basic noun, verb,
etc., that this inflection relates to, and its tense, gender,
associated Kinnuy, or other relevant information).
If Hspell was built without morphological analysis support, this
option will only show the correct splits of the given word into
prefix + word, as the full information incurs a 4-fold increase
in the installation size.
Giving the -c option in addition to -l results in special
behavior. In that case hspell suggests "corrections" to every
word (regardless of whether it is in the dictionary), and
shows the linguistic information on all those words. This can be
useful for a reader application, which may also want to be able
to understand misspellings and their possible meanings.
-s Normally, the words deemed spelling mistakes are shown in
alphabetical order. The -s option orders them by severity,
i.e., the errors that most frequently appear in the document are
shown first. This option is most useful for people who are helping
to build hspell’s word list and are looking for common correct
words that hspell does not know yet.
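The two orderings can be mimicked with standard tools on any list of
words, one per line. The sketch below does not call hspell and is not
taken from its implementation. The function names, and the alphabetical
tie-breaking among equally frequent words, are assumptions made for
illustration only.

```shell
# Sketch only: reproduces the two orderings on a plain word list read from
# standard input (one word per line). This is not how hspell implements -s.

# Default-style ordering: each distinct word once, alphabetically.
alphabetical() {
    sort -u
}

# -s-style ordering: the most frequent words first.
by_frequency() {
    sort |                       # group identical words together
        uniq -c |                # prefix each distinct word with its count
        sort -k1,1nr -k2,2 |     # highest count first, then alphabetically
        sed 's/^ *[0-9][0-9]* //' # strip the count, leaving only the word
}
```

For example, feeding the words b, a, b, c, b, a (one per line) to
by_frequency lists b first (three occurrences), then a, then c.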
-a With the -a option, hspell tries to emulate (as little as
possible of) ispell’s pipe interface. This allows Lyx, Emacs,
Geresh and KDE to use hspell as an external spell-checker.
-i This option only has any effect when used together with the -a
option. Normally, hspell -a only checks the spelling of Hebrew
words. If the given file also contains non-Hebrew words (such as
English words), these are simply ignored. Adding the -i option
tells hspell to pass the non-Hebrew words to ispell(1), and
return its answer as an answer from hspell. This makes it
convenient to spell-check mixed Hebrew-English documents.
Running hspell with the program name hspell-i also enables the
-i option. This is a useful trick when an application expects
just the name of a spell-checking program, and adds only the
"-a" option (without giving the user an option to also add
"-i"). The multispell script supplied with hspell serves a
similar purpose, with more control over encodings and which
spell-checker to run for non-Hebrew words.
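If an installation does not already provide the hspell-i name, one way
to get it is a symbolic link to the hspell binary. The following is a
minimal sketch; the function name link_hspell_i and the choice of
target directory are assumptions, and the directory must be on the PATH
of the application that runs the spell-checker.

```shell
# Sketch: make hspell reachable under the program name hspell-i.
# link_hspell_i DIR creates DIR/hspell-i as a symbolic link to hspell.
# The helper name and DIR are hypothetical; choose any directory on PATH.
link_hspell_i() {
    target=$(command -v hspell) || return 1   # stop if hspell is not found
    mkdir -p "$1" && ln -sf "$target" "$1/hspell-i"
}
```

For example, "link_hspell_i "$HOME/bin"" followed by configuring the
application to use hspell-i as its spell-checking program.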
-H By default, Hspell does not allow the He Ha-sh’ela prefix. This
is because this prefix is not normally used in modern Hebrew,
and generates many false negatives (errors, like He followed by
a possessed noun, are thought to be correct). The -H option
nevertheless tells Hspell to allow this prefix.
-D base
Load the word lists from the given base pathname, rather than
from the compiled-in default path. This is mostly used for
testing Hspell, when the dictionaries have been compiled in the
current directory and hspell is run as "hspell -Dhebrew.wgz".
-d, -B, -m, -T, -C, -S, -P, -p, -w, and -W
These options are passed to hspell by lyx or other applications,
and are cordially ignored.
SPELLING STANDARD
Hspell was designed to be fully and strictly compliant with the official
niqqud-less spelling rules ("Ha-ktiv Khasar Ha-niqqud", colloquially
known as "Ktiv Male") published by the Academy of the Hebrew Language.
This is both an advantage and a disadvantage, depending on your
viewpoint. It’s an advantage because it encourages a correct and
consistent spelling style throughout your writing. It is a
disadvantage, because a few of the Academy’s official spelling
decisions are relatively unknown to the general public.
Users of Hspell (and all Hebrew writers, for that matter) are
encouraged to read the Academy’s official niqqud-less spelling rules
(these are printed at the end of most modern Hebrew dictionaries, and
an abridged version is available at
http://hebrew-academy.huji.ac.il/decisio.html). Users are also
encouraged to refer to Hebrew dictionaries which use the niqqud-less
spelling (such as Millon Ha-hove, Rav Milim, and the new Even Shoshan).
Hspell’s distribution (and Web site) also include a document,
niqqudless.odt, which explains Hspell’s spelling standard in detail (in
Hebrew). It explains both the overall principles, and why specific
words are spelled the way they are.
Future releases might include an option for alternative spelling
standards.
BEHIND THE SCENES
The hspell program itself is mostly a simple (but efficient) program
that checks input words against a long list of valid words. The real
"brains" behind it are the word lists (dictionary) provided by the
Hspell project.
In order for this dictionary to be completely free of other people’s
copyright restrictions, the Hspell project is a clean-room
implementation, not based on pre-existing word lists or spell checkers,
or on copying of printed dictionaries.
The word list is also not based on automatic scanning of available
Hebrew documents (such as online newspapers), because there is no way
to guarantee that such a list will be correct (not contain
misspellings, useless proper names, and so on), complete (certain
inflections might not appear in the chosen samples), or consistent
(especially when it comes to niqqud-less spelling rules).
Instead, our idea was to write programs which know how to correctly
inflect Hebrew nouns and conjugate Hebrew verbs. The input to these
programs is a list of noun stems and verb roots, plus hints needed for
the correct inflection when these cannot be figured out automatically.
Most of the effort that went into the Hspell project went into building
these input files. Then, "word list generators" (written in Perl, and
also part of the Hspell project) create the complete inflected word
list that will be used by the spellchecking program, hspell. This
generation process is only done once, when building hspell from source.
These lists, before and after inflection, may be useful for much more
than spellchecking. Morphological analysis (which hspell provides with
the -l option) is one example. For more ideas, see the Hspell project’s
Web site, at http://ivrix.org.il/projects/spell-checker.
FILES
~/.hspell_words, ./hspell_words
These files, if they exist, should contain a list of Hebrew
words that hspell will also accept as correct words.
Note that only these words exactly will be added - they are not
inflected, and prefixes are not automatically allowed.
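Because only the exact words listed are accepted, every inflected or
prefixed form that should pass must be added separately. The sketch
below appends one word to a personal word list. It rests on two
assumptions that this manual does not state: that the file holds one
word per line, and that it uses the same ISO-8859-8 encoding as
hspell’s input files. The function name add_hspell_word is
hypothetical.

```shell
# add_hspell_word WORD [FILE]: append one word to a personal word list.
# Assumptions (not stated in this manual): one word per line, stored in
# ISO-8859-8 like hspell's input files. WORD is given in UTF-8.
# FILE defaults to ~/.hspell_words.
add_hspell_word() {
    printf '%s\n' "$1" |
        iconv -f UTF-8 -t ISO-8859-8 >> "${2:-$HOME/.hspell_words}"
}
```

To accept both a word and the same word with a prefix letter, call the
function once for each form.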
/usr/local/share/hspell/*
The standard Hebrew word lists used by hspell.
EXIT STATUS
Currently always 0.
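Since the exit status does not indicate whether misspellings were
found, a script has to inspect the output instead. The following is a
minimal sketch, assuming that hspell prints nothing when it accepts
every word; the function name has_misspellings is hypothetical.

```shell
# has_misspellings FILE...: succeed (status 0) if hspell reports anything.
# Assumption: hspell prints nothing when every word is accepted.
# hspell's own exit status is ignored, since it is currently always 0.
has_misspellings() {
    [ -n "$(hspell "$@")" ]
}
```

It could be used as "if has_misspellings draft.txt; then echo
"spelling errors found"; fi".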
VERSION
The version of hspell described by this manual page is 1.1 (December
31, 2009).
COPYRIGHT
Copyright (C) 2000-2009, Nadav Har’El <nyh@math.technion.ac.il> and Dan
Kenigsberg <danken@cs.technion.ac.il>.
Hspell is free software, released under the GNU General Public License
(GPL). Note that not only the programs in the distribution, but also
the dictionary files and the generated word lists, are licensed under
the GPL. There is no warranty of any kind.
See the LICENSE file for more information and the exact license terms.
The latest version of this software can be found at
http://hspell.ivrix.org.il/
ACKNOWLEDGMENTS
The hspell utility and the linguistic databases behind it (collectively
called "the Hspell project") were created by Nadav Har’El
<nyh@math.technion.ac.il> and by Dan Kenigsberg
<danken@cs.technion.ac.il>.
Although we wrote all of Hspell’s code ourselves, we are truly indebted
to the old-style "open source" pioneers - people who wrote books
instead of hiding their knowledge in proprietary software. For the
correct noun inflections, Dr. Shaul Barkali’s "The Complete Noun Book"
has been a great help. Prof. Uzzi Ornan’s booklet "Verb Conjugation in
Flow Charts" has been instrumental in the implementation of verb
conjugation, and Barkali’s "The Complete Verb Book" was used too.
During our work we have extensively used a number of Hebrew
dictionaries, including Even Shoshan, Millon Ha-hove and Rav-Milim, to
ensure the correctness of certain words. Various Hebrew newspapers and
books, both printed and online, were used for inspiration and for
finding words we still do not recognize.
We wish to thank Cilla Tuviana and Dr. Zvi Har’El for their assistance
with some grammatical questions.
Several other people helped us in various releases, with suggestions,
fixes or patches - they are listed in the WHATSNEW file in the
distribution.
SEE ALSO
hspell(3), spell(1), ispell(1), bidiv(1), iconv(1), recode(1)
BUGS
This manual page is in English.
For GUI-lovers, hspell’s user interface is an abomination. However, as
more and more applications learn to interface with hspell, and as
Hspell’s data becomes available in multi-lingual spellcheckers (such as
aspell and hunspell), this will no longer be an issue. See
http://hspell.ivrix.org.il/ for instructions on how to use Hspell in a
variety of applications.
hspell’s being limited to the ISO-8859-8 encoding, and not recognizing
UTF-8 or even CP1255 (including niqqud), is almost an anachronism
today.