NAME
sgml-spell-checker - SGML spell checker
SYNOPSIS
nsgmls -l yourdoc.sgml | sgml-spell-checker [option] ...
DESCRIPTION
sgml-spell-checker is a tool for automatically spell-checking your SGML
documents. Its advantage over some other SGML-aware spell checkers is
that it scans your documents in the form in which the SGML parser
actually sees them: it is not line-based, system entities are resolved,
marked sections are treated appropriately, and so on.
This tool can also be made aware of particular DTDs, in the sense that
it knows not to spell-check the content of elements that do not
represent human-language text, such as <programlisting> in DocBook. An
exclusion list for the DocBook DTD is included; others can be added
easily.
The input to sgml-spell-checker is the text representation of your SGML
document's Element Structure Information Set (ESIS) as generated by
nsgmls (from SP or OpenSP; sometimes installed under the name onsgmls).
In other words, you need to pipe the output of nsgmls into
sgml-spell-checker, as shown in the synopsis. Pass nsgmls whatever
options and source files you need, such as -c to search additional
catalogs or -i to include a marked section. Do not forget the -l
option, or you won't get any file or line references for the
misspellings.
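Because the parser may be installed as either nsgmls or onsgmls, a small
shell function can pick whichever one is available and always add -l.
This is only a sketch: the function name run_nsgmls is hypothetical and
is not part of sgml-spell-checker or SP.

```shell
# Hypothetical helper (not part of sgml-spell-checker): run whichever
# parser is installed, nsgmls or onsgmls, always adding -l so that the
# spell checker can report file names and line numbers.
run_nsgmls() {
    for parser in nsgmls onsgmls; do
        if command -v "$parser" >/dev/null 2>&1; then
            "$parser" -l "$@"   # pass through all other options and files
            return              # exit status is the parser's status
        fi
    done
    echo "run_nsgmls: neither nsgmls nor onsgmls found in PATH" >&2
    return 127
}
```

For example, with a hypothetical catalog file mycatalog and a parameter
entity named draft:

    run_nsgmls -c mycatalog -i draft mydoc.sgml | sgml-spell-checker --dtd=docbook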
sgml-spell-checker itself takes a few options, described below. If the
language of the document does not match your system's locale settings,
you need to use the --language option.
The output of sgml-spell-checker is a list of the words that aspell
considers misspelled, each with a file name and line number. Note that
the line number designates where the element containing the word
starts, not where the word itself is, so you will most likely have to
search a few lines below the indicated location.
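Since the reported line is only where the enclosing element starts, it
can help to print that line and a few following ones. A minimal sketch,
assuming a POSIX shell with sed; the function name show_context and its
default of 10 lines are illustrative choices, not features of
sgml-spell-checker.

```shell
# Hypothetical helper: print lines LINE .. LINE+COUNT-1 of FILE.
# Usage: show_context FILE LINE [COUNT]   (COUNT defaults to 10)
show_context() {
    sed -n "$2,$(( $2 + ${3:-10} - 1 ))p" "$1"
}
```

For example, if a misspelling is reported at line 42 of mydoc.sgml:

    show_context mydoc.sgml 42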
OPTIONS
--debug
Debug mode. Generates lots of output not of interest to the
normal user.
--language=language
Sets the language of the document. (The format depends on the
aspell installation, but something like en or en_US should
work.) By default the language is taken from the system locale
settings.
--suggestions
Shows correction suggestions for misspelled words.
--dictionary=file
Uses an additional aspell dictionary file. This option may be
used multiple times.
--dtd=dtd
Uses the exclusion list for the specified DTD (e.g., docbook).
--help Shows a brief help message, then exits.
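Because --dictionary may be given multiple times, when you keep several
dictionaries in one directory you can generate the options instead of
typing them. A sketch, assuming the dictionary files all end in .aspell
and live in a directory of your choosing; dictionary_options is a
hypothetical helper name.

```shell
# Hypothetical helper: print one --dictionary=FILE option per *.aspell
# file found in the given directory, one option per line.
dictionary_options() {
    for dict in "$1"/*.aspell; do
        [ -e "$dict" ] || continue   # the glob matched nothing
        printf '%s\n' "--dictionary=$dict"
    done
}
```

For example, with the dictionaries in a directory named dicts:

    nsgmls -l mydoc.sgml | sgml-spell-checker --dtd=docbook $(dictionary_options dicts)

This relies on the shell splitting the command substitution into words,
so it only works if the file names contain no whitespace.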
EXAMPLES
nsgmls -l -D . mydoc.sgml | \
sgml-spell-checker --language=en --dtd=docbook \
--dictionary=mydict1.aspell --dictionary=mydict2.aspell
(You can enter this command all on one line without the backslashes, or
on several lines with the backslashes.)
NOTES
See the aspell documentation for how to set up the appropriate
dictionaries. If you have trouble interpreting the aspell
documentation, here is how to make an aspell dictionary file from a
flat word list:
rm -f mydict1.aspell # aspell won’t overwrite existing files
aspell --language-tag=xx create master ./mydict1.aspell < mywordlist.txt
Note the ./ in the file name: aspell needs to see a slash in the name,
or it will look for the file in some default location.
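aspell expects the flat word list to contain one word per line. If you
maintain your list by hand, a sketch like the following can normalize
it first: it splits on whitespace, drops blank lines, and removes
duplicates. The helper name make_wordlist and the input file names are
hypothetical.

```shell
# Hypothetical helper: combine one or more text files into a flat word
# list with one word per line, no blank lines, and no duplicates.
make_wordlist() {
    cat "$@" |
        tr -s '[:space:]' '\n' |   # every run of whitespace becomes one newline
        sed '/^$/d' |              # drop an empty first line left by leading blanks
        sort -u                    # sort and remove duplicate words
}
```

For example:

    make_wordlist words1.txt words2.txt > mywordlist.txt
    rm -f mydict1.aspell
    aspell --language-tag=xx create master ./mydict1.aspell < mywordlist.txt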
BUGS
This program should be able to identify the language from the document
(e.g., <book lang="de">), but aspell doesn’t handle changing the
language on the fly.
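As a possible workaround for a document written entirely in one
language, you can read the language from the document's lang attribute
yourself and pass it with --language. This sketch assumes nsgmls reports
the attribute as a line such as "ALANG CDATA de" (attribute names
upper-cased, as with the usual SGML declaration for DocBook); first_lang
is a hypothetical helper, and it only looks at the first lang attribute
it finds.

```shell
# Hypothetical helper: read nsgmls output on stdin and print the value
# of the first lang attribute (assumed to appear as "ALANG CDATA value"
# or "ALANG TOKEN value"); prints nothing if there is none.
first_lang() {
    awk '$1 == "ALANG" && ($2 == "CDATA" || $2 == "TOKEN") { print $3; exit }'
}
```

For example, saving the parser output once and reusing it, with en as a
fallback when no lang attribute is set:

    nsgmls -l mydoc.sgml > mydoc.esis
    lang=$(first_lang < mydoc.esis)
    sgml-spell-checker --language="${lang:-en}" --dtd=docbook < mydoc.esis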
AUTHOR
Peter Eisentraut (peter_e@gmx.net)