lt-proc - This application is part of the lexical processing modules
and tools ( lttoolbox )
This tool is part of the apertium machine translation architecture:
lt-proc [ -a | -g | -n | -p | -s | -v | -h ] fst_file [input_file
lt-proc [ --analysis | --generation | --non-marked-gen |
--post-generation | --sao | --version | --help ] fst_file [input_file
lt-proc is the application responsible for providing the four lexical
processing functions:
· morphological analyser ( option -a )
· lexical transfer ( option -n )
· morphological generator ( option -g )
· post-generator ( option -p )
It accomplishes these tasks by reading binary files containing a
compact and efficient representation of dictionaries (a class of
finite-state transducers called augmented letter transducers). These
files are generated by lt-comp(1).
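
As an illustration, lt-proc can be driven from a script. The sketch below is not part of lttoolbox: the helper names build_command and run_lt_proc are hypothetical, and it assumes that lt-proc is on the PATH and that, when input_file is omitted, text is read from standard input and written to standard output. Only the short options from the synopsis are used.

```python
import subprocess

# Short options taken from the synopsis of this page.
MODES = {
    "analysis": "-a",
    "generation": "-g",
    "non-marked-gen": "-n",
    "post-generation": "-p",
}


def build_command(mode, fst_file):
    """Return the argument list for one lt-proc invocation."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["lt-proc", MODES[mode], fst_file]


def run_lt_proc(mode, fst_file, text):
    """Run lt-proc on `text`.

    ASSUMPTION: with no input_file given, lt-proc reads standard input
    and writes its result to standard output.
    """
    result = subprocess.run(
        build_command(mode, fst_file),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

A call such as run_lt_proc("analysis", "es.bin", "cantar") would then return the analyser output, where es.bin stands for any dictionary compiled with lt-comp(1).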
It is worth mentioning that some characters (‘[’, ‘]’, ‘$’, ‘^’, ‘/’,
‘+’) are special characters used for format and encapsulation, and
they should be escaped if they have to be used literally. For
instance, text enclosed in ‘[’...‘]’ is ignored, and a lexical unit is
delimited as ‘^...$’.
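
A minimal sketch of escaping text before it is passed to lt-proc follows. This page does not say which escape character is used; the sketch assumes a backslash placed before each special character, and the function names escape and unescape are hypothetical.

```python
# Special characters listed in this page, plus the backslash itself.
# ASSUMPTION: a backslash before a character marks it as literal.
SPECIAL = set("[]$^/+\\")


def escape(text):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def unescape(text):
    """Undo escape(): drop each backslash and keep the character after it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))  # a trailing lone backslash is dropped
        else:
            out.append(ch)
    return "".join(out)
```

With these helpers, escape("a/b [c]") yields the string a\/b \[c\], and unescape() restores the original text.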
-a, --analysis
Tokenizes the text into surface forms (lexical units as they
appear in texts) and delivers, for each surface form, one or
more lexical forms consisting of lemma, lexical category and
morphological inflection information. Tokenization is not
straightforward due to the existence, on the one hand, of
contractions, and, on the other hand, of multi-word lexical
units. For contractions, the system reads in a single surface
form and delivers the corresponding sequence of lexical forms.
Multi-word surface forms are analysed in a left-to-right,
longest-match fashion. Multi-word surface forms may be
invariable (such as a multi-word preposition or conjunction) or
inflected (for example, in es, "echaban de menos", "they
missed", is a form of the imperfect indicative tense of the verb
"echar de menos", "to miss"). Limited support for some kinds of
discontinuous multi-word units is also available. Analysing a
single-word surface form produces output like the following
examples: "cantar" -> ‘^cantar/cantar<vblex><inf>$’ or "cantaba"
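
The analyser output shown above can be split back into its parts with a few lines of code. The following is a sketch only: parse_unit is a hypothetical helper, it assumes the surface form and each lexical form are separated by ‘/’ inside ‘^...$’, and it does not handle escaped special characters or ‘+’.

```python
import re

# One lexical unit: '^' + surface form + '/' + lexical form(s) + '$'.
UNIT = re.compile(r"\^([^$]*)\$")
# One tag such as <vblex> or <inf>.
TAG = re.compile(r"<([^>]+)>")


def parse_unit(unit):
    """Split one '^...$' unit into (surface, [(lemma, [tags]), ...])."""
    match = UNIT.fullmatch(unit)
    if match is None:
        raise ValueError(f"not a lexical unit: {unit!r}")
    # The first field is the surface form, the rest are lexical forms.
    surface, *analyses = match.group(1).split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]  # text before the first tag
        parsed.append((lemma, TAG.findall(analysis)))
    return surface, parsed
```

For the example above, parse_unit("^cantar/cantar<vblex><inf>$") returns ("cantar", [("cantar", ["vblex", "inf"])]).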
-g, --generation
Delivers a target-language surface form for each target-language
lexical form, by suitably inflecting it.
-n, --non-marked-gen
Morphological generation (like -g) but without unknown-word
marks (asterisk ‘*’).
-p, --post-generation
Performs orthographical operations such as contractions and
apostrophations. The post-generator is usually dormant (just
copies the input to the output) until a special alarm symbol
contained in some target-language surface forms wakes it up to
perform a particular string transformation if necessary; then it
goes back to sleep.
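
The dormant/awake behaviour can be pictured with a toy model. The sketch below is not the lt-proc implementation: the alarm symbol ‘~’, the rule table and the function name postgenerate are all hypothetical, chosen only to show a filter that copies its input until an alarm symbol triggers a string transformation.

```python
ALARM = "~"  # hypothetical alarm symbol, chosen for this sketch

# Hypothetical rules: text that follows the alarm symbol -> replacement.
RULES = {
    "de el": "del",
    "a el": "al",
}


def postgenerate(text, rules=RULES, alarm=ALARM):
    """Copy `text`, applying a rule wherever the alarm symbol appears."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != alarm:
            out.append(text[i])  # dormant: copy input to output
            i += 1
            continue
        i += 1  # woken up: consume the alarm symbol
        # Try the longest rule first so longer matches win.
        for source in sorted(rules, key=len, reverse=True):
            if text.startswith(source, i):
                out.append(rules[source])
                i += len(source)
                break
        # With or without a match, the loop goes back to copying.
    return "".join(out)
```

In this toy model, postgenerate("voy ~a el cine") returns "voy al cine", while text without the alarm symbol is returned unchanged.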
-s, --sao
Processes input in the orthoepikon (previously ‘sao’) annotation
system format: http://orthoepikon.sf.net.
-v, --version
Display the version number.
-h, --help
Display this help.
fst_file The input compiled dictionary.
lt-expand(1), lt-comp(1), apertium-tagger(1), apertium-translator(1).
Lots of...lurking in the dark and waiting for you!
(c) 2005,2006 Universitat d’Alacant / Universidad de Alicante. All