g2p-sk - phonetic transcription for Slovak
g2p-sk [--color] [--dl <debug level>] [--help] [--stats] [--ofile <file_name>] [<input file>]
Phonetic transcription is essential for some linguistic and speech
recognition applications. Depending on the language, either a rule-based
or a statistical approach is used. g2p-sk implements the rule-based
approach, but in the future it may be replaced by a statistical one.
Each input word, consisting of a sequence of graphemes, is transcribed
into a sequence of phones in SAMPA notation. If no input file is
specified, standard input is read. If an input file is given, the
output is also written to a file whose name is the input filename with
the extension "_trans.txt".
The input and output code page is ISO 8859-2. To use g2p-sk with a
different code page, combine it with a code page converter through
pipes. For example, to have input and output in UTF-8, use (for
interactive use):
filterm UTF8-iso2 iso2-UTF8 g2p-sk
or (for batch processing):
iconv -f UTF-8 -t ISO_8859-2 | g2p-sk | iconv -f ISO_8859-2 -t UTF-8
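Programs that already hold Unicode strings can perform the same conversion themselves instead of chaining iconv. The Python sketch below is an illustration only and is not part of g2p-sk: the function name run_utf8 is hypothetical, and it assumes that g2p-sk is on the PATH and reads words from standard input as described above. It encodes the text to ISO 8859-2 before writing it to the child process and decodes the child's output afterwards, which is what the iconv pipeline does.

```python
import subprocess

# Code page that g2p-sk reads and writes, according to this manual page.
G2P_ENCODING = "iso8859_2"


def run_utf8(text: str, cmd=("g2p-sk",)) -> str:
    """Hypothetical helper: run an ISO 8859-2 filter on Unicode text.

    Equivalent in spirit to:
        iconv -f UTF-8 -t ISO_8859-2 | g2p-sk | iconv -f ISO_8859-2 -t UTF-8
    """
    result = subprocess.run(
        list(cmd),
        # Raises UnicodeEncodeError for characters outside ISO 8859-2.
        input=text.encode(G2P_ENCODING),
        stdout=subprocess.PIPE,
        # Raise CalledProcessError if the command exits with a nonzero status.
        check=True,
    )
    return result.stdout.decode(G2P_ENCODING)


if __name__ == "__main__":
    # "cat" stands in for g2p-sk so the encoding round trip can be tried
    # on a system where g2p-sk is not installed.
    print(run_utf8("ťava\nľad\n", cmd=("cat",)), end="")
```

Because check=True is set, a nonzero exit status from g2p-sk surfaces as an exception rather than being silently ignored; drop it if partial output is acceptable.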
The performance of the phonetic transcription depends on the morphemic
segmentation. To improve the quality of the morphemic segmentation, the
small version of the simple morpheme dictionary in
/usr/share/g2p_sk/Exceptions/morfemy.ddat can be replaced with a better
one. Syllabic segmentation is as important as morphemic segmentation;
it is provided by the sylseg-sk package.
The design of g2p-sk is language dependent. To use it for another
language, all the rules need to be rewritten.
--color Enable color output.
--dl <debug level> Set the debug level, which controls the amount of
displayed information. Debug level 0 displays nothing; the maximum
level 5 displays a full debugging report. The default debug level is 1.
--help Display a short help text.
--ofile <file_name> Write the output also into the given file.
--stats Count and display statistics for each phone.
Use standard input and debug level 3:
g2p-sk --dl 3
Process all the words from file aaa.txt:
g2p-sk aaa.txt
g2p-sk returns zero if it successfully processes all the input words.
Jozef Ivanecky (dodo (at) kanoistika.sk)
sylseg-sk(1), filterm(1), iconv(1), konwert(1)