mdbTutorialIM - Tutorial of input method

NAME

       mdbTutorialIM - Tutorial of input method

Structure of an input method file

       An input method is defined in a *.mim file with this format.

       (input-method LANG NAME)

       (description (_ "DESCRIPTION"))

       (title "TITLE-STRING")

       (map
         (MAP-NAME
           (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
           (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
           ...)
         (MAP-NAME
           (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
           (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
           ...)
         ...)

       (state
         (STATE-NAME
           (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...)   <- branch
           ...)
         (STATE-NAME
           (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...)   <- branch
           ...)
         ...)

       Lowercase letters and parentheses are literals, so they must be written
       as they are. Uppercase letters represent arbitrary strings.

       KEYSEQ specifies a sequence of keys in this format:

         (SYMBOLIC-KEY SYMBOLIC-KEY ...)

       where SYMBOLIC-KEY is the keysym name reported by the xev command. For
       instance

         (n i)

       represents a key sequence of <<n>> and <<i>>. If all SYMBOLIC-KEYs are
       ASCII characters, you can use the short form

         "ni"

       instead. Consult mdbIM for non-ASCII characters.
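
       As an illustration of the two notations (a sketch that reuses the
       placeholders of the format above rather than a real rule), the
       following two rules are interchangeable, because both key names are
       ASCII characters:

         ((n i) MAP-ACTION MAP-ACTION ...)
         ("ni" MAP-ACTION MAP-ACTION ...)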

       Both MAP-ACTION and BRANCH-ACTION are a sequence of actions of this
       format:

         (ACTION ARG ARG ...)

       The most common action is [[insert]], which is written as this:

         (insert "TEXT")

       But as it is very frequently used, you can use the short form

         "TEXT"

       If [[’TEXT’]] contains only one character ’C’, you can write it as

         (insert ?C)

       or even shorter as

         ?C

       So the shortest notation for an action of inserting ’a’ is

         ?a
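
       Put together with KEYSEQ, a complete rule can be spelled in several
       equivalent ways. The following sketch (the key <<a>> and the character
       ’A’ are chosen only for illustration) shows the same rule from the
       longest to the shortest notation:

         ("a" (insert "A"))
         ("a" "A")
         ("a" (insert ?A))
         ("a" ?A)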

Simple example of capslock

       Here is a simple example of an input method that works as CapsLock.

       (input-method en capslock)
       (description (_ "Upcase all lowercase letters"))
       (title "a->A")
       (map
         (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
                  ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
                  ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
                  ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
                  ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
                  ("z" "Z")))
       (state
         (init (toupper)))

       When this input method is activated, it is in the initial condition of
       the first state (in this case, the only state [[init]]). In the initial
       condition, no key is being processed and no action is suspended. When
       the input method receives a key event <<a>>, it searches branches in the
       current state for a rule that matches <<a>> and finds one in the map
       [[toupper]]. Then it executes MAP-ACTIONs (in this case, just inserting
       A in the preedit buffer). After all MAP-ACTIONs have been executed,
       the input method shifts to the initial condition of the current state.

       The shift to the initial condition of the first state has a special
       meaning; it commits all characters in the preedit buffer then clears
       the preedit buffer.

       As a result, A is given to the application program.

       When a key event does not match with any rule in the current state,
       that event is unhandled and given back to the application program.

       Turkish users may want to extend the above example for İ (U+0130:
       LATIN CAPITAL LETTER I WITH DOT ABOVE). Assigning the key sequence
       <<i>> <<i>> to that character seems convenient, so they would add this
       rule to [[toupper]]:

           ("ii" "İ")

       However, we already have the following rule:

           ("i" "I")

       What will happen when a key event <<i>> is sent to the input method?

       No problem. When the input method receives <<i>>, it inserts I in the
       preedit buffer. It knows that there is another rule that may match the
       additional key event <<i>>. So, after inserting I, it suspends the
       normal behavior of shifting to the initial condition, and waits for
       another key. Thus, the user sees I with underline, which indicates it
       is not yet committed.

       When the input method receives the next <<i>>, it cancels the effects
       done by the rule for the previous i (in this case, the preedit buffer
       is cleared), and executes MAP-ACTIONs of the rule for ii. So, İ is
       inserted in the preedit buffer. This time, as there are no other rules
       that match with an additional key, it shifts to the initial condition
       of the current state, which leads to commit İ.

       Then, what will happen when the next key event is <<a>> instead of
       <<i>>?

       No problem, either.

       The input method knows that there are no rules that match the <<i>>
       <<a>> key sequence. So, when it receives the next <<a>>, it executes
       the suspended behavior (i.e. shifting to the initial condition), which
       leads to commit I. Then the input method tries to handle <<a>> in the
       current state, which leads to commit A.
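
       The two cases above can be summarized as a trace. This is only a
       sketch of the behavior described in this section, not the output of
       any tool; the last two columns show the preedit buffer and the text
       committed so far after each step.

         key     step                                  preedit  committed
         <<i>>   rule "i" runs, shift is suspended     I
         <<i>>   rule "i" is undone, rule "ii" runs    İ
                 shift to the initial condition                 İ

         key     step                                  preedit  committed
         <<i>>   rule "i" runs, shift is suspended     I
         <<a>>   suspended shift runs                           I
                 rule "a" runs, then shift                      IA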

       So far, we have explained MAP-ACTION, but not BRANCH-ACTION. The format
       of BRANCH-ACTION is the same as that of MAP-ACTION. It is executed only
       after a matching rule has been determined and the corresponding MAP-
       ACTIONs have been executed. A typical use of BRANCH-ACTION is to shift
       to a different state.

       To see this effect, let us modify the current input method to upcase
       only word-initial letters (i.e. to capitalize). For that purpose, we
       modify the init state as this:

         (init
           (toupper (shift non-upcase)))

       Here [[(shift non-upcase)]] is an action to shift to the new state
       [[non-upcase]], which has two branches as below:

         (non-upcase
           (lower)
           (nil (shift init)))

       The first branch is simple. We can define the new map [[lower]] as the
       following to insert lowercase letters as they are.

       (map
         ...
         (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
                ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
                ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
                ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
                ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
                ("z" "z")))

       The second branch has a special meaning. The map name [[nil]] means
       that it matches with any key event that does not match any rules in the
       other maps in the current state. In addition, it does not consume any
       key event. We will show the full code of the new input method before
       explaining how it works.

       (input-method en titlecase)
       (description (_ "Titlecase letters"))
       (title "abc->Abc")
       (map
         (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
                  ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
                  ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
                  ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
                  ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
                  ("z" "Z") ("ii" "İ"))
         (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
                ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
                ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
                ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
                ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
                ("z" "z")))
       (state
         (init
           (toupper (shift non-upcase)))
         (non-upcase
           (lower (commit))
           (nil (shift init))))

       Let’s see what happens when the user types the key sequence <<a>> <<b>>
       << >>. Upon <<a>>, ’A’ is committed and the state shifts to
       [[non-upcase]]. So, the next <<b>> is handled in the [[non-upcase]]
       state. As it matches a
       rule in the map [[lower]], ’b’ is inserted in the preedit buffer and it
       is committed explicitly by the ’commit’ command in BRANCH-ACTION. After
       that, the input method is still in the [[non-upcase]] state. So the
       next << >> is also handled in [[non-upcase]]. This time, no rule in
       this state matches it. Thus the branch [[(nil (shift init))]] is
       selected and the state is shifted to [[init]]. Please note that << >>
       is not yet handled because the map [[nil]] does not consume any key
       event. So, the input method tries to handle it in the [[init]] state.
       Again no rule matches it. Therefore, that event is given back to the
       application program, which usually inserts a space for that.

       When you type ’a quick blown fox’ with this input method, you get ’A
       Quick Blown Fox’. Suppose you then find a typo in ’blown’, which should
       be ’brown’. To correct it, you would probably move the cursor after ’l’
       and type <<BackSpace>> and <<r>>. However, if the input method is still
       active, a capital ’R’ is inserted, which is not the desired result.

Example of utilizing surrounding text support

       To make the input method work well also in such a case, we must use
       ’surrounding text support’. It is a way to check characters around the
       inputting spot and delete them if necessary. Note that this facility is
       available only with Gtk+ applications and Qt applications. You cannot
       use it with applications that use XIM to communicate with an input
       method.

       Before explaining how to utilize ’surrounding text support’, you must
       understand how to use variables, arithmetic comparisons, and
       conditional actions.

       First, any symbol (except for several reserved ones) used as ARG of
       an action is treated as a variable. For instance, the commands

         (set X 32) (insert X)

       set the variable [[X]] to integer value 32, then insert a character
       whose Unicode character code is 32 (i.e. SPACE).

       The second argument of the [[set]] action can be an expression of this
       form:

         (OPERATOR ARG1 [ARG2])

       Both ARG1 and ARG2 can be an expression. So,

         (set X (+ (* Y 32) Z))

       sets [[X]] to the value of [[Y * 32 + Z]].

       We have the following arithmetic/bitwise OPERATORs (require two
       arguments):

         + - * / & |

       these relational OPERATORs (require two arguments):

         == <= >= < >

       and this logical OPERATOR (requires one argument):

         !
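
       As a small worked example of these expressions (a sketch; the variable
       name X is arbitrary), the following actions turn a lowercase letter
       into the corresponding uppercase letter by subtracting 32 from its
       character code. This works for the ASCII letters a..z only.

         (set X ?a)         ; X is 97, the character code of ’a’
         (set X (- X 32))   ; X is now 65, the character code of ’A’
         (insert X)         ; inserts ’A’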

       For surrounding text support, we have these reserved variables:

         @-0, @-N, @+N (N is a positive integer)

       Their values are predefined as below and cannot be altered.

       · [[@-0]]
       -1 if surrounding text is supported, -2 if not.

       · [[@-N]]
       The Nth previous character in the preedit buffer. If there are only M
       (M<N) previous characters in it, the value is the (N-M)th previous
       character from the inputting spot.

       · [[@+N]]
       The Nth following character in the preedit buffer. If there are only M
       (M<N) following characters in it, the value is the (N-M)th following
       character from the inputting spot.

       So, provided that you have this context:

         ABC|def|GHI

       (’def’ is in the preedit buffer, two ’|’s indicate borders between the
       preedit buffer and the surrounding text) and your current position in
       the preedit buffer is between ’d’ and ’e’, you get these values:

         @-3 -- ?B
         @-2 -- ?C
         @-1 -- ?d
         @+1 -- ?e
         @+2 -- ?f
         @+3 -- ?G

       Next, you have to understand the conditional action of this form:

         (cond
           (EXPR1 ACTION ACTION ...)
           (EXPR2 ACTION ACTION ...)
           ...)

       where EXPRn are expressions. When an input method executes this action,
       it resolves the values of EXPRn one by one from the first branch. As
       soon as the value of an EXPRn is resolved into nonzero, the
       corresponding actions are executed and the remaining branches are
       skipped.
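
       A minimal sketch of a conditional action follows. It assumes that the
       character before the current position decides what is inserted; the
       inserted strings are arbitrary placeholders, and the equality test is
       written with = as in the titlecase2 listing later in this section.

         (cond
           ((& (>= @-1 ?0) (<= @-1 ?9))   ; previous character is a digit
            (insert "#"))
           ((= @-1 ?a)                    ; previous character is ’a’
            (insert "!"))
           (1                             ; catchall
            (insert "-")))
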

       Now you are ready to write a new version of the input method
       ’Titlecase’.

       (input-method en titlecase2)
       (description (_ "Titlecase letters"))
       (title "abc->Abc")
       (map
         (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
                  ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
                  ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
                  ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
                  ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
                  ("z" "Z") ("ii" "İ")))
       (state
         (init
           (toupper

            ;; Now we have exactly one uppercase character in the preedit
            ;; buffer.  So, "@-2" is the character just before the inputting
            ;; spot.

            (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
                      (& (>= @-2 ?a) (<= @-2 ?z))
                      (= @-2 ?İ))

                ;; If the character before the inputting spot is A..Z,
                ;; a..z, or İ, remember the only character in the preedit
                ;; buffer in the variable X and delete it.

                (set X @-1) (delete @-)

                ;; Then insert the lowercase version of X.

                (cond ((= X ?İ) "i")
                      (1 (set X (+ X 32)) (insert X))))))))

       The above example contains the new action [[delete]]. So, it is time to
       explain more about the preedit buffer. The preedit buffer is a
       temporary place to store a sequence of characters. In this buffer, the
       input method keeps a position called the ’current position’. The
       current position exists between two characters, at the beginning of the
       buffer, or at the end of the buffer. The [[insert]] action inserts
       characters before the current position. For instance, when your preedit
       buffer contains ’ab.c’ (’.’ indicates the current position),

         (insert "xyz")

       changes the buffer to ’abxyz.c’.

       There are several predefined variables that represent a specific
       position in the preedit buffer. They are:

       · [[@<, @=, @>]]
       The first, current, and last positions.

       · [[@-, @+]]
       The previous and the next positions.

       The format of the [[delete]] action is this:

         (delete POS)

       where POS is a predefined positional variable. The above action deletes
       the characters between POS and the current position. So, [[(delete
       @-)]] deletes one character before the current position. Other
       examples of [[delete]] include the following:

         (delete @+)  ; delete the next character
         (delete @<)  ; delete all the preceding characters in the buffer
         (delete @>)  ; delete all the following characters in the buffer

       You can change the current position using the [[move]] action as below:

         (move @-)  ; move the current position to the position before the
                    ; previous character
         (move @<)  ; move to the first position

       Other positional variables work similarly.
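
       As a worked example (a sketch that only applies the descriptions
       above; ’.’ again marks the current position), starting from a preedit
       buffer containing ’ab.c’, the following actions change the buffer step
       by step:

         (delete @-)    ; ’ab.c’ becomes ’a.c’
         (move @<)      ; ’a.c’ becomes ’.ac’
         (insert "x")   ; ’.ac’ becomes ’x.ac’
         (move @>)      ; ’x.ac’ becomes ’xac.’
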

       Let’s see how our new example works. Whatever the key event is, the
       input method is in its only state, [[init]]. Since an event of a
       lowercase letter key is first handled by MAP-ACTIONs, every such key is
       changed into the corresponding uppercase letter and put into the
       preedit buffer. Now this character can be accessed with [[@-1]].

       How can we tell whether the new character should be lowercase or
       uppercase? We can do so by checking the character before it, i.e.
       [[@-2]]. BRANCH-ACTIONs in the [[init]] state do the job.

       They first check whether the character [[@-2]] is between A and Z,
       between a and z, or İ, using the conditional below.

            (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
                      (& (>= @-2 ?a) (<= @-2 ?z))
                      (= @-2 ?İ))

       If not, there is nothing to do specially. If so, our new key should be
       changed back into lowercase. Since the uppercase character is already
       in the preedit buffer, we retrieve and remember it in the variable
       [[X]] by

           (set X @-1)

       and then delete that character by

           (delete @-)

       Lastly we re-insert the character in its lowercase form. The problem
       here is that ’İ’ must be changed into ’i’, which adding 32 to its
       character code does not achieve, so we need another conditional. The
       first branch

           ((= X ?İ) "i")

       means that ’if the character remembered in X is ’İ’, ’i’ is inserted’.
       The second branch

           (1 (set X (+ X 32)) (insert X))

       starts with ’1’, which is always resolved into nonzero, so this branch
       is a catchall. Actions in this branch increase [[X]] by 32, then insert
       [[X]]. In other words, they change A...Z into a...z respectively and
       insert the resulting lowercase character into the preedit buffer. As
       the input method reaches the end of the BRANCH-ACTIONs, the character
       is committed.

       This new input method always checks the character before the current
       position, so ’A Quick Blown Fox’ will be successfully fixed to ’A Quick
       Brown Fox’ by the key sequence <<BackSpace>> <<r>>.

COPYRIGHT

       Copyright (C) 2001 Information-technology Promotion Agency (IPA)
       Copyright (C) 2001-2008 National Institute of Advanced Industrial
       Science and Technology (AIST)
       Permission is granted to copy, distribute and/or modify this document
       under the terms of the GNU Free Documentation License
       <http://www.gnu.org/licenses/fdl.html>.

                                  23 Jun 2008                 mdbTutorialIM(5)