NAME

   lookup - interactive file search and display

SYNOPSIS

   lookup [ args ] [ file ...  ]

DESCRIPTION

   Lookup allows the quick interactive search of text files.  It supports
   ASCII, JIS-ROMAN, and Japanese EUC packed-format text, and has an
   integrated romaji→kana converter.

THIS MANUAL

   Lookup  is  flexible  for  a  variety  of  applications.  This manual will,
   however, focus on the application of searching Jim Breen’s edict (Japanese-
   English  dictionary) and kanjidic (kanji database). Being familiar with the
   content and format of these files would be helpful. See  the  INFO  section
   near  the  end  of this manual for information on how to obtain these files
   and their documentation.

OVERVIEW OF MAJOR FEATURES

   The following just mentions some major features to whet  your  appetite  to
   actually read the whole manual (-:

   Romaji-to-Kana Converter
      Lookup can convert romaji to kana for you, even “on the fly” as you
      type.

   Fuzzy Searching
      Searches can be a bit “vague” or “fuzzy”, so that you’ll be able to
      find “東京” even if you try to search for “ときょ” (the proper yomikata
      being “とうきょう”).

   Regular Expressions
      Lookup uses powerful and expressive regular expressions for searching.
      One can easily specify complex searches to the effect of “I want lines
      that look like such-and-such, but not like this-and-that, but that also
      have this particular characteristic....”

   Wildcard ‘‘Glob’’ Patterns
      Optionally, you can use well-known filename wildcard patterns instead of
      full-fledged regular expressions.

   Filters
      You can have lookup not list certain lines that  would  otherwise  match
      your search, yet can optionally save them for quick review. For example,
      you could have all name-only entries from  edict  filtered  from  normal
      output.

   Automatic Modifications
      Similarly, you can do a standard search-and-replace on lines just before
      they print, perhaps to remove information you don’t care to see on  most
      searches.  For example, if you’re generally not interested in kanjidic’s
      info on Chinese readings, you can have them removed  from  lines  before
      printing.

   Smart Word-Preference Mode
      You can have lookup list only entries with whole words that match your
      search (as opposed to an embedded match, such as finding “the” inside
      “them”), but if no whole-word matches exist, lookup will go ahead and
      list any entry that matches the search.

   Handy Features
      Other handy features include a dynamically settable and parameterized
      prompt, automatic highlighting of that part of the line that matches
      your search, an output pager, readline-like input with horizontal
      scrolling for long input lines, a “.lookup” startup file, automated
      programmability, and much more. Read on!

REGULAR EXPRESSIONS

   Lookup makes liberal use of regular expressions (or  regex  for  short)  in
   controlling  various  aspects of the searches. If you are not familiar with
   the important concepts of regexes,  read  the  tutorial  appendix  of  this
   manual before continuing.

JAPANESE CHARACTER ENCODING METHODS

   Internally,  lookup  works  with  Japanese packed-format EUC, and all files
   loaded must be encoded similarly. If you  have  files  encoded  in  JIS  or
   Shift-JIS,  you must first convert them to EUC before loading (see the INFO
   section for programs that can do this).

   Interactive input and output encoding, however, may be selected via the
   -jis, -sjis, and -euc invocation flags (default is -euc), or by various
   commands to the program (described later).

   Make sure to use the encoding appropriate for your system.  If you’re using
   kterm under the X Window System, you can use lookup’s -jis flag to match
   kterm’s default JIS encoding. Or, you might use kterm’s “-km euc” startup
   option (or menu selection) to put kterm into EUC mode. Also, I have found
   kterm’s scrollbar (“-sb -sl 500”) to be quite useful.

   With many “English” fonts in Japan, the character that normally prints as a
   backslash (halfwidth version of ＼) in The States appears as a yen symbol
   (the half-width version of ￥). How it will appear on your system is a
   function of what font you use and what output encoding method you choose,
   which may be different from the font and method that was used to print this
   manual (both of which may be different from what’s printed on your
   keyboard’s appropriate key).  Make sure to keep this in mind while reading.

STARTUP

   Let’s  assume  that your copy of edict is in ~/lib/edict. You can start the
   program simply with

           lookup ~/lib/edict

   You’ll note that lookup spends some time building an index before the
   default “lookup> ” prompt appears.

   Lookup  gains  much  of  its  search  speed by constructing an index of the
   file(s) to be searched. Since building the  index  can  be  time  consuming
   itself,  you  can  have  lookup write the built index to a file that can be
   quickly loaded the next time you run the program.  Index files will be
   given a “.jin” (Jeffrey’s Index) ending.

   Let’s build the indices for edict and kanjidic now:

           lookup -write ~/lib/edict ~/lib/kanjidic

   This will create the index files
          ~/lib/edict.jin
          ~/lib/kanjidic.jin
   and exit.

   You can now re-start lookup, automatically using the pre-computed index
   files, as:

          lookup ~/lib/edict ~/lib/kanjidic

   You should then be presented with the prompt without having to wait for the
   index  to  be constructed (but see the section on Operating System concerns
   for possible reasons of delay).

INPUT

   There are basically two types of input: searches and commands.  Commands do
   such things as tell lookup to load more files or set flags. Searches report
   lines of a file that match some search specifier (where lines to search for
   are specified by one or more regular expressions).

   The input syntax may perhaps at first seem odd, but has been designed to be
   powerful and concise. A bit of time invested to learn it well will pay  off
   greatly when you need it.

BRIEF EXAMPLE

   Assuming  you’ve  started  lookup  with  edict and kanjidic as noted above,
   let’s try a few searches. In these examples, the
       “search [edict]> ”
   is the prompt.  Note that the space after the ‘>’ is part of the prompt.

   Given the input:

     search [edict]> tranquil

   lookup will report all lines with the string “tranquil” in them. There are
   currently about a dozen such lines, two of which look like:

     安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
     安らぎ [やすらぎ] /peace/tranquility/

   Notice that lines with “tranquil” and “tranquility” matched? This is
   because “tranquil” was embedded in the word “tranquility”.  You could
   restrict the search to only the word “tranquil” by prepending the special
   “start of word” symbol ‘<’ and appending the special “end of word” symbol
   ‘>’ to the regex, as in:

     search [edict]> <tranquil>

   This is the regular expression that says “the beginning of a word, followed
   by a ‘t’, ‘r’, ..., ‘l’, which is at the end of a word.” The current
   version of edict has just three matching entries.

   Let’s try another:

     search [edict]> fukushima

   This is a search for the “English” fukushima -- ways to search for kana or
   kanji will be explored later.  Note that among the several lines selected
   and printed are:

     副島 [ふくしま] /Fukushima (pn,pl)/
     木曽福島 [きそふくしま] /Kisofukushima (pl)/

   By default, searches are done in a case-insensitive manner -- ‘F’ and ‘f’
   are treated the same by lookup, at least so far as the matching goes.
   This is called case folding.

   Let’s give a command to turn this option off, so that ‘f’ and ‘F’ won’t be
   considered the same.  Here’s an odd point about lookup’s input syntax: the
   default setting is that all command lines must begin with a space.  The
   space is the (default) command-introduction character and tells the input
   parser to expect a command rather than a search regular expression.  It is
   a common mistake at first to forget the leading space when issuing a
   command.  Be careful.

   Try the command “ fold” to report the current status of case-folding.
   Notice that as soon as you type the space, the prompt changes to
     “lookup command> ”
   as a reminder that now you’re typing a command rather than a search
   specification.

     lookup command>  fold

   The reply should be “file #0’s case folding is on”.

   You can actually turn it off with “ fold off”.  Now try the search for
   “fukushima” again.  Notice that this time the entries with “Fukushima”
   aren’t listed?  Now try the search string “Fukushima” and see that the
   entries with “fukushima” aren’t listed.

   Case folding is  usually  very  convenient  (it  also  makes  corresponding
   katakana and hiragana match the same), so don’t forget to turn it back on:

     lookup command>  fold on

JAPANESE INPUT

   Lookup has an automatic romaji→kana converter. A leading ‘/’ indicates
   that romaji is to follow. Try typing “/tokyo” and you’ll see it convert
   to “/ときょ” as you type. When you hit return, lookup will list all lines
   that have a “ときょ” somewhere in them. Well, sort of.  Look carefully at
   the lines which match. Among them (if you had case folding back on) you’ll
   see:

     キリスト教 [キリストきょう] /Christianity/
     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     凸鏡 [とっきょう] /convex lens/

   The first one has “ときょ” in it (as “トきょ”, where the katakana “ト”
   matches in a case-insensitive manner to the hiragana “と”), but you might
   consider the others unexpected, since they don’t have “ときょ” in them.
   They’re close (“とうきょ” and “とっきょ”), but not exact.  This is the
   result of lookup’s “fuzzification”.  Try the command “ fuzz” (again, don’t
   forget the command-introduction space).  You’ll see that fuzzification is
   turned on.  Turn it off with “ fuzz off” and try “/tokyo” (which will
   convert as you type) again.  This time you only get the lines which have
   “ときょ” exactly (well, case folding is still on, so it might match
   katakana as well).

   In a fuzzy search, length of vowels is ignored -- “と” is considered the
   same as “とう”, for example.  Also, the presence or absence of any “っ”
   character is ignored, and the pairs じ ぢ, ず づ, え ゑ, and お を are
   considered identical in a fuzzy search.

   It might be convenient to consider a fuzzy search to be a “pronunciation
   search”.  Special note: fuzzification will not be performed if a regular
   expression “*”, “+”, or “?” modifies a non-ASCII character. This is not an
   issue when input patterns are filename-like wildcard patterns (discussed
   below).

   In addition to kana fuzziness, there’s one special case for kanji when
   fuzziness is on.  The kanji repeater mark “々” will be recognized such
   that “時々” and “時時” will match each other.
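   The kana rules above (vowel length, “っ”, and the four identical pairs)
   can be illustrated with a minimal Python sketch.  It is not how lookup
   works internally: lookup expands the search regex, whereas this sketch
   normalizes readings so that two readings compare equal when they differ
   only in ways a fuzzy search ignores.  All names here (normalize,
   vowel_of, EQUIVALENT, LENGTHENERS) are invented for the illustration,
   the table of which vowels count as lengthening is partly a guess, and
   only hiragana is handled.

```python
import unicodedata

# Kana pairs the manual says are identical in a fuzzy search.
EQUIVALENT = str.maketrans({"ぢ": "じ", "づ": "ず", "ゑ": "え", "を": "お"})

# ASSUMPTION: which bare vowels lengthen a preceding vowel.  The a-row and
# u/o-row entries follow the regex printed in the manual's '+' example;
# the i-row and e-row entries are a guess.
LENGTHENERS = {"A": "A", "I": "I", "U": "UO", "E": "E", "O": "UO"}

BARE_VOWELS = "あいうえおぁぃぅぇぉ"


def vowel_of(ch):
    """Return the vowel (A/I/U/E/O) a hiragana character ends in, or None."""
    name = unicodedata.name(ch, "")
    if not name.startswith("HIRAGANA LETTER"):
        return None
    last = name[-1]
    return last if last in "AIUEO" else None


def normalize(reading):
    """Drop っ, ー and vowel lengthening; map the identical kana pairs."""
    out = []
    prev_vowel = None
    for ch in reading:
        if ch in "っー":
            continue
        vowel = vowel_of(ch)
        if ch in BARE_VOWELS and prev_vowel and vowel in LENGTHENERS[prev_vowel]:
            continue  # this kana only lengthens the previous vowel
        out.append(ch.translate(EQUIVALENT))
        prev_vowel = vowel
    return "".join(out)
```

   With this sketch, normalize("とうきょう") and normalize("とっきょ") both
   give the same string as normalize("ときょ"), which mirrors why a fuzzy
   search for “ときょ” finds both readings.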

   Turn fuzzification back on (“fuzz on”), and search for all whole words
   which sound like “tokyo”. That search would be specified as:

     search [edict]> /<tokyo>

   (again, the “tokyo” will be converted to “ときょ” as you type).  My copy of
   edict has the three lines

     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     特許 [とっきょ] /special permission/patent/
     凸鏡 [とっきょう] /convex lens/

   This kind of whole-word romaji-to-kana search is so common, there’s a
   special short cut. Instead of typing “/<tokyo>”, you can type “[tokyo]”.
   The leading ‘[’ means “start romaji” and “start of word”.  Were you to
   type “<tokyo>” instead (without a leading ‘/’ or ‘[’ to indicate
   romaji-to-kana conversion), you would get all lines with the English
   whole-word “tokyo” in them.  That would be a reasonable request as well,
   but not what we want at the moment.

   Besides the kana conversion, you can use any cut-and-paste that your
   windowing system might provide to get Japanese text onto the search line.
   Cut “ときょ” from somewhere and paste onto the search line. When hitting
   enter to run the search, you’ll notice that it is done without
   fuzzification (even if the fuzzification flag was “on”).  That’s because
   there’s no leading ‘/’. Not only does a leading ‘/’ indicate that you want
   the romaji-to-kana conversion, but that you want it done fuzzily.

   So, if you’d like fuzzy cut-and-paste, just type a leading ‘/’ before
   pasting (or go back and prepend one after pasting).

   These examples have all been pretty simple, but you can use all the power
   that regexes have to offer. As a slightly more complex example, the
   search “<gr[ea]y>” would look for all lines with the words “grey” or
   “gray” in them.  Since the ‘[’ isn’t the first character of the line, it
   doesn’t mean what was mentioned above (start-of-word romaji).  In this
   case, it’s just the regular-expression “class” indicator.

   If you feel more comfortable using filename-like “*.txt” wildcard patterns,
   you can use the “wildcard on” command to have patterns be considered this
   way.

   This has been a quick introduction to the basics of lookup.

   It can be very  powerful  and  much  more  complex.  Below  is  a  detailed
   description of its various parts and features.

READLINE INPUT

   The  actual  keystrokes  are  read by a readline-ish package that is pretty
   standard. In addition to just typing away,  the  following  keystrokes  are
   available:

     ^B  / ^F     move left/right one character on the line
     ^A  / ^E     move to the start/end of the line
     ^H  / ^G     delete one character to the left/right of the cursor
     ^U  / ^K     delete all characters to the left/right of the cursor
     ^P  / ^N     previous/next lines on the history list
     ^L or ^R     redraw the line
     ^D           delete char under the cursor, or EOF if line is empty
     ^space       force romaji conversion (^@ on some systems)

   If  automatic romaji-to-kana conversion is turned on (as it is by default),
   there are certain situations where the conversion will be done, as  we  saw
   above.  Lower-case  romaji  will be converted to hiragana, while upper-case
   romaji to katakana.  This usually won’t matter,  though,  as  case  folding
   will treat hiragana and katakana the same in the searches.

   In  exactly  what  situations  the  automatic  conversion  will  be done is
   intended to be rather intuitive once the basic idea is  learned.   However,
   at  any time, one can use control-space to convert the ASCII to the left of
   the cursor to kana. This can be particularly useful when needing  to  enter
   kana on a command line (where auto conversion is never done; see below).

ROMAJI FLAVOR

   Most  flavors  of  romaji  are recognized. Special or non-obvious items are
   mentioned  below.  Lowercase  are  converted  to  hiragana,  uppercase   to
   katakana.

   Long vowels can be entered by repeating the vowel, or with ‘-’ or ‘^’.

   In situations where an “n” could be vague, as in “na” being な or んあ, use
   a single quote to force ん.  Therefore, 「kenichi」→けにち while
   「ken’ichi」→けんいち.

   The romaji has been richly extended with many non-standard combinations
   such as ふぁ or ちぇ, which are represented in intuitive ways:
   「fa」→ふぁ, 「che」→ちぇ, etc.

   Various other mappings of interest:

     wo →を      we→ゑ       wi→ゐ
     VA →ヴァ    VI→ヴィ     VU→ヴ       VE→ヴェ     VO→ヴォ
     di →ぢ      dzi→ぢ      dya→ぢゃ    dyu→ぢゅ    dyo→ぢょ
     du →づ      tzu→づ      dzu→づ

   (the following kana are all smaller versions of the regular kana)

     xa →ぁ      xi→ぃ       xu→ぅ       xe→ぇ       xo→ぉ
     xu →ぅ      xtu→っ      xwa→ゎ      xka→ヵ      xke→ヶ
     xya→ゃ      xyu→ゅ      xyo→ょ
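   As a rough illustration of how such a table can drive the conversion,
   here is a minimal Python sketch using greedy longest-match lookup.  The
   table is a tiny hand-picked subset, and the names ROMAJI, to_katakana
   and romaji_to_kana are hypothetical; this is not lookup’s actual
   converter.  The sketch expects the “n” separator to be typed as an
   ASCII single quote.

```python
# Tiny illustrative subset of a romaji table (the real converter knows far more).
ROMAJI = {
    "to": "と", "kyo": "きょ", "ke": "け", "ni": "に", "chi": "ち",
    "i": "い", "n'": "ん", "n": "ん", "fa": "ふぁ", "che": "ちぇ",
    "wo": "を", "xtu": "っ",
}
LONGEST = max(len(key) for key in ROMAJI)


def to_katakana(hiragana):
    """Shift hiragana code points into the katakana block."""
    return "".join(
        chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c for c in hiragana
    )


def romaji_to_kana(text):
    """Convert romaji to kana, trying the longest table entry first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(LONGEST, 0, -1):
            chunk = text[i:i + size]
            kana = ROMAJI.get(chunk.lower())
            if kana is not None:
                # Upper-case romaji becomes katakana, lower-case hiragana.
                out.append(to_katakana(kana) if chunk.isupper() else kana)
                i += len(chunk)
                break
        else:
            out.append(text[i])  # not romaji we know: pass it through
            i += 1
    return "".join(out)
```

   Trying the longest entry first is what lets “n'” win over a bare “n”,
   so romaji_to_kana("ken'ichi") keeps the ん separate from the い while
   romaji_to_kana("kenichi") reads the “ni” as に.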

INPUT SYNTAX

   Any input line beginning with a space (or whichever character is set as the
   command-introduction character) is processed as a command to lookup  rather
   than a search spec.  Automatic kana conversion is never done on these lines
   (but forced conversion with control-space may be done at any time).

   Other lines are taken as search regular  expressions,  with  the  following
   special cases:

   ?  A line consisting of a single question mark will report the current
      command-introduction character (the default is a space, but can be
      changed with the “cmdchar” command).

   =  If a line begins with ‘=’, the line (without the ‘=’) is taken as a
      search regular expression, and no automatic (or internal -- see below)
      kana conversion is done anywhere on the line (although again, conversion
      can always be forced with control-space).  This can be used to initiate
      a search where the beginning of the regex is the command-introduction
      character, or in certain situations where automatic kana conversion is
      temporarily not desired.

   /  A line beginning with ‘/’ indicates romaji input for the whole line.  If
      automatic kana conversion is turned on, the conversion will be done in
      real-time, as the romaji is typed. Otherwise it will be done internally
      once the line is entered.  Regardless, the presence of the leading ‘/’
      indicates that any kana (either converted or cut-and-pasted in) should
      be “fuzzified” if fuzzification is turned on.

      As an addition to the above, if the line doesn’t begin with ‘=’ or the
      command-introduction character (and automatic conversion is turned
      on), ‘/’ anywhere on the line initiates automatic conversion for the
      following word.

   [  A line beginning with ‘[’ is taken to be romaji (just as a line
      beginning with ‘/’), and the converted romaji is subject to
      fuzzification (if turned on).  However, if ‘[’ is used rather than ‘/’,
      an implied ‘<’ “beginning of word” is prepended to the resulting kana
      regex.  Also, any ending ‘]’ on such a line is converted to the “ending
      of word” specifier ‘>’ in the resulting regex.
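   The special cases above can be summarized as a small dispatch on the
   first character of the line.  The following Python sketch only
   illustrates that reading of the rules; the function name classify and
   its return labels are invented here, and it ignores the prefixes and
   suffixes described next as well as any mid-line ‘/’.

```python
def classify(line, cmdchar=" "):
    """Return (kind, text) for one input line.

    kind is "command", "show-cmdchar", "regex" (no kana conversion)
    or "romaji" (convert to kana, fuzzify if fuzz is on).
    """
    if line.startswith(cmdchar):
        return ("command", line[len(cmdchar):])
    if line == "?":
        return ("show-cmdchar", "")
    if line.startswith("="):
        return ("regex", line[1:])
    if line.startswith("/"):
        return ("romaji", line[1:])
    if line.startswith("["):
        body = line[1:]
        if body.endswith("]"):
            body = body[:-1] + ">"   # trailing ']' becomes end-of-word
        return ("romaji", "<" + body)  # implied start-of-word
    return ("regex", line)
```

   For instance, classify("[tokyo]") yields the same text as
   classify("/<tokyo>"), matching the short cut described earlier, while
   classify("<gr[ea]y>") is left alone because its ‘[’ is not the first
   character.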

   In addition to the above, lines may have certain prefixes and  suffixes  to
   control aspects of the search or command:

   !  Various flags can be toggled for the duration of a particular search by
      prepending a “!!” sequence to the input line.

      Sequences are shown below, along with commands related to each:

       !F! …  Filtration is toggled for this line (filter)
       !M! …  Modification is toggled for this line (modify)
       !w! …  Word-preference mode is toggled for this line (word)
       !c! …  Case folding is toggled for this line (fold)
       !f! …  Fuzzification is toggled for this line (fuzz)
       !W! …  Wildcard-pattern mode is toggled for this line (wildcard)
       !r! …  Raw. Force fuzzification off for this line
       !h! …  Highlighting is toggled for this line (highlight)
       !t! …  Tagging is toggled for this line (tag)
       !d! …  Displaying is on for this line (display)

      The letters can be combined, as in “!cf!”.

      The final ‘!’ can be omitted if the first character after the sequence
      is not an ASCII letter.

      If no letters are given (“!!”), “!f!” is the default.

      These last two points can be conveniently combined in the common case
      of “!/romaji” which would be the same as “!f!/romaji”.

      The special sequence “!?” lists the above, as well as indicates which
      are currently turned on.

      Note that the letters accepted in a “!!” sequence are many of the
      indicators shown by the “files” command.

   +  A ‘+’ prepended to anything above will cause the final search regex to
      be printed. This can be useful to see when and what kind of
      fuzzification and/or internal kana conversion is happening. Consider:

        search [edict]> +/わかる
        a match is “わ[ぁあー]*っ?か[ぁあー]*る[ぅうおぉー]*”

      Due to the leading ‘/’, the kana is fuzzified, which explains the
      somewhat complex resulting regex. For comparison, note:

        search [edict]> +わかる
        a match is “わかる”
        search [edict]> +!/わかる
        a match is “わかる”

      As the ‘+’ shows, these are not fuzzified. The first one has no
      leading ‘/’ or ‘[’ to induce fuzzification, while the second has
      the ‘!’ line prefix (which is the default version of “!f!”), which
      toggles fuzzification mode to “off” for that line.

   ,  The default of all searches and most commands is to work with the first
      file loaded (edict in these examples). One can change this default (see
      the “select” command) or, by appending a comma+digit sequence at the end
      of an input line, force that line to work with another previously-loaded
      file. An appended “,1” works with the first extra file loaded (in these
      examples, kanjidic).  An appended “,2” works with the 2nd extra file
      loaded, etc.

      An appended “,0” works with the original first file (and can be useful
      if the default file has been changed via the “select” command).

      The following sequence shows a common usage:

        search [edict]> [ときょと]
        東京都 [とうきょうと] /Tokyo Metropolitan area/

      cutting and pasting the 都 from above, and adding a “,1” to search
      kanjidic:

        search [edict]> 都,1
        都 4554 N4769 S11  ..... ト ツ みやこ {metropolis} {capital}
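   To make the prefix and suffix rules concrete, here is a minimal Python
   sketch that peels a toggle sequence off the front of a line and a
   comma+digit slot selector off the end.  It is only one plausible
   reading of the rules above: the name split_affixes is hypothetical, the
   ‘+’ prefix and the special “!?” sequence are left to the caller, and
   rejecting unknown letters is a choice made for this sketch, not
   documented behavior.

```python
import re

TOGGLE_LETTERS = set("FMwcfWrhtd")  # letters from the table above

# "!" + letters + closing "!", where the closing "!" may be omitted
# when the next character is not an ASCII letter.
PREFIX = re.compile(r"!([A-Za-z]*)(?:!|(?![A-Za-z]))")
SUFFIX = re.compile(r",(\d+)$")


def split_affixes(line, default_slot=0):
    """Return (toggles, body, slot) for one search line."""
    toggles = ""
    match = PREFIX.match(line)
    if match and not line.startswith("!?"):
        toggles = match.group(1) or "f"  # no letters means "!f!"
        unknown = set(toggles) - TOGGLE_LETTERS
        if unknown:
            raise ValueError("unknown toggle letters: " + "".join(sorted(unknown)))
        line = line[match.end():]
    slot = default_slot
    match = SUFFIX.search(line)
    if match:
        slot = int(match.group(1))
        line = line[:match.start()]
    return toggles, line, slot
```

   Under these assumptions, split_affixes("!/tokyo") reports the default
   “f” toggle with the body “/tokyo”, and split_affixes("都,1") selects
   slot 1 with the body “都”.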

FILENAME-LIKE WILDCARD MATCHING

   When wildcard-pattern mode is selected, patterns are considered as
   extended “*.txt”-like patterns. This is often more convenient for
   users not familiar with regular expressions. To have this mode selected by
   default, put

      default wildcard on

   into your “.lookup” file (see “STARTUP FILE” below).

   When wildcard mode is on, only “*”, “?”, “+”, and “.” are affected.
   See the entry for the “wildcard” command below for details.

   Other features, such as the multiple-pattern searches (described below) and
   other regular-expression metacharacters, remain available.

MULTIPLE-PATTERN SEARCHES

   You  can  put  multiple patterns in a single search specifier.  For example
   consider

     search [edict]> china||japan

   The first part (“china”) will select all lines that have “china” in them.
   Then, from among those lines, the second part will select lines that
   have “japan” in them.  The “||” is not part of any pattern -- it is
   lookup’s “pipe” mechanism.

   The above example is very different from the single pattern “china|japan”
   which would select any line that had either “china” or “japan”.  With
   “china||japan”, you get lines that have “china” and then also have
   “japan” as well.

   Note that it is also different from the regular expression “china.*japan”
   (or the wildcard pattern “china*japan”), which would select lines having
   “china, then maybe some stuff, then japan”.  But consider the case when
   “japan” comes on the line before “china”. Just for your comparison, the
   multiple-pattern specifier “china||japan” is pretty much the same as the
   single regular expression “china.*japan|japan.*china”.

   If you use “|!|” instead of “||”, it will mean “...and then lines not
   matching...”.

   Consider a way to find all lines of kanjidic that do have a Halpern number,
   but don’t have a Nelson number:

       search [edict]> <H\d+>|!|<N\d+>

   If you then wanted to restrict the listing to those that also had a
   “jinmeiyou” marking (kanjidic’s “G9” field) and had a reading of あき,
   you could make it:

       search [edict]> <H\d+>|!|<N\d+>||<G9>||<あき>

   A prepended ‘+’ would explain:

       a match is “<H\d+>”
       and not “<N\d+>”
       and “<G9>”
       and “<あき>”

   The “|!|” and “||” can be used to make up to ten separate regular
   expressions in any one search specification.

   Again, it is important to stress that “||” does not mean “or” (as it does
   in a C program, or as ‘|’ does within a regular expression).  You might
   find it convenient to read “||” as “and also”, while reading “|!|” as “but
   not”.

   It is also important to stress that any whitespace around the “||” and
   “|!|” construct is not ignored, but kept as part of the regex on either
   side.
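   The “and also” / “but not” reading can be sketched in a few lines of
   Python.  This is only an illustration of the semantics described in
   this section, not lookup’s implementation: the names parse_spec and
   search are invented, patterns use Python’s re syntax (so lookup’s ‘<’
   and ‘>’ word markers are not supported), and case folding is assumed
   to be on.

```python
import re


def parse_spec(spec):
    """Split a specifier on "||" and "|!|" into (wanted, regex) stages."""
    parts = re.split(r"(\|!\||\|\|)", spec)
    stages = [(True, re.compile(parts[0], re.IGNORECASE))]
    for separator, pattern in zip(parts[1::2], parts[2::2]):
        # "||" keeps lines that match, "|!|" keeps lines that do not.
        # Whitespace around the separators stays part of the pattern.
        stages.append((separator == "||", re.compile(pattern, re.IGNORECASE)))
    return stages


def search(lines, spec):
    """Return the lines that pass every stage of the specifier."""
    stages = parse_spec(spec)
    return [
        line for line in lines
        if all(bool(regex.search(line)) == wanted for wanted, regex in stages)
    ]
```

   Given lines mentioning China, Japan, or both, search(lines,
   "china||japan") keeps only those naming both in either order, whereas
   search(lines, "china|japan") keeps every line naming at least one.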

COMBINATION SLOTS

   Each file, when loaded, is assigned to a “slot” via which subsequent
   references  to the file are then made.  The slot may then be searched, have
   filters and flags set, etc.

   A special kind of slot, called a “combination slot”, rather than
   representing a single file, can represent multiple previously-loaded slots.
   Searches against a combination slot (or “combo slot” for short) search all
   those previously-loaded slots associated with it (called “component
   slots”).  Combo slots are set up with the combine command.

   A combo slot has no filter or modify spec, but can have a local prompt and
   flags  just  like  normal  file  slots.   The  flags, however, have special
   meanings with combo slots. Most combo-slot flags act as a mask against  the
   component-slot  flags;  when  acted  upon  as  a  member  of  the  combo, a
   component-slot’s flag will be disabled if  the  corresponding  combo-slot’s
   flag is disabled.

   Exceptions to this are the autokana, fuzz, and tag flags.

   The autokana and fuzz flags govern a combo slot exactly the same as a
   regular file slot.  When a slot is searched as a component of a combination
   slot,  the component slot’s fuzz (and autokana) flags, or lack thereof, are
   ignored.

   The tag flag is  quite  different  altogether;  see  the  tag  command  for
   complete information.

   Consider the following output from the files command:

     ┏━━┳━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     ┃ 0┃F wcfh d│a I ┃ 2762k┃/usr/jfriedl/lib/edict
     ┃ 1┃FM cf  d│a I ┃  705k┃/usr/jfriedl/lib/kanjidic
     ┃ 2┃F  cfh@d│a   ┃    1k┃/usr/jfriedl/lib/local.words
     ┃*3┃FM cfhtd│a   ┃ combo┃kotoba (#2, #0)
     ┗━━┻━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━

   See  the discussion of the files command below for basic explanation of the
   output.

   As can be seen, slot #3 is a combination slot with the name “kotoba” with
   component slots two and zero.  When a search is initiated on this slot,
   first slot #2 “local.words” will be searched, then slot #0 “edict”.
   Because the combo slot’s filter flag is on, the component slots’ filter
   flag will remain on during the search.  The combo slot’s word flag is off,
   however, so slot #0’s word flag will be forced off during the search.
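   The masking rule can be written down as a short Python sketch.  It
   models flags as plain dictionaries keyed by the command names used in
   this manual; the function name effective_flags and this data layout
   are assumptions for illustration only, and the tag flag is left out
   because it follows its own rules.

```python
# Flags that the combo slot governs outright rather than masking.
COMBO_GOVERNED = {"autokana", "fuzz"}


def effective_flags(component, combo):
    """Flags in force while a component slot is searched via a combo slot."""
    result = {}
    for name, value in component.items():
        if name == "tag":
            continue  # tagging is handled separately (see the tag command)
        if name in COMBO_GOVERNED:
            result[name] = combo[name]            # component setting ignored
        else:
            result[name] = value and combo[name]  # combo flag acts as a mask
    return result


# Slot #0 (edict) and combo slot #3 (kotoba) from the listing above.
edict = {"filter": True, "modify": False, "word": True, "fold": True,
         "fuzz": True, "highlight": True, "tag": False, "display": True,
         "autokana": True}
kotoba = {"filter": True, "modify": True, "word": False, "fold": True,
          "fuzz": True, "highlight": True, "tag": True, "display": True,
          "autokana": True}
```

   Calling effective_flags(edict, kotoba) leaves filter on and forces
   word off, as described for slot #0 above.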

   See the combine command for information about creating combo slots.

PAGER

   Lookup has a built-in pager (à la more).  Upon filling a screen with text,
   the string
       --MORE [space,return,c,q]--
   is shown. A space will allow another screen of text; a return will allow
   one more line. A ‘c’ will allow output text to continue unpaged until the
   next command. A ‘q’ will flush output of the current command.

   If supported by the OS, lookup’s idea of the screen size is automatically
   set upon startup and window resize.  Lookup must know the width of the
   screen both for the horizontal input-line scrolling and for knowing
   when a long line wraps on the screen.

   The pager parameters can be set manually with the “pager” command.

COMMANDS

   Any line intended to be a command must begin with the command-introduction
   character (the default is a space, but can be set via the “cmdchar”
   command).  However, that character is not part of the command itself and
   won’t be shown in the following list of commands.

   There are a number of commands that work with the selected file or selected
   slot (both meaning the same thing).  The selected file is the one indicated
   by an appended comma+digit, as mentioned above. If no  such  indication  is
   given, the default selected file is used (usually the first file loaded,
   but can be changed with the “select” command).

   Some commands accept a boolean argument, such as to turn a flag on or off.
   In all such cases, a “1” or “on” means to turn the flag on, while
   a “0” or “off” is used to turn it off.  Some flags are per-file
   (“fuzz”, “fold”, etc.), and a command to set such a flag normally sets
   the flag for the selected file only. However, the default value inherited
   by subsequently loaded files can be set by prepending “default” to the
   command. This is particularly useful in the startup file before any files
   are loaded (see the section STARTUP FILE).

   Items separated by ‘|’ are mutually exclusive possibilities (i.e. a boolean
   argument is “1|on|0|off”).

   Items shown in brackets (‘[’ and ‘]’) are optional. All commands that
   accept a boolean argument to set a flag or mode do so optionally -- with no
   argument the command will report the current status of the mode or flag.

   Any command that allows an argument in quotes (such as load, etc.) allows
   the use of single or double quotes.

   The commands:

   [default] autokana [boolean]
      Automatic romaji → kana conversion for the selected file is turned on
      or off (default is on).  However, if “default” is specified, the value
      to be inherited as the default by subsequently-loaded files is set (or
      reported).

      Can be temporarily disabled by a prepended ‘=’, as described in the
      INPUT SYNTAX section.

   clear|cls
      Attempts to clear the screen. If you’re using a kterm it’ll just output
      the appropriate tty control sequence. Otherwise it’ll try to run
      the “clear” command.

   cmdchar [’one-byte-char’]
      The  default  command-introduction  character  is a space, but it may be
      changed via this command. The single quotes  surrounding  the  character
      are required. If no argument is given, the current value is printed.

      An  input  line consisting of a single question mark will also print the
      current value (useful for when you don’t know the current value).

      Woe to the one that sets the command-introduction character to one of
      the other special input-line characters, such as ‘+’, ‘/’, etc.

   combine ["name"] [ num += ] slotnum ...
      Creates or adds file slots to a combination slot (see the COMBINATION
      SLOTS section for general information).  Note that “combo” may be used
      as the command as well.

      Assuming  for  this  example  that  slots  0-2 are loaded with the files
      curly, moe, and larry, we  can  create  a  combination  slot  that  will
      reference all three:

        combo "three stooges" 2, 0, 1

      The command will report

        creating combo slot #3 (three stooges): 2 0 1

      The name is optional; if given, it appears in the files list and may
      also be used to specify the slot as an argument to the select command.

      A search via the newly created combo slot  would  search  in  the  order
      specified  on  the  combo  command  line:  first  larry, then curly, and
      finally moe.

      If you later load another file (say, jeffrey to slot #4), you  can  then
      add it to the previously made combo:

        combo 3 += 4

      (the “+=” wording comes from the C programming language, where it
      means “add on to”).  Adding to a combination always adds slots to the
      end of the list.

      You can take the opportunity of adding the slot to also change the name,
      if you like:

        combo "four stooges" 3 += 4

      The reply would be
        adding to combo slot #3(four stooges): 4

      A file slot can be a component of any particular combo slot only once.
      When reporting the created or added slot numbers, a number will appear
      in parentheses if it was already a member of the list.

      Furthermore, only file slots can be component members of combo slots.
      Attempting to combine combo slot X into combo slot Y will result in
      X’s component file slots (rather than the combo slot itself) being
      added to Y.
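
      The combination rules above (ordered members, no duplicates, and
      combo slots contributing their file slots) can be sketched in Python.
      This is only an illustrative model, not lookup's implementation: the
      Combo class and its add method are hypothetical names, and the string
      it returns merely imitates the slot list in the messages shown above.

```python
class Combo:
    """Toy model of a combination slot: an ordered list of file-slot numbers."""

    def __init__(self, name=""):
        self.name = name
        self.members = []

    def add(self, *slots):
        """Append slots in order; return the numbers as they would be reported."""
        report = []
        for slot in slots:
            # A combo given as a component contributes its file slots instead.
            numbers = slot.members if isinstance(slot, Combo) else [slot]
            for num in numbers:
                if num in self.members:
                    report.append(f"({num})")   # already a member: not added again
                else:
                    self.members.append(num)
                    report.append(str(num))
        return " ".join(report)


stooges = Combo("three stooges")
print(stooges.add(2, 0, 1))   # 2 0 1
stooges.name = "four stooges"
print(stooges.add(4, 0))      # 4 (0)
print(stooges.members)        # [2, 0, 1, 4]
```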

   command debug [boolean]
      Sets  the  internal  command parser debugging flag on or off (default is
      off).

   debug [boolean]
      Sets the internal general-debugging flag on or off (default is off).

   describe specifier
      This command will tell you how a  character  (or  each  character  in  a
      string) is encoded in the various encoding methods:

          lookup command>  describe "気"
          “気” as  EUC  is 0xb5a4 (181 164; 265 \244)
                as  JIS  is 0x3524 ( 53  36;  65 \044 "5$")
                as KUTEN is   2104 ( 0x1504;  25 \004)
                as S-JIS is 0x8b43 (139  67; 213 \103)

      The quotes surrounding the character or string to describe are optional.
      You can also give a regular ASCII character and have the double-width
      version of the character described; indicating “A”, for example,
      would describe “Ａ”.  Specifier can also be a four-digit kuten value,
      in which case the character with that kuten will be described.

      If a four-digit specifier has a hex digit in it, or if it is preceded
      by “0x”, the value is taken as a JIS code. You can precede the value
      by “jis”, “sjis”, “euc”, or “kuten” to force interpretation as the
      requested code.

      Finally, specifier can be a string of stripped JIS (JIS w/o the kanji-in
      and kanji-out codes, or with the codes but without the escape characters
      in them).  For example, “F|K\” would describe the two characters 日 and
      本.
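
      How the EUC, JIS, and kuten values in the example relate to each
      other can be sketched in Python. This is not lookup's code: describe
      and from_stripped_jis are hypothetical helpers, they rely on Python's
      euc_jp codec, and they only handle two-byte JIS X 0208 characters
      (no half-width kana, no JIS X0212).

```python
def describe(ch):
    """Return (EUC hex, JIS hex, kuten) for one JIS X 0208 character."""
    euc = ch.encode("euc_jp")
    if len(euc) != 2:
        raise ValueError("expected a two-byte JIS X 0208 character")
    jis = bytes(b & 0x7F for b in euc)      # JIS is EUC with the high bits cleared
    kuten = (jis[0] - 0x20) * 100 + (jis[1] - 0x20)
    return euc.hex(), jis.hex(), kuten


def from_stripped_jis(text):
    """Decode stripped JIS (pairs of ASCII bytes) by setting the high bits."""
    return bytes(b | 0x80 for b in text.encode("ascii")).decode("euc_jp")


print(describe("気"))               # ('b5a4', '3524', 2104)
print(from_stripped_jis("F|K\\"))   # 日本
```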

   encoding [euc|sjis|jis]
      Like the -euc, -jis, and -sjis command-line options, sets the
      encoding method for interactive input and output (or reports the current
      status).  Finer control over the output encoding is available with the
      output encoding command. A separate encoding for input can be set with
      the input encoding command.

   files [ - | long ]
      Lists  what  files are loaded in what slots, and some status information
      about them, as with:

      ┃*0┃F wcfh d│a I ┃ 3749k┃/usr/jeff/lib/edict
      ┃ 1┃FM cf  d│a I ┃  754k┃/usr/jeff/lib/kanjidic

        ┏━━┳━━━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        ┃ 0┃F wcf h d │a I ┃ 2762k┃/usr/jfriedl/lib/edict
        ┃ 1┃FM cf   d │a I ┃  705k┃/usr/jfriedl/lib/kanjidic
        ┃ 2┃F  cfWh@d │a   ┃    1k┃/usr/jfriedl/lib/local.words
        ┃*3┃FM cf htd │a   ┃ combo┃kotoba (#2, #0)
        ┃ 4┃   cf   d │a   ┃  205k┃/usr/dict/words
        ┗━━┻━━━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━

      The first section is the slot number, with a “*” beside the default slot
      (as set by the select command).

      The second section shows per-slot flags and status. Letters are shown if
      the flag is on, omitted if off. In the list below, related commands are
      given for each item:

        F … if there is a filter {but ’#’ if disabled}. (filter)
        M … if there is a modify spec {but ’%’ if disabled}. (modify)
        w … if word-preference mode is turned on. (word)
        c … if case folding is turned on. (fold)
        f … if fuzzification is turned on. (fuzz)
        W … if wildcard-pattern mode is turned on (wildcard)
        h … if highlighting is turned on. (highlight)
        t … if there is a tag {but @ if disabled} (tag)
        d … if found lines should be displayed (display)
        ──────────────────────────────────────────────────────────────────
        a … if autokana is turned on (autokana)
        P … if there is a file-specific local prompt (prompt)
        I … if the file is loaded with a precomputed index (load)
        d … if the display flag is on (display)

      Note that the letters in the upper section directly correspond to
      the “!!” sequence characters described in the INPUT SYNTAX section.

      If there is a digit at the end of the flag section,  it  indicates  that
      only  #/10 of the file is actually loaded into memory (as opposed to the
      file having been completely loaded). Unloaded files will be loaded while
      lookup is idle, or when first used.

      If  the slot is a combination slot (as slot #3 is in the example above),
      that is noted in  the  third  section,  and  the  combination  name  and
      component  slot  numbers  are noted in the fourth. Also, for combination
      slots (which have no filter or modify specifications, only the flags), F
      and/or  M are shown if the corresponding mode is allowed during searches
      via the combo slot. See the tag command for info about t with respect to
      combination slots.

      If an argument (either “-” or “long” will work) is given to the command,
      a short message about what the flags mean is also printed.

   filter ["label"] [!] /regex/[i]
      Sets the filter for the selected slot (which must contain a file and not
      a combination).  If a filter is set and active for a file, any line
      matching the given regex is filtered from the output (if the ‘!’ is put
      before the regex, any line not matching the regex is filtered).  The
      label, which isn’t required, merely acts as documentation in various
      diagnostics.

      As an example, consider that edict lines often have “(pn)” on them to
      indicate that the given English is a place name. Often these place names
      can be a bother, so it would be nice to elide them from the output
      unless specifically requested.  Consider the example:

        lookup command>  filter "name" /(pn)/
        search [edict]> [きの]
        機能 [きのう] /function/faculty/
        帰納 [きのう] /inductive/
        昨日 [きのう] /yesterday/
        ≪3 "name" lines filtered≫

      In the example, ‘/’ characters are used to delimit the start and stop of
      the regex (as is common with many programs). However, any character can
      be used. A final ‘i’, if present, indicates that the regex should be
      applied in a case-insensitive manner.

      The filter, once set, can be enabled or disabled with the other form of
      the “filter” command (described below).  It can also be temporarily
      turned off (or, if disabled, temporarily turned on) by the “!F!” line
      prefix.

      Filtered lines can optionally be saved and then displayed if you so
      desire.  See the “saved list size” and “show” commands.

      Note that if you have saving enabled and only one line would be
      filtered, it is simply printed at the end (rather than printing a
      one-line message about how one line was filtered).

      By the way, a better “name” filter for edict would be:

        filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#

      as it would filter all entries that had only one English section, that
      section being a name.  It is also an example of using something other
      than ‘/’ to delimit a regex, as it makes things a bit easier to read.
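
      The effect of that “name” filter can be tried out with Python's re
      module. This is a rough sketch under stated assumptions: lookup's
      start-of-word ‘<’ and end-of-word ‘>’ are approximated by Python's \b,
      apply_filter is a hypothetical helper, and the sample lines are
      made-up entries in edict's general layout.

```python
import re

# Python rendering of  #^[^/]+/[^/]*<p[ln]>[^/]*/$#  with < and > written as \b.
NAME_ONLY = re.compile(r"^[^/]+/[^/]*\bp[ln]\b[^/]*/$")


def apply_filter(lines, regex, invert=False):
    """Split lines into (shown, filtered); invert mimics the '!' form."""
    shown, filtered = [], []
    for line in lines:
        hit = regex.search(line) is not None
        (filtered if hit != invert else shown).append(line)
    return shown, filtered


lines = [
    "東京 [とうきょう] /Tokyo (pn)/",          # one English part, a name
    "機能 [きのう] /function/faculty/",        # no name marker
    "京 [きょう] /capital/Kyoto (pn)/",        # name, but not the only part
]
shown, filtered = apply_filter(lines, NAME_ONLY)
print(len(shown), len(filtered))   # 2 1
```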

   filter [boolean]
      Enables or disables the filter for the selected slot.  If no argument is
      given, displays the current filter and status.

   [default] fold [boolean]
      The selected slot’s case folding is turned on or off (default is on), or
      reported if no argument given.  However, if “default” is specified, the
      value to be inherited as the default by subsequently-loaded files is set
      (or reported).

      Can be temporarily toggled by the “!c!” line prefix.

   [default] fuzz [boolean]
      The selected slot’s fuzzification is turned on or off (default is on),
      or reported if no argument given.  However, if “default” is specified,
      the value to be inherited as the default by subsequently-loaded files is
      set (or reported).

      Can be temporarily toggled by the “!f!” line prefix.

   help [regex]
      Without an argument gives a short help list.  With  an  argument,  lists
      only commands whose help string is picked up by the given regex.

   [default] highlight [boolean]
      Sets matched-string highlighting on or off for the selected slot
      (default off), or reports the current status if no argument is given.
      However, if “default” is specified, the value to be inherited as the
      default by subsequently-loaded files is set (or reported).

      If on, shows in bold or reverse video (see below) that part of the line
      which was matched by the search regex.  If multiple regexes were given,
      that part matched by the first regex is shown.

      Note that a regex might match a portion of a line which is later removed
      by a modify parameter. In this case, no highlighting is done.

      Can be temporarily toggled by the “!h!” line prefix.

   highlight style [bold | inverse | standout | <___>]
      Sets the style of highlighting for when highlighting is done.  Inverse
      (inverse video) and standout are the same. The default is bold.  You can
      also give an HTML tag, such as “<BOLD>”, and items will be wrapped by
      <BOLD>...</BOLD>. This would be particularly useful when the output is
      going to a CGI, as when lookup has been built in a server configuration.

      Note that the highlighting is effected by using raw VT100/xterm control
      sequences.  This isn’t particularly nice if your terminal doesn’t
      understand them. Sorry.

   if {expression} command...

      If the evaluated expression is non-zero, the command will be executed.

      Note that {} rather than () surround the expression.

      Expression may be composed of numbers, operators, parentheses, etc.  In
      addition to the normal +, -, *, and /, there are:

         !x     … ‘not’ Yields 1 if x is zero, 0 if non-zero.
         x & y  … ‘and’ Yields 1 if both x and y are non-zero, 0 otherwise.
         x | y  … ‘or’  Yields 1 if x or y (or both) is non-zero, 0 otherwise.
         x && y …

      The special tokens true and false may also be used; they are 1 and 0
      respectively.

      There are also checked, matched, printed, nonword,  and  filtered  which
      correspond to the values printed by the stats command.

      An example use might be the following kind of thing in a
      computer-generated script:

        !d!expect this line
        if {!printed} msg Oops! couldn’t find "expect this line"
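
      The way such an expression reduces to zero or non-zero can be
      sketched with a small Python evaluator. It is only a model under
      stated assumptions: evaluate is a hypothetical function, it handles
      just !, &, |, parentheses, numbers, true/false, and named values
      (arithmetic and && are left out), and it assumes that & binds more
      tightly than |, which this manual does not specify.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z]+|[!&|()])")


def evaluate(expr, stats):
    """Evaluate !, & and | over numbers, true/false, and named stats values."""
    tokens, pos, text = [], 0, expr.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"bad token at {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    names = {"true": 1, "false": 0, **stats}

    def primary():
        tok = tokens.pop(0)
        if tok == "!":
            return 0 if primary() else 1
        if tok == "(":
            value = either()
            tokens.pop(0)                 # the closing ')'
            return value
        return int(tok) if tok.isdigit() else names[tok]

    def both():                           # x & y
        value = primary()
        while tokens and tokens[0] == "&":
            tokens.pop(0)
            rhs = primary()
            value = 1 if value and rhs else 0
        return value

    def either():                         # x | y
        value = both()
        while tokens and tokens[0] == "|":
            tokens.pop(0)
            rhs = both()
            value = 1 if value or rhs else 0
        return value

    return either()


# Mirrors "if {!printed} msg ..." from the script example above.
if evaluate("!printed", {"printed": 0}):
    print('Oops! couldn\'t find "expect this line"')
```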

   input encoding [ euc | sjis ]
      Used to set (or report) what encoding to use when 8-bit bytes are  found
      in  the  interactive  input  (all flavors of JIS are always recognized).
      Also see the encoding and output encoding commands.

   limit [value]
      Sets the number of lines to print during any search before aborting  (or
      reports the current number if no value given). Default is 100.

      Output limiting is disabled if set to zero.

   log [ to [+] file ]
      Begins logging the program output to file (the Japanese encoding method
      being the same as for screen output).  If “+” is given, the log is
      appended to any text that might have previously been in file, in which
      case a leading dashed line is inserted into the file.

      If no arguments are given, reports the current logging status.

   log  - | off
      If only “-” or off is given, any currently-opened log file is closed.

   load [-now|-whenneeded] "filename"
      Loads the named file to the next available slot.  If a precomputed
      index is found (as “filename.jin”) it is loaded as well.  Otherwise,
      an index is generated internally.

      The file to be loaded (and the index, if loaded) will be loaded
      during idle times. This allows a startup file to list many files to
      be loaded, but not have to wait for each of them to load in turn.
      Using the “-now” flag causes the load to happen immediately, while
      using the “-whenneeded” option (can be shortened to “-wn”) causes
      the load to happen only when the slot is first accessed.

      Invoke lookup as
         % lookup -writeindex filename
      to generate and write an index file, which will then be automatically
      used in the future.

      If the file has already been loaded, the file is not re-read, but the
      previously-read  file is shared. The new slot will, however, have its
      own separate flags, prompt, filter, etc.

   modify /regex/replace/[ig]
      Sets the modify parameter for the selected file.  If  a  file  has  a
      modify  parameter  associated  with  it,  each line selected during a
      search will have that part of the line which matches regex  (if  any)
      replaced by the replacement string before being printed.

      Like the filter command, the delimiter need not be ‘/’; any non-space
      character is fine.  If a final ‘i’ is given, the regex is
      applied in a case-insensitive manner. If a final ‘g’ is given, the
      replacement is done to all matches in the line, not just the first
      part that might match regex.

      The replacement may have embedded “\1”, etc. in it to refer to parts
      of the matched text (see the tutorial on regular expressions).

      The modify parameter, once set, may be enabled or disabled with the
      other form of the modify command (described below).  It may also be
      temporarily toggled via the “!m!” line prefix.

      A silly example for the ultra-nationalist might be:
        modify /<Japan>/Dainippon Teikoku/g
      So that a line such as
        日銀 [にちぎん] /Bank of Japan/
      would come out as
        日銀 [にちぎん] /Bank of Dainippon Teikoku/

      As a real example of the modify command with kanjidic, consider  that
      it  is  likely  that  one is not interested in all the various fields
      each entry has.  The following can be used to remove the info on  the
      U, N, Q, M, E, B, C, and Y fields from the output:

        modify /( [UNQMECBY]\S+)+//g,1

      It’s sort of complex, but works.  Note that here the replacement part
      is empty, meaning to just remove  those  parts  which  matched.   The
      result of such a search of 日 would normally print

          日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 \
          MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}

      but with the above modify spec, appears more simply as

          日 467c S4 G1 H3027 F1 P3-3-1 ニチ ジツ ひ -び -か {day}
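
      Both modify examples can be reproduced with Python's re.sub, which
      may help when designing a spec of your own. This is only an
      approximation of lookup's behavior: the modify function below is a
      hypothetical helper, and lookup's whole-word ‘<’ and ‘>’ are written
      as \b.

```python
import re


def modify(line, regex, replacement, flags=""):
    """Apply a modify-style substitution; 'g' replaces all matches, 'i' ignores case."""
    count = 0 if "g" in flags else 1          # for re.sub, count=0 means "all matches"
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(regex, replacement, line, count=count, flags=re_flags)


entry = ("日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 "
         "MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}")
print(modify(entry, r"( [UNQMECBY]\S+)+", "", "g"))
# 日 467c S4 G1 H3027 F1 P3-3-1 ニチ ジツ ひ -び -か {day}

print(modify("日銀 [にちぎん] /Bank of Japan/", r"\bJapan\b", "Dainippon Teikoku", "g"))
# 日銀 [にちぎん] /Bank of Dainippon Teikoku/
```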

   modify [boolean]
      Enables or disables the modify parameter for the selected file, or
      reports the current status if no argument is given.

   msg string
      The given string is printed.

      Most likely used in a script as the target command of an if  command.

   output encoding [ euc | sjis | jis...]
      Used  to set exactly what kind of encoding should be used for program
      output (also see the input encoding command). Used when the  encoding
      command is not detailed enough for one’s needs.

      If  no  argument  is  given,  reports  the  current  output encoding.
      Otherwise, arguments can usually  be  any  reasonable  dash-separated
      combination of:

        euc
           Selects EUC for the output encoding.

        sjis
           Selects Shift-JIS for the output encoding.

        jis[78|83|90][-ascii|-roman]
           Selects JIS for the output encoding.  If no year (78, 83, or 90)
           given, 78 is used. Can optionally specify that “English” should
           be encoded as regular ASCII (the default when JIS selected) or
           as JIS-ROMAN.

        212
           Indicates that JIS X0212-1990 should be supported  (ignored  for
           Shift-JIS output).

        no212
           Indicates that JIS X0212-1990 should not be supported
           (default setting).  This places JIS X0212-1990 characters under
           the domain of disp, nodisp, code, or mark (described below).

        hwk
           Indicates  that  half  width  kana should be left as-is (default
           setting).

        nohwk
           Indicates that half width  kana  should  be  stripped  from  the
           output.  (not yet implemented).

        foldhwk
           Indicates  that  half width kana should be folded to their full-
           width counterparts.  (not yet implemented).

        disp
           Indicates  that  non-displayable   characters   (such   as   JIS
           X0212-1990 while the output encoding method is Shift-JIS) should
           be  passed  along  anyway  (most  likely  resulting  in   screen
           garbage).

        nodisp
           Indicates  that  non-displayable  characters  should  be quietly
           stripped from the output.

        code
           Indicates that non-displayable characters should be  printed  as
           their octal codes (default setting).

        mark
           Indicates that non-displayable characters should be printed
           as “★”.

        Of course, not all options make sense in all combinations, or at
        all times.  When the current (or new) output encoding is reported,
        it is shown as a complete and exact specifier representing the
        selected output encoding.  An example might be
        “jis78-ascii-no212-hwk-code”.
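
        How a short specifier expands into a complete one can be sketched
        in Python using only the defaults stated above (year 78, ASCII for
        JIS, no212, hwk, code). This is an illustration, not lookup's
        parser: normalize_output_encoding is a hypothetical name, the order
        of the reported fields is inferred from the single example above,
        and requiring an explicit euc, sjis, or jis token is an assumption
        made to keep the sketch small.

```python
def normalize_output_encoding(spec):
    """Fill in the documented defaults for a dash-separated specifier."""
    settings = {"x0212": "no212", "hwk": "hwk", "undisplayable": "code"}
    base, english = None, None
    for token in spec.lower().split("-"):
        if token in ("euc", "sjis"):
            base = token
        elif token in ("jis", "jis78", "jis83", "jis90"):
            base = "jis78" if token == "jis" else token   # no year means 78
        elif token in ("ascii", "roman"):
            english = token
        elif token in ("212", "no212"):
            settings["x0212"] = token
        elif token in ("hwk", "nohwk", "foldhwk"):
            settings["hwk"] = token
        elif token in ("disp", "nodisp", "code", "mark"):
            settings["undisplayable"] = token
        else:
            raise ValueError(f"unknown option: {token}")
    if base is None:
        raise ValueError("no encoding (euc, sjis, or jis) given")
    parts = [base]
    if base.startswith("jis"):
        parts.append(english or "ascii")                  # ASCII is the JIS default
    parts += [settings["x0212"], settings["hwk"], settings["undisplayable"]]
    return "-".join(parts)


print(normalize_output_encoding("jis"))   # jis78-ascii-no212-hwk-code
```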

   pager [ boolean | size ]
      Turns on or off an output pager, sets its idea of the screen size,
      or reports the current status.

      Size can be a single number indicating the number of lines to be
      printed between “MORE?” prompts (usually a few lines less than the
      total screen height, the default being 20 lines). It can also be two
      numbers in the form “#x#” where the first number is the width (in
      half-width characters; default 80) and the second is the
      lines-per-page as above.

      If the pager is on, every page of output will result in
      a “MORE?” prompt, at which there are four possible responses. A space
      will allow one more full page to print. A return will allow one more
      line.  A ‘c’ (for “continue”) will allow the rest of the output (for
      the current command) to proceed without pause, while
      a ‘q’ (for “quit”) will flush the output for the current command.

      If supported by the OS, the pager size parameters are set
      appropriately from the window size upon startup or window resize.

      The default pager status is “off”.

   [local] prompt "string"
      Sets the prompt string.  If “local” is indicated, sets the prompt
      string for the selected slot only. Otherwise, sets the global default
      prompt string.

      Prompt strings may have the special %-sequences shown below, with
      related commands given in parentheses:

         %N … the default slot’s file or combo name.
         %n … like %N, but any leading path is not shown if a filename.
         %# … the default slot’s number.
         %S … the “command-introduction” character (cmdchar)
         %0 … the running program’s name
         %F=’string’ … string shown if filtering enabled (filter)
         %M=’string’ … string shown if modification enabled (modify)
         %w=’string’ … string shown if word mode on (word)
         %c=’string’ … string shown if case folding on (fold)
         %f=’string’ … string shown if fuzzification on (fuzz).
         %W=’string’ … string shown if wildcard-pat. mode on (wildcard).
         %d=’string’ … string shown if displaying on (display).
         %C=’string’ … string shown if currently entering a command.
         %l=’string’ … string shown if logging is on (log).
         %L … the name of the current output log, if any (log)

      For the tests (%f, etc), you can put ‘!’ just after the ‘%’ to
      reverse the sense of the test (i.e. %!f="no fuzz").  The reverse of
      %F is if a filter is installed but disabled (i.e. string will never
      be shown if there is no filter for the default file).  The modify %M
      works comparably.

      Also, you can use an alternative form for  the  items  that  take  an
      argument  string.  Replacing  the  quotes with parentheses will treat
      string as a recursive prompt specifier. For example, the specifier

           %C=’command’%!C(%f=’fuzzy ’search:)

      would result in a “command” prompt if entering a command, while it
      would result in either a “fuzzy search:” or a “search:” prompt if not
      entering a command.  The parenthesized constructs may be nested.

      Note that the letters of the test constructs are the same as the
      letters for the “!!” sequences described in INPUT SYNTAX.

      An example of a nice prompt command might be:

              prompt "%C(%0 command)%!C(%w’*’%!f’raw ’%n)> "

      With this prompt specification, the prompt would normally appear
      as “filename> ” but when fuzzification is turned off as “raw
      filename> ”.  And if word-preference mode is on, the whole thing has
      a “*” prepended.  However, if a command is being entered, the prompt
      would then become “name command”, where name is the program’s name
      (system dependent, but most likely “lookup”).

      The default prompt format string is “%C(%0 command)%!C(search [%n])> ”.
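
      The expansion rules can be illustrated with a short recursive
      Python sketch. It is a simplified model, not lookup's code:
      expand_prompt and the state dictionary are hypothetical, plain
      ASCII single quotes are used, the ‘=’ after a test letter is treated
      as optional (this manual shows both forms), and the special
      disabled-but-installed meaning of %!F and %!M is not modeled.

```python
def expand_prompt(spec, state):
    """Expand a prompt spec; state maps sequence letters to values or flags."""
    out, i = [], 0
    while i < len(spec):
        if spec[i] != "%" or i + 1 >= len(spec):
            out.append(spec[i])
            i += 1
            continue
        i += 1
        negate = spec[i] == "!"
        if negate:
            i += 1
        key = spec[i]
        i += 1
        if i < len(spec) and spec[i] == "=":
            i += 1
        wanted = bool(state.get(key)) != negate
        if i < len(spec) and spec[i] == "'":
            end = spec.index("'", i + 1)           # quoted: literal string
            if wanted:
                out.append(spec[i + 1:end])
            i = end + 1
        elif i < len(spec) and spec[i] == "(":
            depth, j = 1, i + 1                    # parenthesized: nested spec
            while depth:
                depth += {"(": 1, ")": -1}.get(spec[j], 0)
                j += 1
            if wanted:
                out.append(expand_prompt(spec[i + 1:j - 1], state))
            i = j
        else:
            out.append(str(state.get(key, "")))    # plain value such as %n or %0
    return "".join(out)


state = {"C": False, "0": "lookup", "n": "edict", "w": True, "f": False}
print(expand_prompt("%C(%0 command)%!C(search [%n])> ", state))       # search [edict]> 
print(expand_prompt("%C(%0 command)%!C(%w'*'%!f'raw '%n)> ", state))  # *raw edict> 
```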

   regex debug [boolean]
      Sets  the internal regex debugging flag (turn on if you want billions
      of lines of stuff spewed to your screen).

   saved list size [value]
      During a search, lines that match might be elided from the output due
      to  filters or word-preference mode.  This command sets the number of
      such lines to remember during any one search, such that they  may  be
      later displayed (before the next search) by the show command.

      The default is 100.

   select [ num | name | . ]
      If num is given, sets the default slot to that slot number.  If name
      is given, sets the default slot to the first slot found with a file
      (or combination) loaded with that name.  The incantation “select .”
      merely sets the default slot to itself, which can be useful in
      script files where you want to indicate that any subsequent flag
      changes should work with whatever file was the default at the time
      the script was sourced.

      If  no  argument  is  given,  simply reports the current default slot
      (also see the files command).

      In command files loaded via the source command,  or  as  the  startup
      file,  commands  dealing  with  per-slot  items (flags, local prompt,
      filters, etc.)  work with the file or slot last selected.   The  last
      such selected slot remains selected once the load is complete.

      Interactively, the default slot will become the selected slot for
      subsequent searches and commands that aren’t augmented with an
      appended “,#” (as described in the INPUT SYNTAX section).

   show
      Shows  any  lines  elided  from  the previous search (either due to a
      filter or word-preference mode).

      Will apply any modifications (see the “modify” command) if
      modifications are enabled for the file. You can use the “!m!” line
      prefix as well with this command (in this case, put the “!m!” before
      the command-indicator character).

      The length of the list is controlled by the “saved list size”
      command.

   source "filename"
      Commands are read from filename and executed.

      In the file, all lines beginning with “#” are ignored as comments
      (note that comments must appear on a line by themselves, as “#” is a
      reasonable character to have within commands).

      Lines whose first non-blank character is “=”, “!”, or “+” are
      considered searches, while all other non-blank lines are considered
      lookup commands.  Therefore, there is no need for lines to begin with
      the command-introduction character.  However, leading whitespace is
      always OK.

      For search lines, take care that any trailing whitespace  is  deleted
      if   undesired,   as   trailing   whitespace  (like  all  non-leading
      whitespace) is kept as part of the regular expression.

      Within a command file, commands that modify per-file flags  and  such
      always  work  with  the  most-recently  loaded  (or  selected)  file.
      Therefore, something along the lines of

        load "my.word.list"
        set word on

        load "my.kanji.list"
        set word off
        set local prompt "enter kanji> "

      would work as might make intuitive sense.

      Since a script file must have a load or select before any per-slot
      flag is set, one can use “select .” to facilitate command scripts
      that are to work with “the current slot”.
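
      The line-classification rules for command files can be summarized
      in a few lines of Python. This is a sketch, not lookup's reader:
      classify is a hypothetical helper, and it assumes that a comment's
      “#” may be preceded by whitespace, which the text above does not
      state explicitly.

```python
def classify(line):
    """Classify one command-file line as 'blank', 'comment', 'search', or 'command'."""
    stripped = line.lstrip()              # leading whitespace is always OK
    if not stripped.strip():
        return "blank"
    if stripped.startswith("#"):
        return "comment"
    if stripped[0] in "=!+":
        return "search"
    return "command"


for text in ['load "my.word.list"', "## a comment", "!d!expect this line", ""]:
    print(classify(text))
# command, comment, search, blank (one per line)
```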

   spinner [value]
      Set the value of the spinner (A silly little feature).  If set  to  a
      non-zero  value,  will  cause a spinner to spin while a file is being
      checked, one increment per value lines in the file  actually  checked
      against the search specifier.  Default is off (i.e. zero).

   stats
      Shows  information about how many lines of the text file were checked
      against the last search specifier, and how  many  lines  matched  and
      were printed.

   tag [boolean] ["string"]
      Enable, disable, or set the tag for the selected slot.

      If  the  slot is not a combination slot, a tag string may be set (the
      quotes are required).

      If a tag string is  set  and  enabled  for  a  file,  the  string  is
      prepended to each matching output line printed.

      Unlike the filter and modify commands, which automatically enable the
      function when a parameter is set, a tag is not automatically enabled
      when set.  It can be enabled while being set by also giving “on” to
      the tag command, or it can be enabled subsequently via just “tag on”.

      If the selected slot is a combination slot, only the enable/disable
      status may be changed (on by default). No tag string may be set.

      The reason for the special treatment lies in the  special  nature  of
      how tags work in conjunction with combination files.

      During a search when the selected slot is a combination slot, each
      file which is a member of the combination has its per-file flags
      disabled if their corresponding flag is disabled in the original
      combination slot. This allows the combination slot’s flags to act as
      a “mask” to blot out each component file’s per-file flags.

      The  tag  flag,  however, is special in that the component file’s tag
      flag is turned on if the combination slot’s tag  flag  is  turned  on
      (and, of course, the component file has a tag string registered).

      The  intended use of this is that one might set a (disabled) tag to a
      file, yet direct searches against that file will  have  no  prepended
      tag.   However, if the file is searched as part of a combination slot
      (and the combination  slot’s  tag  flag  is  on),  the  tag  will  be
      prepended,  allowing  one  to  easily  understand  from which file an
      output line comes.
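
      The masking rule and the tag exception can be written out as a
      small Python function. This is an illustrative model only:
      effective_flags and the flag dictionaries are hypothetical, and a
      flag missing from the combination slot is treated as off.

```python
def effective_flags(combo_flags, file_flags, file_tag_string):
    """Flags used for one component file while searching via a combo slot."""
    effective = {}
    for name, file_on in file_flags.items():
        if name == "tag":
            continue
        # Ordinary flags: the combo's flag can only mask the file's flag off.
        effective[name] = file_on and combo_flags.get(name, False)
    # The tag flag follows the combo slot, provided the file has a tag string.
    effective["tag"] = combo_flags.get("tag", False) and bool(file_tag_string)
    return effective


combo = {"fuzz": True, "fold": False, "tag": True}
edict = {"fuzz": True, "fold": True, "tag": False}    # tag set but disabled
print(effective_flags(combo, edict, "[edict] "))
# {'fuzz': True, 'fold': False, 'tag': True}
```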

   verbose [boolean]
      Sets verbose mode on or off, or reports the current  status  (default
      on).   Many  commands  reply  with  a confirmation if verbose mode is
      turned on.

   version
      Reports the current version of the program.

   [default] wildcard [boolean]
      The selected slot’s patterns are considered wildcard patterns if
      turned on, regular expressions if turned off. The current status is
      reported if no argument given.  However, if “default” is specified,
      the pattern-type to be inherited as the default by subsequently-loaded
      files is set (or reported).

      Can be temporarily toggled by the “!W!” line prefix.

      When wildcard patterns are selected, the changed metacharacters
      are: “*” means “any stuff”, “?” means “any one
      character”, while “+” and “.” become unspecial.  Other regex items
      such as “|”, “(”, “[”, etc. are unchanged.

      What “*” and “?” will actually match depends upon the status of
      word-mode, as well as on the pattern itself.  If word-mode is on, or if
      the pattern begins with the start-of-word “<” or “[”, only non-spaces
      will be matched. Otherwise, any character will be matched.

      In summary, when wildcard mode is on, the input pattern is affected in
      the following ways:

         * is changed to the regular expression .* or \S*
         ? is changed to the regular expression . or \S
         + is changed to the regular expression \+
         . is changed to the regular expression \.

      Because filename patterns are often called “filename globs”, the
      command “glob” can be used in place of “wildcard”.
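
      The wildcard translation can be sketched in Python. This is a
      hedged model rather than lookup's implementation: glob_to_regex is
      a hypothetical function, and it assumes that “only non-spaces”
      corresponds to \S and that “unspecial” means backslash-escaped.

```python
import re


def glob_to_regex(pattern, word_mode=False):
    """Translate a wildcard pattern into a regular expression string."""
    non_space = word_mode or pattern.startswith(("<", "["))
    table = {
        "*": r"\S*" if non_space else ".*",
        "?": r"\S" if non_space else ".",
        "+": r"\+",          # '+' and '.' lose their special meaning
        ".": r"\.",
    }
    return "".join(table.get(ch, ch) for ch in pattern)


print(glob_to_regex("a*b?.c+"))                 # a.*b.\.c\+
print(glob_to_regex("a*b?", word_mode=True))    # a\S*b\S
print(bool(re.fullmatch(glob_to_regex("fo*.txt"), "foo.txt")))   # True
```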

   [default] word|wordpreference [boolean]
      The selected file’s word-preference mode is turned on or off (default
      is off), or reports the current setting if no argument is specified.
      However, if “default” is specified, the value to be inherited as the
      default by subsequently-loaded files is set (or reported).

      In word-preference mode, entries are searched for as if the search
      regex had a leading ‘<’ and a trailing ‘>’, resulting in a list of
      entries with a whole-word match of the regex.  However, if there are
      none, but there are non-word entries, the non-word entries are shown
      (the “saved list” is used for this -- see that command). This makes it
      an “if there are whole words like this, show me, otherwise show me
      whatever you’ve got” mode.

      If there are both word and non-word entries, the non-word entries are
      remembered in the saved  list  (rather  than  any  possible  filtered
      entries being remembered there).

      One caveat: if a search matches a line in more than one place, and
      the first is not a whole word, while one of the others is, the line
      will be considered a non-whole-word match.  For example, the
      search “japan” with word-preference mode on will not list an entry
      such as “/Japanese/language in Japan/”, as the first “Japan” is part
      of “Japanese” and not a whole word.  If you really need just
      whole-word entries, use the ‘<’ and ‘>’ yourself.

      The mode may be temporarily toggled via the “!w!” line prefix.

      The  rules  defining  what lines are filtered, remembered, discarded,
      and shown for each permutation of search are rather complex, but  the
      end result is rather intuitive.
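
      As a rough illustration of the word-preference rules, here is a
      minimal Python sketch. It is not lookup’s implementation: the
      function name is hypothetical, case-insensitive matching is an
      assumption, and filters are ignored. Only the first match on each
      line is classified, which reproduces the caveat described above.

```python
import re


def word_preference_search(regex, lines):
    """Return (shown, saved) lists following the word-preference rules."""
    pattern = re.compile(regex, re.IGNORECASE)  # case folding is assumed
    word, nonword = [], []
    for line in lines:
        m = pattern.search(line)
        if m is None:
            continue
        # Look at the characters on either side of the FIRST match only.
        before = line[m.start() - 1] if m.start() > 0 else " "
        after = line[m.end()] if m.end() < len(line) else " "
        is_word = not re.match(r"\w", before) and not re.match(r"\w", after)
        (word if is_word else nonword).append(line)
    if word:
        return word, nonword   # non-word entries go to the saved list
    return nonword, []         # no whole-word hits: show whatever matched


entries = ["Japan /the country/", "/Japanese/language in Japan/"]
shown, saved = word_preference_search("japan", entries)
print(shown)
print(saved)
```

      With both entries present, the second one lands in the saved list;
      searched on its own, it would be shown.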

   quit | leave | bye  | exit
      Exits the program.

STARTUP FILE

   If the file “~/.lookup” is present, commands are read from it during
   lookup startup.

   The file is read in the same way as the source command reads files (see
   that entry for more information on file format, etc.).

   However, if any files were loaded via command-line arguments, commands
   within the startup file to load files (and their associated commands,
   such as those that set per-file flags) are ignored.

   Similarly,  any  use of the command-line flags -euc, -jis, or -sjis will
   disable in the startup file the commands dealing with setting the  input
   and/or output encodings.

   The special treatment mentioned in the above two paragraphs only applies
   to commands within the startup  file  itself,  and  does  not  apply  to
   commands  in command-files that might be sourced from within the startup
   file.

   The following is a reasonable example of a startup file:
     ## turn verbose mode off during startup file processing
     verbose off

     prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
     spinner 200
     pager on

     ## The filter for edict will hit for entries that
     ## have only one English part, and that English part
     ## having a pl or pn designation.
     load ~/lib/edict
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     highlight on
     word on

     ## The filter for kanjidic will hit for entries without a
     ## frequency-of-use number.  The modify spec will remove
     ## fields with the named initial code (U,N,Q,M,E, and Y)
     load ~/lib/kanjidic
     filter "uncommon" !/<F\d+>/
     modify /( [UNQMEY])+//g

     ## Use the same filter for my local word file,
     ## but turn off by default.
     load ~/lib/local.words
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     filter off
     highlight on
     word on
     ## Want a tag for my local words, but only when
     ## accessed via the combo below
     tag off "》"

     combine "words" 2 0
     select words

     ## turn verbosity back on for interactive use.
     verbose on
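
   To see what the “name” filter regex in the example above selects,
   here is a small Python sketch. It is an approximation only: Python’s re
   module has no ‘<’ and ‘>’ word breaks, so \b stands in for them, and
   the sample entries are made-up lines in an edict-like layout.

```python
import re

# lookup:  ^[^/]+/[^/]*<p[ln]>[^/]*/$
# Python:  \b replaces lookup's '<' and '>' word breaks (an approximation).
NAME_ONLY = re.compile(r"^[^/]+/[^/]*\bp[ln]\b[^/]*/$")


def is_name_only(entry: str) -> bool:
    """True if the entry has exactly one English part carrying pl or pn."""
    return NAME_ONLY.search(entry) is not None


samples = [
    "東京 [とうきょう] /Tokyo (pl)/",        # one English part, tagged pl
    "富士 [ふじ] /Fuji (pn)/Mt. Fuji/",      # two English parts
    "犬 [いぬ] /dog/",                       # no pl or pn tag
]
for entry in samples:
    print(is_name_only(entry), entry)
```

   Only the first sample would be filtered; the second has more than one
   English part, and the third has no pl or pn designation.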

COMMAND-LINE ARGUMENTS

   With the use of  a  startup  file,  command-line  arguments  are  rarely
   needed.  In practical use, they are only needed to create an index file,
   as in:

       lookup -write textfile

   Any command-line arguments that aren’t flags are taken to be files, which
   are loaded in turn during startup. In this case, any “load”, “filter”,
   etc. commands in the startup file are ignored.

   The following flags are supported:

   -help
      Reports a short help message and exits.

   -write
      Creates index files for the named files and exits. No startup file
      is read.

   -euc
      Sets the input and output encoding method to EUC (currently the
      default). Exactly the same as the “encoding euc” command.

   -jis
      Sets the input and output encoding method to JIS. Exactly the same
      as the “encoding jis” command.

   -sjis
      Sets the input and output encoding method to Shift-JIS. Exactly the
      same as the “encoding sjis” command.

   -v -version
      Prints the version string and exits.

   -norc
      Indicates that the startup file should not be read.

   -rc file
      The named file is used as the startup file, rather than the
      default “~/.lookup”. It is an error for the file not to exist.

   -percent num
      When an index is built, letters that appear on more than num percent
      (default 50) of the lines are elided from the index. The thought is
      that if a search will have to check most of the lines in a file
      anyway, one may as well save the large amount of index space needed
      to represent that information. Indexing oft-occurring letters
      provides diminishing returns, so the time/space tradeoff favors
      leaving them out.

      Smaller indexes can be made by using a smaller number.
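
      The idea can be sketched in Python as follows. This only
      illustrates the principle; lookup’s real index format is not
      described here, and the function names and data layout are
      assumptions.

```python
from collections import defaultdict


def build_index(lines, percent=50):
    """Map each letter to the set of line numbers that contain it,
    dropping letters found on more than `percent` percent of the lines."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for ch in set(line.lower()):
            if not ch.isspace():
                index[ch].add(lineno)
    limit = len(lines) * percent / 100
    return {ch: nums for ch, nums in index.items() if len(nums) <= limit}


def candidate_lines(index, word, total):
    """Intersect the line sets of the indexed letters of `word`.
    Elided letters carry no information, so they are simply skipped."""
    candidates = set(range(total))
    for ch in set(word.lower()):
        if ch in index:
            candidates &= index[ch]
    return candidates


lines = ["apple pie", "banana", "cherry tart", "date"]
index = build_index(lines, percent=50)
print(sorted(index))                          # letters that were kept
print(candidate_lines(index, "tart", len(lines)))
```

      Here ‘a’ appears on every line and is dropped, yet a search for
      “tart” is still narrowed to a single candidate line by the rarer
      letters.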

   -noindex
      Indicates  that  any  files loaded via the command line should not be
      loaded with any precomputed index, but recalculated on the fly.

   -verbose
      Has metric tons of stats spewed whenever an index is created.

   -port ###
      For the (undocumented) server configuration only, tells which port to
      listen on.

OPERATING SYSTEM CONSIDERATIONS

   I/O primitives and behaviors vary with the operating system. On my
   operating system, I can “read” a file by mapping it into memory, which
   is a pretty much instant procedure regardless of the size of the file.
   When I later access that memory, the appropriate sections of the file
   are automatically read into memory by the operating system as needed.

   This results in lookup starting up and presenting a prompt very quickly,
   but causes the first few searches that need to check a lot of  lines  in
   the  file  to  go  more slowly (as lots of the file will need to be read
   in). However, once the bulk of the file is in,  searches  will  go  very
   fast. The win here is that the rather long file-load times are amortized
   over the first few (or few dozen, depending upon the situation) searches
   rather than always faced right at command startup time.
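
   The memory-mapping approach described above can be tried out with
   Python’s standard mmap module. This is only a sketch of the general
   technique, not lookup’s code; the function name and the temporary
   sample file are made up for the demonstration.

```python
import mmap
import os
import tempfile


def count_matching_lines(path, needle):
    """Map a file into memory and count the lines containing `needle`."""
    with open(path, "rb") as f:
        # The mapping is set up without reading the file up front; the
        # operating system pages data in as the loop touches it.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            for line in iter(mm.readline, b""):
                if needle in line:
                    count += 1
            return count


with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"grey sky\ngray cat\nblue sea\n")
print(count_matching_lines(tmp.name, b"gr"))
os.remove(tmp.name)
```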

   On  the  other hand, on an operating system without the mapping ability,
   lookup would start up very slowly as all the files and indexes are  read
   into  memory,  but would then search quickly from the beginning, all the
   file already having been read.

   To get around the slow startup, particularly when many files are loaded,
   lookup  uses  lazy  loading  if it can: a file is not actually read into
   memory at the time the load command is given. Rather, it  will  be  read
   when  first  actually  accessed.   Furthermore,  files  are loaded while
   lookup is idle, such as when waiting  for  user  input.  See  the  files
   command for more information.
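
   A minimal Python sketch of the read-on-first-access part of lazy
   loading follows. The class and its methods are hypothetical, the
   UTF-8 encoding is an assumption for the demo, and the idle-time
   loading that lookup also performs is left out.

```python
import os
import tempfile


class LazyFile:
    """Remember a path at 'load' time; read the contents on first access."""

    def __init__(self, path):
        self.path = path
        self._lines = None          # nothing has been read yet

    @property
    def loaded(self):
        return self._lines is not None

    def lines(self):
        if self._lines is None:     # first real access: read the file now
            with open(self.path, encoding="utf-8") as f:
                self._lines = f.read().splitlines()
        return self._lines


with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as tmp:
    tmp.write("first entry\nsecond entry\n")
lazy = LazyFile(tmp.name)           # returns immediately, no I/O performed
print(lazy.loaded)
print(len(lazy.lines()), lazy.loaded)
os.remove(tmp.name)
```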

REGULAR EXPRESSIONS, A BRIEF TUTORIAL

   Regular expressions (“regex” for short) are a “code” used to indicate
   what kind of text you’re looking for. They’re how one searches for
   things in the editors “vi”, “stevie”, “mifes”, etc., or with the grep
   commands. There are differences among the various regex flavors in use
   -- I’ll describe the flavor used by lookup here. Also, in order to be
   clear for the common case, I might tell a few lies, but nothing too
   heinous.

   The regex 「a」 means “any line with an ‘a’ in it.” Simple enough.

   The regex 「ab」 means “any line with an ‘a’ immediately followed by
   a ‘b’”. So the line

       I am feeling flabby

   would “match” the regex 「ab」 because, indeed, there’s an “ab” on that
   line. But it wouldn’t match the line

       this line has no a followed _immediately_ by a b

   because, well, what the line says is true.

   In  most  cases,  letters  and  numbers in a regex just mean that you’re
   looking for those letters and numbers in the order given. However, there
   are some special characters used within a regex.

   A simple example would be a period. Rather than indicate that you’re
   looking for a period, it means “any character”. So the silly
   regex 「.」 would mean “any line that has any character on it.” Well,
   maybe not so silly... you can use it to find non-blank lines.

   But more commonly it’s used as part of a larger regex. Consider the
   regex 「gray」. It wouldn’t match the line

       The sky was grey and cloudy.

   because of the different spelling (grey vs. gray). But the
   regex 「gr.y」 asks for “any line with a ‘g’, ‘r’, some character, and
   then a ‘y’”. So this would get “grey” and “gray”.

   A special construct somewhat similar to ‘.’ would be the character
   class. A character class starts with a ‘[’ and ends with a ‘]’, and
   will match any character given in between. An example might be

       gr[ea]y

   which would match lines with a ‘g’, ‘r’, an ‘e’ or an ‘a’, and then
   a ‘y’. Inside a character class you can list as many characters as you
   want to.

   For example, the simple regex 「x[0123456789]y」 would match any line
   with a digit sandwiched between an ‘x’ and a ‘y’.

   The order of the characters within the character class doesn’t really
   matter... 「[5134672890]」 would be the same as 「[0123456789]」.

   But as a shortcut, you could put 「[0-9]」 instead of 「[0123456789]」.
   So the character class 「[a-z]」 would match any lower-case letter, while
   the character class 「[a-zA-Z0-9]」 would match any letter or digit.

   The character ‘-’ is special within a character class, but only if it’s
   not the first thing. Another character that’s special in a character
   class is ‘^’, if it is the first thing. It “inverts” the class so that
   it will match any character not listed. The class 「[^a-zA-Z0-9]」 would
   match any line with spaces or punctuation on it.
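
   These basic constructs behave the same way in Python’s re module, so
   they can be tried out with a short script. The sample lines and the
   helper function are made up for illustration; this is Python’s regex
   flavor, not lookup’s.

```python
import re


def matching(regex, lines):
    """Return the lines that contain a match for the regex."""
    return [line for line in lines if re.search(regex, line)]


lines = ["The sky was grey and cloudy.", "A gray cat", "x7y", "groovy"]

for regex in ["gray", "gr.y", "gr[ea]y", "x[0-9]y", "[^a-zA-Z0-9]"]:
    print(f"{regex:14} {matching(regex, lines)}")
```

   Note that “groovy” is never selected: it has a ‘g’, ‘r’, and ‘y’, but
   not with exactly one character in between.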

   There are some special short-hand sequences for some common character
   classes. The sequence 「\d」 means “digit”, and is the same as 「[0-9]」.
   「\w」 means “word element” and is the same as 「[0-9a-zA-Z_]」.
   「\s」 means “space-type thing” and is the same as 「[ \t]」 (「\t」 means
   tab).

   You can also use 「\D」, 「\W」, and 「\S」 to mean things not a digit, word
   element, or space-type thing.

   Another special character would be ‘?’. This means “maybe one of
   whatever was just before it, but none is fine too”. In the regex 「bikes?
   for rent」, the “whatever” would be the ‘s’, so this would match lines
   with either “bikes for rent” or “bike for rent”.

   Parentheses are also special, and can group things together. In the
   regex

       big (fat harry)? deal

   the “whatever” for the ‘?’ would be “fat harry”. But be careful to pay
   attention to details... this regex would match
       I don’t see what the big fat harry deal is!
   but not
       I don’t see what the big deal is!

   That’s because if you take away the “whatever” of the ‘?’, you end up
   with
       big  deal
   Notice that there are two spaces between the words, and the regex didn’t
   allow for that.  The regex to get either line above would be
       big (fat harry )?deal
   or
       big( fat harry)? deal
   Do you see how they’re essentially the same?
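
   The difference is easy to check with Python’s re module, which treats
   ‘?’ and parentheses the same way for these patterns. The helper
   function is made up for the demonstration.

```python
import re


def matching(regex, lines):
    """Return the lines that contain a match for the regex."""
    return [line for line in lines if re.search(regex, line)]


lines = [
    "I don't see what the big fat harry deal is!",
    "I don't see what the big deal is!",
]

for regex in [
    r"big (fat harry)? deal",   # leaves two spaces when the group is absent
    r"big (fat harry )?deal",   # trailing space moved inside the group
    r"big( fat harry)? deal",   # leading space moved inside the group
]:
    print(f"{regex:24} matches {len(matching(regex, lines))} line(s)")
```

   The first regex finds only the “fat harry” line; the other two find
   both lines.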

   Similar to ‘?’ is ‘*’, which means “any number, including none, of
   whatever’s right in front”. It more or less means that whatever is
   tagged with ‘*’ is allowed, but not required, so something like
       I (really )*hate peas
   would match “I hate peas”, “I really hate peas!”, “I really really
   hate peas”, etc.

   Similar to both ‘?’ and ‘*’ is ‘+’, which means “at least one of
   whatever is just in front, but more is fine too”. The
   regex 「mis+pelling」 would match “mispelling”, “misspelling”,
   “missspelling”, etc. Actually, it’s just the same as 「miss*pelling」 but
   simpler to type. The regex 「ss*」 means “an ‘s’, followed by zero or
   more ‘s’”, while 「s+」 means “one or more ‘s’”. Both are really the same.
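
   A short Python check of ‘*’ and ‘+’ follows, again using Python’s re
   module as a stand-in for lookup’s flavor. The helper function and the
   sample strings are made up for illustration.

```python
import re


def matching(regex, lines):
    """Return the lines that contain a match for the regex."""
    return [line for line in lines if re.search(regex, line)]


peas = ["I hate peas", "I really hate peas!",
        "I really really hate peas", "I love peas"]
print(matching(r"I (really )*hate peas", peas))

words = ["mispelling", "misspelling", "missspelling", "mipelling"]
# 's+' and 'ss*' select exactly the same words.
print(matching(r"mis+pelling", words) == matching(r"miss*pelling", words))
```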

   The special character ‘|’ means “or”. Unlike ‘+’, ‘*’, and ‘?’, which
   act on the thing immediately before, the ‘|’ is more “global”.
       give me (this|that) one
   would match lines that had “give me this one” or “give me that one” in
   them.

   You can even combine more than two:
       give me (this|that|the other) one

   How about:
       [Ii]t is a (nice |sunny |bright |clear )*day

   Here, the “whatever” immediately before the ‘*’ is
       (nice |sunny |bright |clear )
   So this regex would match all the following lines:
      It is a day.
      I think it is a nice day.
      It is a clear sunny day today.
      If it is a clear sunny nice sunny sunny sunny bright day then....
   Notice how the 「[Ii]t」 matches either “It” or “it”?

   Note that the above regex would also match
      fruit is a day
   because it indeed fulfills all requirements of the regex, even though
   the “it” is really part of the word “fruit”. To answer concerns like
   this, which are common, there are ‘<’ and ‘>’, which mean “word break”.
   The regex 「<it」 would match any line with “it” beginning a word,
   while 「it>」 would match any line with “it” ending a word. And, of
   course, 「<it>」 would match any line with the word “it” in it.
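
   The “fruit” problem and its fix can be reproduced in Python. Python’s
   re module does not support ‘<’ and ‘>’ as word breaks, so this sketch
   uses \b in their place; that substitution is an approximation of
   lookup’s behavior, not a description of it.

```python
import re

plain = r"[Ii]t is a (nice |sunny |bright |clear )*day"
worded = r"\b" + plain          # \b plays the role of lookup's '<'

samples = [
    "It is a day.",
    "I think it is a nice day.",
    "It is a clear sunny day today.",
    "fruit is a day",
]
for line in samples:
    print(bool(re.search(plain, line)), bool(re.search(worded, line)), line)
```

   Every sample matches the plain regex, but the word-break version
   rejects the “fruit” line.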

   Going back to the regex to find grey/gray, that would make  more  sense,
   then, as
       <gr[ae]y>
   which would match only the words “grey” and “gray”.

   Somewhat similar are ‘^’ and ‘$’, which mean “beginning of line” and
   “end of line”, respectively (but not in a character class, of course).
   So the regex 「^fun」 would find any line that begins with the letters
   “fun”, while 「^fun>」 would find any line that begins with the word
   “fun”. 「^fun$」 would find any line that was exactly “fun”.

   Finally, 「^\s*fun\s*$」 would match any line that was exactly “fun”, but
   perhaps also had leading and/or trailing whitespace.
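
   The anchors can be compared side by side with a few made-up lines in
   Python. As before, \b is used where lookup would use ‘>’, since
   Python’s re module has no ‘<’ or ‘>’ word breaks.

```python
import re


def matching(regex, lines):
    """Return the lines that contain a match for the regex."""
    return [line for line in lines if re.search(regex, line)]


lines = ["fun", "funny business", "fun and games", "  fun  ", "no fun"]

for regex in [r"^fun", r"^fun\b", r"^fun$", r"^\s*fun\s*$"]:
    print(f"{regex:14} {matching(regex, lines)}")
```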

   That’s pretty much it. There are more complex things, some of which I’ll
   mention in the list below, but even with these few simple constructs one
   can specify very detailed and complex patterns.

   Let’s summarize some of the special things in regular expressions:

   Items that are basic units:
     char      any non-special character matches itself.
     \char     special chars, when preceded by \, become non-special.
     .         Matches any one character (except \n).
     \n        Newline
     \t        Tab.
     \r        Carriage Return.
     \f        Formfeed.
     \d        Digit. Just a short-hand for [0-9].
     \w        Word element. Just a short-hand for [0-9a-zA-Z_].
     \s        Whitespace. Just a short-hand for [\t \n\r\f].
     \## \###  Two or three digit octal number indicating a single byte.
     [chars]   Matches a character if it’s one of the characters listed.
     [^chars]  Matches a character if it’s not one of the ones listed.

     The \char items above can be used within a character class,
     but not the items below.

     \D        Anything not \d.
     \W        Anything not \w.
     \S        Anything not \s.
     \a        Any ASCII character.
     \A        Any multibyte character.
     \k        Any (not half-width) katakana character (including ー).
     \K        Any character not \k (except \n).
     \h        Any hiragana character.
     \H        Any character not \h (except \n).
     \c        Any JISX0208 kanji (kuten rows 16-84)
     \C        Any character not \c (except \n).
     (regex)   Parens make the regex one unit.
     (?:regex) [from perl5] Grouping-only parens -- can’t use for \# (below)
     \#        Match whatever was matched by the #th paren from the left.

   With “☆” to indicate one “unit” as above, the following may be used:

     ☆?       A ☆ allowed, but not required.
     ☆+       At least one ☆ required, but more ok.
     ☆*       Any number of ☆ ok, but none required.

   There are also ways to match “situations”:

     \b        A word boundary.
     <         Same as \b.
     >         Same as \b.
     ^         Matches the beginning of the line.
     $         Matches the end of the line.

   Finally, the “or” is

     reg1|reg2 Match if either reg1 or reg2 match.

   Note that “\k” and the like aren’t allowed in character classes, so
   something such as 「[\k\h]」 to try to get all kana won’t work.
   Use 「(\k|\h)」 instead.

BUGS

   Needs full support for half-width katakana and JIS X 0212-1990.
   Non-EUC (JIS & SJIS) items not tested well.
   Probably won’t work on non-UNIX systems.
   Screen  control  codes (for clear and highlight commands) are hard-coded
   for ANSI/VT100/kterm.

AUTHOR

   Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)

INFO

   Jim Breen’s text files edict and kanjidic and their documentation can be
   found in “pub/nihongo” on ftp.cc.monash.edu.au (130.194.1.106).

   Information on input and output encoding and codes can be found in Ken
   Lunde’s Understanding Japanese Information Processing (日本語情報処理),
   published by O’Reilly and Associates, ISBN 1-56592-043-0. There is
   also a Japanese edition published by SoftBank.

   A program to convert files among the various encoding methods is Dr. Ken
   Lunde’s jconv, which can also be found on ftp.cc.monash.edu.au. Jconv is
   also useful for converting half-width katakana (which lookup doesn’t yet
   support well) to full-width.