       bt_misc - miscellaneous BibTeX-like string-processing utilities


          void bt_purify_string (char * string, ushort options);
          void bt_change_case (char transform, char * string, ushort options);


              void bt_purify_string (char * string, ushort options);

           "Purifies" a "string" in the BibTeX way (usually used for
           generating sort keys).  "string" is modified in-place.  "options"
           is currently unused; just set it to zero for future compatibility.
           Purification consists of copying alphanumeric characters,
           converting hyphens and ties to space, copying spaces, and skipping
           (almost) everything else.

           "Almost" because "special characters" (used for accented and non-
           English letters) are handled specially.  Recall that a BibTeX
           special character is any brace-group that starts at brace-depth
           zero whose first character is a backslash.  For instance, the string

              {\foo bar}Herr M\"uller went from {P{\r r}erov} to {\AA}rhus

           contains two special characters: "{\foo bar}" and "{\AA}".  Neither
           the "\"u" nor the "\r r" is a special character: "\"u" is not
           inside a brace group at all, and "{\r r}" starts at brace-depth
           one rather than zero.
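
           The brace-depth rule can be sketched as a small scanner.  This is
           an illustration only, not code from btparse: the function name
           "count_special_chars" is hypothetical, and the sketch assumes
           braces are balanced and does not treat escaped braces specially.

```c
#include <stddef.h>

/* Count BibTeX "special characters" in s: brace groups that open at
 * brace depth zero and whose first character is a backslash.
 * Hypothetical helper for illustration; not part of btparse. */
size_t count_special_chars(const char *s)
{
    size_t count = 0;
    int depth = 0;              /* brace depth before the current character */

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '{') {
            if (depth == 0 && s[i + 1] == '\\')
                count++;        /* group opens at depth 0 with a backslash */
            depth++;
        } else if (s[i] == '}' && depth > 0) {
            depth--;
        }
    }
    return count;
}
```

           For the example string above the sketch returns 2; the group
           around "\r r" is not counted because it opens at depth one.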

           Special characters are handled as follows: if the control sequence
           (the TeX command that follows the backslash) is recognized as one
           of LaTeX’s "foreign letters" ("\oe", "\ae", "\o", "\l", "\aa",
           "\ss", plus uppercase versions), then it is converted to a
           reasonable English approximation by stripping the backslash and
           converting the second character (if any) to lowercase; thus,
           "{\AA}" in the above example would become simply "Aa".  All other
           control sequences in a special character are stripped, as are all
           non-alphabetic characters.

           For example, the string above, after purification, becomes

              barHerr Muller went from Pr rerov to Aarhus

           Obviously, something has gone wrong with the word "P{\r r}erov" (a
           town in the Czech Republic).  The accented ‘r’ should be a special
           character, starting at brace-depth zero.  If the original string
           were instead

              {\foo bar}Herr M\"uller went from P{\r r}erov to {\AA}rhus

           then the purified result would be more sensible:

              barHerr Muller went from Prerov to Aarhus
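
           The purification rules described above can be approximated in a
           few dozen lines.  The following is a minimal sketch written from
           this description, not the btparse implementation: the names
           "sketch_purify" and "is_foreign_letter" are hypothetical, the
           foreign-letter list includes "\aa" because the example converts
           "{\AA}" to "Aa", and the code assumes an ASCII string with
           balanced braces.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: is name[0..len-1] (a control sequence without its
 * backslash) one of the "foreign letters", in lowercase or uppercase? */
static int is_foreign_letter(const char *name, size_t len)
{
    static const char *const letters[] = {
        "oe", "ae", "aa", "o", "l", "ss",
        "OE", "AE", "AA", "O", "L", "SS"
    };
    for (size_t k = 0; k < sizeof letters / sizeof letters[0]; k++)
        if (strlen(letters[k]) == len && strncmp(letters[k], name, len) == 0)
            return 1;
    return 0;
}

/* Purify s in place, following the rules described in this page.
 * The result is never longer than the input.  Illustration only. */
void sketch_purify(char *s)
{
    size_t r = 0, w = 0;    /* read and write positions */
    int depth = 0;          /* current brace depth */
    int special = 0;        /* nonzero while inside a special character */

    while (s[r] != '\0') {
        unsigned char c = (unsigned char)s[r];

        if (c == '{') {
            if (depth == 0 && s[r + 1] == '\\')
                special = 1;            /* a special character starts here */
            depth++;
            r++;
        } else if (c == '}') {
            if (depth > 0 && --depth == 0)
                special = 0;
            r++;
        } else if (special && c == '\\') {
            /* Read the control sequence: a run of letters, or one
             * non-letter character such as the accent in \" */
            size_t start = ++r;
            while (isalpha((unsigned char)s[r]))
                r++;
            if (r == start && s[r] != '\0')
                r++;
            if (is_foreign_letter(s + start, r - start)) {
                /* Keep the first letter, lowercase the second (if any) */
                s[w++] = s[start];
                if (r - start > 1)
                    s[w++] = (char)tolower((unsigned char)s[start + 1]);
            }
            /* any other control sequence is dropped */
        } else if (special) {
            if (isalpha(c))             /* only letters survive here */
                s[w++] = (char)c;
            r++;
        } else {
            if (isalnum(c) || c == ' ')
                s[w++] = (char)c;       /* copy alphanumerics and spaces */
            else if (c == '-' || c == '~')
                s[w++] = ' ';           /* hyphens and ties become spaces */
            r++;                        /* everything else is skipped */
        }
    }
    s[w] = '\0';
}
```

           Called on a writable copy of the first example string, this
           sketch yields "barHerr Muller went from Pr rerov to Aarhus"; on
           the corrected string it yields "barHerr Muller went from Prerov
           to Aarhus".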

           Note the use of a "nonsense" special character "{\foo bar}": this
           trick is often used to put certain text in a string solely for
           generating sort keys; the text is then ignored when the document is
           processed by TeX (as long as "\foo" is defined as a no-op TeX
           macro).  This assumes, of course, that the output is eventually
           processed by TeX; if not, then this trick will backfire on you.

           Also, "bt_purify_string()" is adequate for generating sort keys
           when you want to sort according to English-language conventions.
           To follow the conventions of other languages, though, a more
           sophisticated approach will be needed; hopefully, future versions
           of btparse will address this deficiency.

              void bt_change_case (char transform, char * string, ushort options);

           Converts a string to lowercase, uppercase, or "non-book title
           capitalization", with special attention paid to BibTeX special
           characters and other brace-groups.  The form of conversion is
           selected by the single character "transform": ’u’ to convert to
           uppercase, ’l’ for lowercase, and ’t’ for "title capitalization".
           "string" is modified in-place, and "options" is currently unused;
           set it to zero for future compatibility.

           Lowercase and uppercase conversion are obvious, with the proviso
           that text in braces is treated differently (explained below).
           Title capitalization simply means that everything is converted to
           lowercase, except the first letter of the first word and the first
           letter of any word immediately following a colon or
           sentence-ending punctuation.  For example,

              Flying Squirrels: Their Peculiar Habits. Part One

           would be converted to

              Flying squirrels: Their peculiar habits. Part one
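
           For plain text without braces or TeX commands, the rule can be
           sketched as follows.  This is an illustration, not btparse code:
           the name "sketch_title_caps" is hypothetical, and treating ".",
           "?" and "!" as the sentence-ending punctuation is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Title capitalization for plain text (no braces, no TeX commands).
 * Letters are lowercased unless they are the first letter of the string
 * or the first letter after ':', '.', '?' or '!'; those are left as
 * they are.  Illustration only. */
void sketch_title_caps(char *s)
{
    int keep_next = 1;      /* the next letter is left unchanged */

    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];

        if (isalpha(c)) {
            if (!keep_next)
                s[i] = (char)tolower(c);
            keep_next = 0;
        } else if (c == ':' || c == '.' || c == '?' || c == '!') {
            keep_next = 1;
        }
    }
}
```

           Note that the protected letters are left alone rather than
           forced to uppercase, so the input is expected to be capitalized
           sensibly to begin with.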

           Text within braces is handled as follows.  First, in a "special
           character" (see above for definition), control sequences that
           constitute one of LaTeX’s non-English letters are converted
           appropriately (e.g., when converting to lowercase, "\AE" becomes
           "\ae").  Any other control sequence in a special character
           (including accents) is preserved, and all text in a special
           character, regardless of depth and punctuation, is converted to
           lowercase or uppercase.  (For "title capitalization," all text in a
           special character is converted to lowercase.)

           Brace groups that are not special characters are left completely
           untouched: neither the text nor the control sequences inside them
           are changed.
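
           The brace rules can be combined with the three transforms in a
           short sketch.  It is written from the description in this page
           and is not the btparse implementation: "sketch_change_case" and
           "is_foreign_letter" are hypothetical names, ".", "?" and "!" are
           assumed to be the sentence-ending punctuation, and the input is
           assumed to be ASCII with balanced braces.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: is name[0..len-1] (a control sequence without its
 * backslash) one of the non-English letters, in lower- or uppercase? */
static int is_foreign_letter(const char *name, size_t len)
{
    static const char *const letters[] = {
        "oe", "ae", "aa", "o", "l", "ss",
        "OE", "AE", "AA", "O", "L", "SS"
    };
    for (size_t k = 0; k < sizeof letters / sizeof letters[0]; k++)
        if (strlen(letters[k]) == len && strncmp(letters[k], name, len) == 0)
            return 1;
    return 0;
}

/* Convert s in place; transform is 'u', 'l' or 't'.  Illustration only. */
void sketch_change_case(char transform, char *s)
{
    int upper = (transform == 'u');
    int depth = 0;          /* current brace depth */
    int special = 0;        /* nonzero while inside a special character */
    int keep_next = 1;      /* title mode: next letter is left unchanged */
    size_t i = 0;

    if (transform != 'u' && transform != 'l' && transform != 't')
        return;             /* unknown transform: leave the string alone */

    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];

        if (c == '{') {
            if (depth == 0 && s[i + 1] == '\\')
                special = 1;
            depth++;
            i++;
        } else if (c == '}') {
            if (depth > 0 && --depth == 0)
                special = 0;
            i++;
        } else if (special && c == '\\') {
            /* Control sequence: a run of letters or one other character */
            size_t start = ++i;
            while (isalpha((unsigned char)s[i]))
                i++;
            if (i == start && s[i] != '\0')
                i++;
            /* Only non-English letters are converted; others are kept */
            if (is_foreign_letter(s + start, i - start))
                for (size_t j = start; j < i; j++)
                    s[j] = (char)(upper ? toupper((unsigned char)s[j])
                                        : tolower((unsigned char)s[j]));
        } else if (special || depth == 0) {
            if (isalpha(c)) {
                if (upper)
                    s[i] = (char)toupper(c);
                else if (transform == 'l' || special || !keep_next)
                    s[i] = (char)tolower(c);
                if (!special)
                    keep_next = 0;
            } else if (!special &&
                       (c == ':' || c == '.' || c == '?' || c == '!')) {
                keep_next = 1;
            }
            i++;
        } else {
            i++;            /* non-special brace group: left untouched */
        }
    }
}
```

           With transform 'l', this sketch turns "{\AE}sop and The
           {TeX}book" into "{\ae}sop and the {TeX}book": the non-English
           letter is converted, while the ordinary brace group is left
           alone.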

           For example, the string

              A Guide to \LaTeXe: Document Preparation ...

           would, when "transform" is ’t’ (title capitalization), be
           converted to

              A guide to \latexe: Document preparation ...

           which is probably not the desired result.  A better attempt is

              A Guide to {\LaTeXe}: Document Preparation ...

           which becomes

              A guide to {\LaTeXe}: Document preparation ...

           However, if you go back and re-read the description of
           "bt_purify_string()", you’ll discover that "{\LaTeXe}" here is a
           special character, but not a non-English letter, so the control
           sequence is stripped.  Thus, a sort key generated from this title
           would be

              A Guide to  Document Preparation

           ...oops!  The right solution (and this applies to any title with a
           TeX command that becomes actual text) is to bury the control
           sequence at brace-depth two:

              A Guide to {{\LaTeXe}}: Document Preparation ...




       Greg Ward