Man Linux: Main Page and Category List

NAME

       xmlwf — Determines if an XML document is well-formed

SYNOPSIS

       xmlwf  [-s]   [-n]   [-p]   [-x]   [-e encoding]  [-w]  [-d output-dir]
       [-c]  [-m]  [-r]  [-t]  [-v]  [file ...]

DESCRIPTION

       xmlwf uses the Expat library to determine if an XML document  is  well-
       formed.  It is non-validating.

       If  you  do  not  specify any files on the command-line, and you have a
       recent version of xmlwf, the input file  will  be  read  from  standard
       input.

WELL-FORMED DOCUMENTS

       A well-formed document must adhere to the following rules:

          ·  The  file  begins  with  an XML declaration.  For instance, <?xml
             version="1.0" standalone="yes"?>.  NOTE:          xmlwf does  not
             currently check for a valid XML declaration.

          ·  Every  start  tag is either empty (<tag/>) or has a corresponding
             end tag.

          ·  There is exactly one root element.  This element must contain all
             other  elements in the document.  Only comments, white space, and
             processing instructions may come after  the  close  of  the  root
             element.

          ·  All elements nest properly.

          ·  All  attribute  values  are  enclosed in quotes (either single or
             double).

       If the document has a DTD, and it strictly complies with that DTD, then
       the  document  is  also  considered  valid.   xmlwf is a non-validating
       parser -- it does not check the DTD.  However, it does support external
       entities (see the -x option).

OPTIONS

       When  an  option  includes  an  argument,  you may specify the argument
       either  separately  ("-d  output")  or  concatenated  with  the  option
       ("-doutput").  xmlwf supports both.

       -c        If   the  input  file  is  well-formed  and  xmlwf    doesn’t
                 encounter any errors, the input file is simply copied to  the
                 output  directory  unchanged.   This  implies  no  namespaces
                 (turns off -n) and requires -d to specify an output file.

       -d output-dir
                 Specifies a directory to contain transformed  representations
                 of  the  input  files.   By  default,  -d outputs a canonical
                 representation (described below).  You can  select  different
                 output formats using -c   and -m.

                 The  output  filenames  will be exactly the same as the input
                 filenames or "STDIN" if the input  is  coming  from  standard
                 input.   Therefore,  you must be careful that the output file
                 does not go into  the  same  directory  as  the  input  file.
                 Otherwise,  xmlwf  will  delete  the  input  file  before  it
                 generates the output file (just like running  cat  <  file  >
                 file in most shells).

                   Two  structurally equivalent XML documents have a byte-for-
                 byte  identical  canonical  XML  representation.   Note  that
                 ignorable  white  space  is  considered  significant  and  is
                 treated equivalently to data.  More on canonical XML  can  be
                 found at http://www.jclark.com/xml/canonxml.html .

       -e encoding
                 Specifies the character encoding for the document, overriding
                 any document encoding declaration.   xmlwf     supports  four
                 built-in  encodings: US-ASCII, UTF-8, UTF-16, and ISO-8859-1.
                 Also see the -w option.

       -m        Outputs  some  strange  sort  of  XML  file  that  completely
                 describes the the input file, including character postitions.
                 Requires -d to specify an output file.

       -n        Turns on  namespace  processing.   (describe  namespaces)  -c
                 disables namespaces.

       -p        Tells  xmlwf to process external DTDs and parameter entities.

                 Normally xmlwf never parses parameter entities.  -p tells  it
                 to always parse them.  -p implies -x.

       -r        Normally  xmlwf memory-maps the XML file before parsing; this
                 can result in faster parsing on many platforms.  -r turns off
                 memory-mapping  and  uses  normal  file IO calls instead.  Of
                 course,  memory-mapping  is  automatically  turned  off  when
                 reading from standard input.

                 Use  of  memory-mapping  can  cause  some platforms to report
                 substantially higher memory usage for xmlwf, but this appears
                 to  be a matter of the operating system reporting memory in a
                 strange way; there is not a leak in xmlwf.

       -s        Prints an  error  if  the  document  is  not  standalone.   A
                 document  is  standalone  if it has no external subset and no
                 references to parameter entities.

       -t        Turns on timings.  This tells Expat to parse the entire file,
                 but not perform any processing.  This gives a fairly accurate
                 idea  of  the  raw  speed  of  Expat  itself  without  client
                 overhead.   -t  turns off most of the output options (-d, -m,
                 -c, ...).

       -v        Prints the version of the Expat library being used, including
                 some  information  on  the  compile-time configuration of the
                 library, and then exits.

       -w        Enables support for Windows code pages.  Normally, xmlwf will
                 throw  an  error if it runs across an encoding that it is not
                 equipped to handle itself.  With -w, xmlwf will try to use  a
                 Windows code page.  See also -e.

       -x        Turns on parsing external entities.

                 Non-validating  parsers  are not required to resolve external
                 entities, or even  expand  entities  at  all.   Expat  always
                 expands  internal  entities  (?), but external entity parsing
                 must be enabled explicitly.

                 External entities are simply entities that obtain their  data
                 from outside the XML file currently being parsed.

                 This is an example of an internal entity:

       <!ENTITY vers ’1.0.2’>

                 And here are some examples of external entities:

       <!ENTITY header SYSTEM "header-&vers;.xml">  (parsed)
       <!ENTITY logo SYSTEM "logo.png" PNG>         (unparsed)

       --        (Two hyphens.)  Terminates the list of options.  This is only
                 needed if a filename starts with a hyphen.  For example:

       xmlwf -- -myfile.xml

                 will run xmlwf on the file -myfile.xml.

       Older versions of xmlwf do not support reading from standard input.

OUTPUT

       If an input file  is  not  well-formed,  xmlwf  prints  a  single  line
       describing  the  problem to standard output.  If a file is well formed,
       xmlwf outputs nothing.  Note that the result code is not set.

BUGS

       xmlwf returns a 0 - noerr result, even if the file is not  well-formed.
       There is no good way for a program to use xmlwf to quickly check a file
       -- it must parse xmlwf’s standard output.

       The errors should go to standard error, not standard output.

       There should be a way to get -d to send its output to  standard  output
       rather than forcing the user to send it to a file.

       I have no idea why anyone would want to use the -d, -c, and -m options.
       If someone could explain it to me, I’d like to add this information  to
       this manpage.

ALTERNATIVES

       Here are some XML validators on the web:

       http://www.hcrc.ed.ac.uk/~richard/xml-check.html
       http://www.stg.brown.edu/service/xmlvalid/
       http://www.scripting.com/frontier5/xml/code/xmlValidator.html
       http://www.xml.com/pub/a/tools/ruwf/check.html

SEE ALSO

       The Expat home page:        http://www.libexpat.org/
       The W3 XML specification:   http://www.w3.org/TR/REC-xml

AUTHOR

       This  manual  page was written by Scott Bronson bronson@rinspin.com for
       the Debian GNU/Linux system (but may be used by others).  Permission is
       granted to copy, distribute and/or modify this document under the terms
       of the GNU Free Documentation License, Version 1.1.