thttpd - tiny/turbo/throttling HTTP server

NAME

       thttpd - tiny/turbo/throttling HTTP server

SYNOPSIS

       thttpd  [-C  configfile]  [-p  port]  [-d dir] [-dd data_dir] [-r|-nor]
       [-s|-nos] [-v|-nov] [-g|-nog] [-u user] [-c cgipat] [-t throttles]  [-h
       host]  [-l logfile] [-i pidfile] [-T charset] [-P P3P] [-M maxage] [-V]
       [-D]

DESCRIPTION

       thttpd is a simple, small, fast, and secure HTTP  server.   It  doesn’t
       have  a  lot  of special features, but it suffices for most uses of the
       web, it’s about as fast as  the  best  full-featured  servers  (Apache,
       NCSA,  Netscape), and it has one extremely useful feature (URL-traffic-
       based throttling) that no other server currently has.

OPTIONS

       -C     Specifies a config-file to read.  All options can be set  either
              by  command-line  flags  or  in  the config file.  See below for
              details.

       -p     Specifies an alternate port number to listen on.  The default is
              80.   The  config-file  option name for this flag is "port", and
              the config.h option is DEFAULT_PORT.

       -d     Specifies a directory to chdir() to at startup.  This is  merely
              a  convenience  -  you could just as easily do a cd in the shell
              script that invokes the program.  The  config-file  option  name
              for  this  flag  is  "dir", and the config.h options are WEBDIR,
              USE_USER_DIR.

       -r     Do a chroot() at initialization time, restricting file access to
              the  program’s  current  directory.   If  -r  is the compiled-in
              default, then -nor disables it.  See  below  for  details.   The
              config-file   option  names  for  this  flag  are  "chroot"  and
              "nochroot", and the config.h option is ALWAYS_CHROOT.

       -dd    Specifies a directory to chdir() to after chrooting.  If  you’re
              not chrooting, you might as well do a single chdir() with the -d
              flag.  If you are chrooting, this lets you put the web files  in
              a  subdirectory  of the chroot tree, instead of in the top level
              mixed in with the chroot files.  The config-file option name for
              this flag is "data_dir".

       -nos   Don’t  do  explicit  symbolic  link  checking.  Normally, thttpd
              explicitly expands any symbolic links  in  filenames,  to  check
              that the resulting path stays within the original document tree.
              If you want to turn off this check and save some CPU  time,  you
              can  use  the -nos flag, however this is not recommended.  Note,
              though, that if you are using the  chroot  option,  the  symlink
              checking  is  unnecessary  and is turned off, so the safe way to
              save those CPU cycles is to use chroot.  The config-file  option
              names for this flag are "symlinks" and "nosymlinks".

       -v     Do el-cheapo virtual hosting.  If -v is the compiled-in default,
              then -nov disables it.  See below for details.  The  config-file
              option  names  for  this flag are "vhost" and "novhost", and the
              config.h option is ALWAYS_VHOST.

       -g     Use a global passwd file.  This means that  every  file  in  the
              entire  document  tree is protected by the single .htpasswd file
              at the  top  of  the  tree.   Otherwise  the  semantics  of  the
              .htpasswd file are the same.  If this option is set but there is
              no .htpasswd  file  in  the  top-level  directory,  then  thttpd
              proceeds  as  if  the  option  was not set - first looking for a
              local .htpasswd file, and if  that  doesn’t  exist  either  then
              serving the file without any password.  If -g is the compiled-in
              default, then -nog disables it.  The  config-file  option  names
              for  this  flag are "globalpasswd" and "noglobalpasswd", and the
              config.h option is ALWAYS_GLOBAL_PASSWD.

       -u     Specifies what user  to  switch  to  after  initialization  when
              started  as  root.   The  default  is "nobody".  The config-file
              option name for this flag is "user", and the config.h option  is
              DEFAULT_USER.

       -c     Specifies  a  wildcard  pattern  for  CGI programs, for instance
              "**.cgi" or "/cgi-bin/*".  See below for details.   The  config-
              file  option  name  for  this flag is "cgipat", and the config.h
              option is CGI_PATTERN.

       -t     Specifies a file of throttle settings.  See below  for  details.
              The config-file option name for this flag is "throttles".

       -h     Specifies  a  hostname to bind to, for multihoming.  The default
              is to bind to all hostnames supported on the local machine.  See
              below for details.  The config-file option name for this flag is
              "host", and the config.h option is SERVER_NAME.

       -l     Specifies a file for logging.  If no -l argument  is  specified,
              thttpd  logs  via  syslog().   If  "-l  /dev/null" is specified,
              thttpd doesn’t log at all.  The config-file option name for this
              flag is "logfile".

       -i     Specifies  a  file  to  write  the process-id to.  If no file is
              specified, no process-id is written.  You can use this  file  to
              send signals to thttpd.  See below for details.  The config-file
              option name for this flag is "pidfile".

       -T     Specifies the character set to use with text  MIME  types.   The
              default  is  iso-8859-1.   The  config-file option name for this
              flag is "charset", and the config.h option is DEFAULT_CHARSET.

       -P     Specifies a P3P server privacy header to be  returned  with  all
              responses.   See  http://www.w3.org/P3P/  for  details.   Thttpd
              doesn’t do anything at all with the string except put it in  the
              P3P: response header.  The config-file option name for this flag
              is "p3p".

       -M     Specifies the number of seconds to be used in a  "Cache-Control:
              max-age"   header   to  be  returned  with  all  responses.   An
              equivalent "Expires" header is also generated.  The  default  is
              no Cache-Control or Expires headers, which is just fine for most
              sites.  The config-file option name for this flag is  "max_age".

       -V     Shows the current version info.

       -D     This  was  originally  just a debugging flag, however it’s worth
              mentioning because one of the things it does is  prevent  thttpd
              from  making itself a background daemon.  Instead it runs in the
              foreground like a regular program.  This is necessary  when  you
              want  to  run  thttpd  wrapped  in  a  little  shell script that
              restarts it if it exits.

CONFIG-FILE

       All the command-line options can also be set in  a  config  file.   One
       advantage  of  using a config file is that the file can be changed, and
       thttpd will pick up the changes with a restart.

       The syntax of the config file  is  simple,  a  series  of  "option"  or
       "option=value"  separated  by  whitespace.  The option names are listed
       above with their corresponding command-line flags.

CHROOT

       chroot() is a system call that restricts  the  program’s  view  of  the
       filesystem  to  the  current  directory  and  directories below it.  It
       becomes impossible for remote users to access any file outside  of  the
       initial directory.  The restriction is inherited by child processes, so
       CGI programs get it too.  This is a very strong security  measure,  and
       is recommended.  The only downside is that only root can call chroot(),
       so this means the program must be started as root.  However,  the  last
       thing  it  does  during  initialization  is  to  give up root access by
       becoming another user, so this is safe.

       The program  can  also  be  compile-time  configured  to  always  do  a
       chroot(), without needing the -r flag.

       Note that with some other web servers, such as NCSA httpd, setting up a
       directory tree for use with chroot() is complicated, involving creating
       a  bunch  of  special  directories  and copying in various files.  With
       thttpd it’s a lot easier, all you have to do is make sure  any  shells,
       utilities,  and  config files used by your CGI programs and scripts are
       available.  If you have CGI disabled, or if you make a policy that  all
       CGI  programs  must  be  written  in  a compiled language such as C and
       statically linked, then you probably don’t have to do any setup at all.

       However, one thing you should do is tell syslogd about the chroot tree,
       so that thttpd can still generate syslog messages.  Check your system’s
       syslodg  man  page  for  how  to  do  this.   In  FreeBSD you would put
       something like this in /etc/rc.conf:
           syslogd_flags="-l /usr/local/www/data/dev/log"
       Substitute in your own chroot tree’s pathname, of course.  Don’t  worry
       about  creating  the log socket, syslogd wants to do that itself.  (You
       may need to create the dev directory.)  In Linux the flag is -a instead
       of -l, and there may be other differences.

       Relevant config.h option: ALWAYS_CHROOT.

CGI

       thttpd supports the CGI 1.1 spec.

       In  order  for a CGI program to be run, its name must match the pattern
       specified either at compile time or on the command  line  with  the  -c
       flag.  This is a simple shell-style filename pattern.  You can use * to
       match any string not including a slash,  or  **  to  match  any  string
       including  slashes,  or  ? to match any single character.  You can also
       use multiple such patterns separated by |.  The  patterns  get  checked
       against  the  filename part of the incoming URL.  Don’t forget to quote
       any wildcard characters so that the shell doesn’t mess with them.

       Restricting  CGI  programs  to  a  single  directory  lets   the   site
       administrator   review   them  for  security  holes,  and  is  strongly
       recommended.  If there are individual users that  you  trust,  you  can
       enable their directories too.

       If  no CGI pattern is specified, neither here nor at compile time, then
       CGI programs cannot be run at all.  If you want to  disable  CGI  as  a
       security  measure,  that’s how you do it, just comment out the patterns
       in the config file and don’t run with the -c flag.

       Note: the current working directory when a CGI program gets run is  the
       directory  that  the  CGI  program lives in.  This isn’t in the CGI 1.1
       spec, but it’s what most other HTTP servers do.

       Relevant  config.h  options:  CGI_PATTERN,   CGI_TIMELIMIT,   CGI_NICE,
       CGI_PATH, CGI_LD_LIBRARY_PATH, CGIBINDIR.

BASIC AUTHENTICATION

       Basic  Authentication  is  available  as an option at compile time.  If
       enabled, it uses a password file in  the  directory  to  be  protected,
       called  .htpasswd  by  default.  This file is formatted as the familiar
       colon-separated username/encrypted-password pair, records delimited  by
       newlines.   The  protection does not carry over to subdirectories.  The
       utility program thtpasswd(1) is included  to  help  create  and  modify
       .htpasswd files.

       Relevant config.h option: AUTH_FILE

THROTTLING

       The  throttle  file  lets  you  set  maximum  byte rates on URLs or URL
       groups.  You can optionally set a minimum rate too.  The format of  the
       throttle  file  is  very simple.  A # starts a comment, and the rest of
       the line is ignored.  Blank lines are ignored.  The rest of  the  lines
       should  consist of a pattern, whitespace, and a number.  The pattern is
       a simple shell-style filename pattern, using ?/**/*, or  multiple  such
       patterns separated by |.

       The numbers in the file are byte rates, specified in units of bytes per
       second.  For comparison, a v.90 modem gives about 5000 B/s depending on
       compression,  a  double-B-channel  ISDN  line about 12800 B/s, and a T1
       line is about 150000 B/s.  If you want to set a minimum rate  as  well,
       use number-number.

       Example:
         # throttle file for www.acme.com

         **              2000-100000  # limit total web usage to 2/3 of our T1,
                                      # but never go below 2000 B/s
         **.jpg|**.gif   50000   # limit images to 1/3 of our T1
         **.mpg          20000   # and movies to even less
         jef/**          20000   # jef’s pages are too popular

       Throttling  is  implemented  by  checking  each  incoming  URL filename
       against  all  of  the  patterns  in  the  throttle  file.   The  server
       accumulates statistics on how much bandwidth each pattern has accounted
       for recently (via a rolling average).  If a URL matches a pattern  that
       has  been  exceeding  its  specified  limit,  then the data returned is
       actually slowed down, with pauses between each block.   If  that’s  not
       possible  (e.g.  for  CGI  programs) or if the bandwidth has gotten way
       larger than the limit, then the server returns a  special  code  saying
       ’try again later’.

       The  minimum  rates  are implemented similarly.  If too many people are
       trying to fetch something at the same time, throttling  may  slow  down
       each connection so much that it’s not really useable.  Furthermore, all
       those slow connections clog up the server, using up  file  handles  and
       connection  slots.   Setting  a  minimum  rate says that past a certain
       point you should not even bother - the server returns  the  ’try  again
       later" code and the connection isn’t even started.

       There   is  no  provision  for  setting  a  maximum  connections/second
       throttle, because throttling a request uses as much cpu as handling it,
       so  there would be no point.  There is also no provision for throttling
       the number of simultaneous connections on a per-URL basis.  However you
       can control the overall number of connections for the whole server very
       simply, by setting the operating system’s per-process  file  descriptor
       limit  before  starting thttpd.  Be sure to set the hard limit, not the
       soft limit.

MULTIHOMING

       Multihoming means using one machine to serve multiple  hostnames.   For
       instance,  if  you’re  an  internet provider and you want to let all of
       your  customers  have  customized  web  addresses,   you   might   have
       www.joe.acme.com,  www.jane.acme.com,  and  your  own www.acme.com, all
       running on the same physical hardware.  This feature is also  known  as
       "virtual hosts".  There are three steps to setting this up.

       One,  make DNS entries for all of the hostnames.  The current way to do
       this, allowed by HTTP/1.1, is to use CNAME aliases, like so:
         www.acme.com IN A 192.100.66.1
         www.joe.acme.com IN CNAME www.acme.com
         www.jane.acme.com IN CNAME www.acme.com
       However, this is incompatible with older  HTTP/1.0  browsers.   If  you
       want  to  stay  compatible,  there’s  a  different  way - use A records
       instead, each with a different IP address, like so:
         www.acme.com IN A 192.100.66.1
         www.joe.acme.com IN A 192.100.66.200
         www.jane.acme.com IN A 192.100.66.201
       This is bad because it uses  extra  IP  addresses,  a  somewhat  scarce
       resource.   But  if  you  want people with older browsers to be able to
       visit your sites, you still have to do it this way.

       Step two.  If you’re using the modern CNAME method of multihoming, then
       you can skip this step.  Otherwise, using the older multiple-IP-address
       method you must set up IP aliases or multiple interfaces for the  extra
       addresses.  You can use ifconfig(8)’s alias command to tell the machine
       to answer to all of the different IP addresses.  Example:
         ifconfig le0 www.acme.com
         ifconfig le0 www.joe.acme.com alias
         ifconfig le0 www.jane.acme.com alias
       If your OS’s version of ifconfig doesn’t have an alias command,  you’re
       probably          out          of         luck         (but         see
       http://www.acme.com/software/thttpd/notes.html).

       Third and last, you must set up thttpd to handle  the  multiple  hosts.
       The  easiest  way  is  with  the  -v flag, or the ALWAYS_VHOST config.h
       option.  This works  with  either  CNAME  multihosting  or  multiple-IP
       multihosting.   What  it  does  is  send  each  incoming  request  to a
       subdirectory based on the hostname it’s intended for.  All you have  to
       do  in  order to set things up is to create those subdirectories in the
       directory where thttpd will run.  With the example above, you’d do like
       so:
         mkdir www.acme.com www.joe.acme.com www.jane.acme.com
       If  you’re  using  old-style  multiple-IP multihosting, you should also
       create symbolic links from the numeric addresses to the names, like so:
         ln -s www.acme.com 192.100.66.1
         ln -s www.joe.acme.com 192.100.66.200
         ln -s www.jane.acme.com 192.100.66.201
       This lets the older HTTP/1.0 browsers find the right subdirectory.

       There’s  an  optional  alternate step three if you’re using multiple-IP
       multihosting: run a separate thttpd process for  each  hostname,  using
       the  -h  flag  to  specify  which  one  is  which.  This gives you more
       flexibility, since you can run each  of  these  processes  in  separate
       directories, with different throttle files, etc.  Example:
         thttpd -r -d /usr/www -h www.acme.com
         thttpd -r -d /usr/www/joe -u joe -h www.joe.acme.com
         thttpd -r -d /usr/www/jane -u jane -h www.jane.acme.com
       But  remember,  this  multiple-process  method does not work with CNAME
       multihosting - for that, you must use a single thttpd process with  the
       -v flag.

CUSTOM ERRORS

       thttpd lets you define your own custom error pages for the various HTTP
       errors.  There’s a separate file for each error number, all  stored  in
       one  special  directory.  The directory name is "errors", at the top of
       the web directory tree.  The error files should be named "errNNN.html",
       where  NNN is the error number.  So for example, to make a custom error
       page for the authentication failure error, which  is  number  401,  you
       would  put  your HTML into the file "errors/er.html".  If no custom
       error file is found for a given error number, then the  usual  built-in
       error page is generated.

       If  you’re  using the virtual hosts option, you can also have different
       custom error pages for each different virtual host.  In this  case  you
       put  another  "errors"  directory in the top of that virtual host’s web
       tree.  thttpd will look first in the virtual host errors directory, and
       then  in  the server-wide errors directory, and if neither of those has
       an appropriate error file then it will generate the built-in error.

NON-LOCAL REFERERS

       Sometimes another site on the net will embed your image files in  their
       HTML files, which basically means they’re stealing your bandwidth.  You
       can prevent them from doing this by using non-local referer  filtering.
       With  this  option,  certain  files  can  only  be  fetched via a local
       referer.  The files have to be referenced by a local web  page.   If  a
       web  page  on  some other site references the files, that fetch will be
       blocked.  There are three config-file variables for this feature:

       urlpat A wildcard pattern for the URLs  that  should  require  a  local
              referer.   This  is typically just image files, sound files, and
              so on.  For example:
                urlpat=**.jpg|**.gif|**.au|**.wav
              For most sites, that one setting  is  all  you  need  to  enable
              referer filtering.

       noemptyreferers
              By  default, requests with no referer at all, or a null referer,
              or a referer with no apparent hostname, are allowed.  With  this
              variable set, such requests are disallowed.

       localpat
              A wildcard pattern that specifies the local host or hosts.  This
              is used to determine if the host in the referer is local or not.
              If not specified it defaults to the actual local hostname.

TILDE EXPANSION

       thttpd  can be configured to expand base paths of the form /~user/ into
       proper references to the user’s web directory.

       If thttpd is compiled with the option TILDE_MAP_1 (this is the  out-of-
       the-box  default  when  you  build  from  source),  ~user  is mapped to
       prefix/user where the prefix might be something like /users.

       If thttpd is compiled with the option TILDE_MAP_2, ~user is mapped to a
       subdirectory  in  the user’s home directory; the subdirectory’s name is
       also configurable, but the default is public_html.  This arrangement is
       incompatible  with  the  chroot  option,  and  generally  considered  a
       security risk.

       See  also  the  next  section  about  symbolic  links  for   additional
       discussion.

SYMLINKS

       thttpd is very picky about symbolic links.  Before delivering any file,
       it first checks each element in the path to  see  if  it’s  a  symbolic
       link, and expands them all out to get the final actual filename.  Along
       the way it checks for things like links with ".."  that  go  above  the
       server’s  directory,  and absolute symlinks (ones that start with a /).
       These are prohibited as security holes, so the server returns an  error
       page  for  them.  This means you can’t set up your web directory with a
       bunch of symlinks pointing to individual users’ home  web  directories.
       Instead  you  do it the other way around - the user web directories are
       real subdirs of the main web directory, and in  each  user’s  home  dir
       there’s a symlink pointing to their actual web dir.

       The  CGI  pattern is also affected - it gets matched against the fully-
       expanded filename.  So, if you have a single CGI directory but then put
       a  symbolic  link  in it pointing somewhere else, that won’t work.  The
       CGI program will be treated as a  regular  file  and  returned  to  the
       client, instead of getting run.  This could be confusing.

PERMISSIONS

       thttpd  is  also  picky  about  file  permissions.  It wants data files
       (HTML, images) to be world readable.  Readable by the  group  that  the
       thttpd process runs as is not enough - thttpd checks explicitly for the
       world-readable bit.  This is so that no one ever gets  surprised  by  a
       file  that’s  not set world-readable and yet somehow is readable by the
       HTTP server and therefore the *whole* world.

       The same logic applies to directories.  As with the standard Unix  "ls"
       program,  thttpd  will only let you look at the contents of a directory
       if its read bit is on; but as with data files, this must be the  world-
       read bit, not just the group-read bit.

       thttpd  also  wants the execute bit to be *off* for data files.  A file
       that is marked executable but doesn’t match the CGI pattern might be  a
       script  or  program  that got accidentally left in the wrong directory.
       Allowing people to fetch the contents of the file might be  a  security
       breach,  so this is prohibited.  Of course if an executable file *does*
       match the CGI pattern, then it just gets run as a CGI.

       In summary, data files should  be  mode  644  (rw-r--r--),  directories
       should  be  755  (rwxr-xr-x)  if  you  want  to  allow indexing and 711
       (rwx--x--x) to disallow it, and CGI programs should be mode 755  (rwxr-
       xr-x) or 711 (rwx--x--x).

LOGS

       thttpd  does all of its logging via syslog(3).  The facility it uses is
       configurable.  Aside from error messages, there  are  only  a  few  log
       entry types of interest, all fairly similar to CERN Common Log Format:
         Aug  6 15:40:34 acme thttpd[583]: 165.113.207.103 - - "GET /file" 200 357
         Aug  6 15:40:43 acme thttpd[583]: 165.113.207.103 - - "HEAD /file" 200 0
         Aug  6 15:41:16 acme thttpd[583]: referer http://www.acme.com/ -> /dir
         Aug  6 15:41:16 acme thttpd[583]: user-agent Mozilla/1.1N
       The  package  includes  a script for translating these log entries info
       CERN-compatible files.  Note that thttpd does not translate numeric  IP
       addresses  into domain names.  This is both to save time and as a minor
       security measure (the numeric address is harder to spoof).

       Relevant config.h option: LOG_FACILITY.

       If you’d rather log directly to a file, you can use the -l command-line
       flag.  But note that error messages still go to syslog.

SIGNALS

       thttpd handles a couple of signals, which you can send via the standard
       Unix kill(1) command:

       INT,TERM
              These  signals  tell  thttpd  to  shut  down  immediately.   Any
              requests in progress get aborted.

       USR1   This  signal  tells  thttpd  to  shut  down as soon as it’s done
              servicing all current requests.  In addition, the network socket
              it uses to accept new connections gets closed immediately, which
              means a fresh thttpd can be started up immediately.

       USR2   This signal tells  thttpd  to  generate  the  statistics  syslog
              messages  immediately, instead of waiting for the regular hourly
              update.

       HUP    This signal tells thttpd to close and re-open  its  (non-syslog)
              log  file,  for  instance if you rotated the logs and want it to
              start using the new one.  This is a  little  tricky  to  set  up
              correctly,  for  instance if you are using chroot() then the log
              file must be within the chroot tree, but it’s definitely doable.

THANKS

       Many  thanks  to contributors, reviewers, testers: John LoVerso, Jordan
       Hayes, Chris Torek, Jim Thompson, Barton  Schaffer,  Geoff  Adams,  Dan
       Kegel,  John  Hascall, Bennett Todd, KIKUCHI Takahiro, Catalin Ionescu.
       Special  thanks  to  Craig  Leres   for   substantial   debugging   and
       development, and for not complaining about my coding style very much.

AUTHOR

       Copyright  ©  1995,1998,1999,2000 by Jef Poskanzer <jef@mail.acme.com>.
       All rights reserved.

                               29 February 2000                      thttpd(8)

NAME

SYNOPSIS

DESCRIPTION

OPTIONS

CONFIG-FILE

CHROOT

CGI

BASIC AUTHENTICATION

THROTTLING

MULTIHOMING

CUSTOM ERRORS

NON-LOCAL REFERERS

TILDE EXPANSION

SYMLINKS

PERMISSIONS

LOGS

SIGNALS

SEE ALSO

THANKS

AUTHOR