urlwatch - Watch web pages and arbitrary URLs for changes

NAME

       urlwatch - Watch web pages and arbitrary URLs for changes

SYNOPSIS

       urlwatch [options]

DESCRIPTION

       urlwatch  watches  a  list  of  URLs for changes and prints out unified
       diffs of the changes. You can filter always-changing parts of  websites
       by providing a "hooks.py" script.

OPTIONS

       --version
              Show the program’s version number and exit

       -h, --help
              Show the help message and exit

       -v, --verbose
              Show debug/log output

       --urls=FILE
              Read URLs from the specified file

       --hooks=FILE
              Use specified file as hooks.py module

       -e, --display-errors
              Include HTTP errors (404, etc.) in the output

ADVANCED FEATURES

       urlwatch  includes  some advanced features that you have to activate by
       creating a hooks.py file  that  specifies  for  which  URLs  to  use  a
       specific  feature.  You  can  also  use  the  hooks.py  file  to filter
       trivially-varying elements of a web page.
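
       As a minimal sketch of such a filter, the hooks.py below removes a
       timestamp that would otherwise show up as a change on every run. The
       URL and the "Generated at" pattern are hypothetical and only serve as
       an illustration; the filter(url, data) signature follows the examples
       in this manual page.

```python
import re

# Hypothetical pattern: a "Generated at HH:MM:SS" footer that differs on
# every request and would otherwise be reported as a change.
TIMESTAMP = re.compile(r'Generated at \d{2}:\d{2}:\d{2}')

def filter(url, data):
    # Hypothetical URL; only this page gets the extra filtering.
    if url == 'http://example.com/status.html':
        return TIMESTAMP.sub('', data)
    # Leave every other page untouched.
    return data
```

       Returning the data unchanged for all other URLs keeps the filter from
       affecting pages it was not written for.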

   ICALENDAR FILE PARSING
       This module parses .ics files in iCalendar format and provides a very
       simplified text-based format for the diffs. Use it like this in your
       hooks.py file:

         from urlwatch import ical2txt

         def filter(url, data):
             if url.endswith('.ics'):
                 return ical2txt.ical2text(data).encode('utf-8') + data
             # ...you can add more hooks here...
             return data

   HTML TO TEXT CONVERSION
       There are three methods of converting HTML to text in the current
       version of urlwatch: "lynx" (default), "html2text" and "re". The first
       two call the command-line utilities of the same name to convert HTML
       to text, and the last one uses a simple regex-based tag stripping
       method (needs no extra tools). Here is an example of using the
       html2txt module in your hooks.py file:

         from urlwatch import html2txt

         def filter(url, data):
             if url.endswith('.html') or url.endswith('.htm'):
                 return html2txt.html2text(data, method='lynx')
             # ...you can add more hooks here...
             return data
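
       To give an idea of what regex-based tag stripping means, here is a
       rough sketch. It is not the code urlwatch uses for the "re" method;
       the strip_tags name is hypothetical, and real HTML (comments, scripts,
       entities) needs more care than this.

```python
import re

def strip_tags(html):
    # Drop anything that looks like a tag, e.g. <p> or </a>.
    text = re.sub(r'<[^>]*>', '', html)
    # Collapse blank lines left behind by the removed markup.
    return re.sub(r'\n\s*\n+', '\n', text).strip()
```

       An approach like this needs no external tools, but it keeps the
       contents of script and style elements, so the "lynx" or "html2text"
       methods usually give cleaner output.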

FILES

       ~/.urlwatch/urls.txt
              A list of HTTP/FTP URLs to watch (one URL per line)

       ~/.urlwatch/lib/hooks.py
              A Python module that can be used to filter contents

       ~/.urlwatch/cache/
              The state of web pages is saved in this folder

AUTHOR

       Thomas Perl <thp@thpinfo.com>

WEBSITE

       http://thpinfo.com/2008/urlwatch/