NAME
urlwatch - Watch web pages and arbitrary URLs for changes
SYNOPSIS
urlwatch [options]
DESCRIPTION
urlwatch watches a list of URLs for changes and prints out unified
diffs of the changes. You can filter always-changing parts of websites
by providing a "hooks.py" script.
OPTIONS
--version
Show the program’s version number and exit
-h, --help
Show the help message and exit
-v, --verbose
Show debug/log output
--urls=FILE
Read URLs from the specified file
--hooks=FILE
Use specified file as hooks.py module
-e, --display-errors
Include HTTP errors (404, etc.) in the output
ADVANCED FEATURES
urlwatch includes some advanced features that you activate by creating
a hooks.py file. The file defines a filter(url, data) function that
decides, per URL, which feature to apply. You can also use the hooks.py
file to filter out trivially-varying elements of a web page.
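As a minimal sketch of filtering a trivially-varying element, the
following hooks.py replaces a generation timestamp before the page is
compared. The URL, the timestamp pattern and the replacement text are
hypothetical and only serve as an illustration; the sketch also assumes
that data is passed in as a string.

```python
import re

# Hypothetical URL and pattern, chosen only for illustration.
STATUS_URL = 'http://example.org/status.html'
TIMESTAMP = re.compile(r'Generated at \d{2}:\d{2}:\d{2}')

def filter(url, data):
    if url == STATUS_URL:
        # Replace the timestamp so it never shows up in a diff.
        return TIMESTAMP.sub('Generated at (time removed)', data)
    # Leave every other page untouched.
    return data
```

Returning data unchanged for URLs that do not match keeps the hook from
affecting the other pages in the watch list.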
ICALENDAR FILE PARSING
The ical2txt module parses .ics files in iCalendar format and produces
a very simplified text-based representation for the diffs. Use it like
this in your hooks.py file:
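    from urlwatch import ical2txt

    def filter(url, data):
        if url.endswith('.ics'):
            # Put the simplified text in front of the raw iCalendar data
            return ical2txt.ical2text(data).encode('utf-8') + data
        # ...you can add more hooks here...
        # Return all other pages unchanged
        return data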
HTML TO TEXT CONVERSION
There are three methods of converting HTML to text in the current
version of urlwatch: "lynx" (default), "html2text" and "re". The first
two use command-line utilities of the same name to convert HTML to
text, and the last one uses a simple regex-based tag stripping method
(needs no extra tools). Here is an example of using the html2txt module
in your hooks.py file:
    from urlwatch import html2txt

    def filter(url, data):
        if url.endswith('.html') or url.endswith('.htm'):
            # method can be 'lynx', 'html2text' or 're'
            return html2txt.html2text(data, method='lynx')
        # ...you can add more hooks here...
        # Return all other pages unchanged
        return data
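To give a rough idea of what regex-based tag stripping means, here is a
minimal sketch. It is not the code urlwatch uses for the "re" method;
the strip_tags helper and its pattern are assumptions made for
illustration, and the sketch does not decode entities or skip script
and style contents.

```python
import re

# Anything from "<" up to the next ">" is treated as a tag.
TAG = re.compile(r'<[^>]*>')

def strip_tags(html):
    """Remove everything that looks like an HTML tag (hypothetical helper)."""
    return TAG.sub('', html)
```

For example, strip_tags('<p>Hello <b>world</b></p>') yields the plain
text 'Hello world'.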
FILES
~/.urlwatch/urls.txt
A list of HTTP/FTP URLs to watch (one URL per line)
~/.urlwatch/lib/hooks.py
A Python module that can be used to filter contents
~/.urlwatch/cache/
The state of web pages is saved in this folder
AUTHOR
Thomas Perl <thp@thpinfo.com>
WEBSITE
http://thpinfo.com/2008/urlwatch/