NAME
code2html - Converts program source code to HTML
SYNOPSIS
(1) code2html [options] [input-file [output-file]]
(2) code2html -p [file [alternate-outfile]]
(3) code2html (as a CGI script; see the section on CGI)
DESCRIPTION
code2html is a Perl script which converts program source code to
syntax-highlighted HTML, or to any other format for which rules are
defined.
(1) OPTIONS
input-file
Is the file which contains the program source code to be
formatted. If not specified or a minus (-) is given, the code
will be read from STDIN.
output-file
Is the file to write the formatted code to. If not specified
or a minus (-) is given, the code will be written to STDOUT.
-l, --language-mode
Specify the set of regular expressions to use. These have to
be defined in a language file (see FILES below). To find out
which language modes are defined, run code2html --modes.
The mode name is matched case-insensitively.
If not given, some heuristics will be used to determine the
file language.
-v, --verbose
Prints progress information to STDERR.
-n, --linenumbers
Print out the source code with line numbers.
-N, --linknumbers
Print out the source code with line numbers. Each line number
links to itself, which makes it easy to send links to specific
lines.
-P, --prefix
Optional prefix to use for line number anchors.
-t, --replace-tabs[=TABSTOP-WIDTH]
Replace each occurrence of a <TAB> character with the right
amount of spaces to get to the next tabstop. Default is a
tabstop width of 8 characters.
-L, --language-file=LANGUAGE-FILE
Specify an alternate file to take the language and output-
format definitions from (see the section on FILES below).
-m, --modes
Print all language modes and output-formats currently defined
to STDOUT and exit successfully. Also prints modes from a
LANGUAGE-FILE given by --language-file if applicable.
--fallback=LANG
If the language mode given with --language-mode cannot be found,
use this mode instead.
For instance, --fallback plain is useful when code2html is
called from a script, to ensure that output is created.
-h, --help
Print a short help message and exit successfully.
-V, --version
Print the program version and exit successfully.
-c, --content-type
Prints “Content-Type: text/html\n\n” (or whatever the
output-format defines as a content type) prior to the rest of the
output. Useful if the script is invoked as a CGI script.
-o, --output-format
Selects the output-format. html is the default. To find out
which output-formats are defined, run code2html --modes.
-H, --no-header
Do not make use of the template defined by the output-format.
For HTML this means that there will be no <html>, <head>, and
no <pre> tags (typical for patch and CGI modes).
--template=FILE
Overrides the default template for the given output format. If
--no-header is also given, this option has no effect, since the
template is ignored anyway.
-T, --title
Set the title of the produced output file. Only works if the
template supports setting the title.
-w, --linewidth=LINEWIDTH
Wrap lines after LINEWIDTH characters. Default is to not wrap
lines at all.
-b, --linebreakprefix=LINEPREFIX
Use LINEPREFIX at the start of wrapped lines. Default is "» ".
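The effect of --replace-tabs and --linewidth can be pictured with the
following Python sketch. It is not taken from code2html itself. The
function names are made up for illustration, and the wrapping rule
(the prefix is not counted against the width) is an assumption; the
script may count it differently.

```python
from typing import List


def expand_tabs(line: str, tabstop: int = 8) -> str:
    """Replace each TAB with enough spaces to reach the next tabstop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Distance from the current column to the next multiple of tabstop.
            pad = tabstop - (column % tabstop)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


def wrap_line(line: str, width: int, prefix: str = "» ") -> List[str]:
    """Split a line into chunks of at most `width` characters.

    Assumption: the prefix is prepended to continuation lines only and
    is not counted against `width`.
    """
    if width <= 0 or len(line) <= width:
        return [line]
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    return [chunks[0]] + [prefix + chunk for chunk in chunks[1:]]
```

A tab at column 1 with the default tabstop of 8 therefore expands to
seven spaces, while a tab at column 8 expands to a full eight.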
(2) HTML patching
code2html -p [file [alternate-outfile]]
code2html also allows you to have inline source code in an HTML file.
It can then take this HTML file and insert the syntax highlighted code.
If no file is given, code2html reads from STDIN and writes to STDOUT.
If just one file is given it replaces this file with the output. If
two files are provided, the first one is read from and the second one
written to.
To use this feature, just insert a line like this into your HTML file:
<!-- code2html add [options] <file> -->
The syntax highlighted file will be inserted at this position, enclosed
in <pre> tags.
All options that can be given on the command line, such as
--linenumbers, work. --help, --version, etc. work too, though using
them here is not very useful :). Using --output-format to choose a
non-HTML output-format is not advisable. --content-type is ignored.
You may also write the program’s source code directly in the html file
with the following syntax:
<!-- code2html add [options]
<your program source code here>
-->
It is usually a good idea to at least give the --language-mode option
to specify the language.
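To make the two marker forms above concrete, here is a rough Python
sketch that locates them in an HTML document. The regular expression
and the function name are assumptions made for illustration; the
script's own parser may well accept or reject different input.

```python
import re
import shlex

# Hypothetical pattern for "<!-- code2html add ... -->" markers.
# Only spaces and tabs are skipped after "add", so that a marker with no
# options keeps its first line empty.
MARKER = re.compile(r"<!--\s*code2html\s+add[ \t]*(.*?)-->", re.DOTALL)


def find_markers(html):
    """Return a list of (arguments, inline_code) pairs, one per marker.

    `arguments` is the option/file list from the marker's first line.
    `inline_code` is the text on the following lines, or None when the
    marker names a file instead of carrying source code inline.
    """
    results = []
    for match in MARKER.finditer(html):
        first_line, _, rest = match.group(1).partition("\n")
        args = shlex.split(first_line)
        inline_code = rest if rest.strip() else None
        results.append((args, inline_code))
    return results
```

For a marker that names a file, the last element of the argument list
is the file name; for the inline form, the argument list holds only
options.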
(3) CGI
If the script is used as a CGI script (GATEWAY_INTERFACE environment
variable set and no command line arguments given), code2html reads
the arguments either from the query string (method GET) or from STDIN
(method POST).
--content-type is switched on automatically and the output always goes
to STDOUT.
The following parameters/options are accepted:
language-mode - optional
‘c’, ‘cc’, ‘pas’, etc.
if not given, some heuristics are used to find out the
language.
fallback - optional
‘plain’, ‘c’, etc. if language-mode cannot be found, use this
one
input-selector - optional
either ‘file’, ‘cgi-input1’, ‘cgi-input2’, or ‘REDIRECT_URL’
default: file
filename
file to read from if input-selector is ‘file’
cgi-input1
The source code to syntax highlight. For example from a
<textarea> or from an upload. See input-selector.
cgi-input2
The source code to syntax highlight. For example from a
<textarea> or from an upload. See input-selector.
line-numbers - optional
‘yes’, ‘no’ or ‘link’
default: no
replace-tabs - optional
If 0 then tabs are not replaced, else replace each occurrence of
a <TAB> character with the right amount of spaces to get to the
next tabstop.
default: 0
title - optional
Sets the title of the file.
no-encoding - optional
By default code2html tries to encode the output as either
bz2/gz/Z if the client supports this (HTTP_ACCEPT_ENCODING) and
the needed program is available on the server. You may need to
modify @CGI_ENCODING in the script to match your program
locations.
If no-encoding is defined as “true” code2html does not try to
encode the output.
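As an illustration of how the parameters listed above fit together, the
following Python sketch builds a GET request URL for the CGI script.
The helper name and the choice of cgi-input1 as the input selector are
assumptions for the example, and /cgi-bin/code2html is only a
placeholder location.

```python
from urllib.parse import urlencode


def code2html_url(script_url, source, language_mode=None,
                  line_numbers="no", replace_tabs=0):
    """Build a GET URL that passes `source` to the CGI script.

    Hypothetical helper: parameter names follow the list above, the rest
    is an illustration and not part of code2html.
    """
    if line_numbers not in ("yes", "no", "link"):
        raise ValueError("line-numbers must be 'yes', 'no' or 'link'")
    params = {
        "input-selector": "cgi-input1",
        "cgi-input1": source,
        "line-numbers": line_numbers,
        "replace-tabs": str(replace_tabs),
    }
    if language_mode:
        # Optional: without it the script falls back to its heuristics.
        params["language-mode"] = language_mode
    return script_url + "?" + urlencode(params)
```

Longer sources are better sent with method POST, since GET URLs are
limited in length by many servers and clients.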
Why two cgi-inputs, you may ask? This is to allow your users to choose
via a <form> interface whether they want to insert their file into a
<textarea> or use a <browse> button to select their file. See the
example on my homepage.
Note that if $FILES_DISALLOWED_IN_CGI is 0, it is possible for your
users to read all the files the httpd can read (if you don’t run a CGI
wrapper or something like this). By default this value is set to 1, so
file reading via CGI is not allowed. You can allow it by
setting $FILES_DISALLOWED_IN_CGI to 0 at the top of the script.
The input selector REDIRECT_URL needs a special explanation. The file
name is formed from the two environment variables DOCUMENT_ROOT and
REDIRECT_URL.
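A minimal Python sketch of this file name construction follows. It
assumes the two values are simply joined with a single slash; the
actual script may combine or sanitize them differently, and the
function name is invented for the example.

```python
import os


def redirect_filename(environ=None):
    """Join DOCUMENT_ROOT and REDIRECT_URL into a file name.

    Assumption: plain concatenation with exactly one "/" in between.
    No check against ".." components is made here.
    """
    environ = os.environ if environ is None else environ
    root = environ.get("DOCUMENT_ROOT", "")
    url = environ.get("REDIRECT_URL", "")
    return root.rstrip("/") + "/" + url.lstrip("/")
```

With DOCUMENT_ROOT set to /var/www and REDIRECT_URL set to /src/main.c
this yields /var/www/src/main.c.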
If you want Apache to automatically call code2html for all program
source code files, you may do this by adding these two lines to your
srm.conf:
AddHandler text/x-sourcecode .c .cc .cpp .pas .h .p
Action text/x-sourcecode /cgi-bin/code2html?input-selector=REDIRECT_URL&foo=
or something similar to this. In the AddHandler line you can choose
which extensions to pass through code2html.
WARNING: Do not add .pl to this line and name this script
“code2html.pl”. This will result in a loop.
Also make sure that the module providing the Action directive
(mod_actions) is loaded (srm.conf).
Replace /cgi-bin/code2html with the virtual location under which the
script can be accessed. Note the “foo=” part. Apache appends the URL of
the file to display at the end of the action part. We do not need this,
since we use the environment variable REDIRECT_URL; however, we do not
want the URL to be added to the input-selector string. Therefore we
append the “&foo=” part.
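The role of the “&foo=” part can be seen by parsing the resulting query
string. The Python sketch below takes the description above at face
value (the URL is appended directly to the action string); the function
name and the sample URL are made up for the example.

```python
from urllib.parse import parse_qs


def action_params(action_query, appended_url):
    """Parse the query string that results when the requested URL is
    appended directly to the query part of the Action line."""
    return parse_qs(action_query + appended_url)


# With the dummy parameter, the appended URL lands in "foo".
with_foo = action_params("input-selector=REDIRECT_URL&foo=", "/src/main.c")

# Without it, the appended URL corrupts the input-selector value.
without_foo = action_params("input-selector=REDIRECT_URL", "/src/main.c")
```

In the first case input-selector is still exactly REDIRECT_URL; in the
second it becomes REDIRECT_URL/src/main.c, which is not a valid
selector.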
Thanks to Kevin Burton <burton@relativity.yi.org> for the idea. He also
states that
> It is more powerful if you use it in an Apache
> <Directory> tag
>
> <Directory /source>
>
> #with your Action tag here... this way you can
> #still have regular .java files on your server.
>
> </Directory>
>
EXAMPLE
Assuming code2html is in the current directory, you may type
code2html -l perl code2html.pl code2html.html
to convert the script into an HTML file.
FILES
Code2html looks for its configuration in several places.
· the file specified by -L or --language-file if any
· the files specified in the environment variable CODE2HTML_CONFIG,
separated by colons
· user’s $HOME/.code2html.config
· /etc/code2html.config
· built in default languages
Entries in a file that is mentioned earlier in this list override rules
from later files.
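The search order and the override rule can be sketched in Python as
follows. The function names are invented for this example, and the
rule sets are modelled as plain dictionaries; the real files are Perl
code, so this only mirrors the precedence logic.

```python
import os
import posixpath


def config_candidates(language_file=None, environ=None):
    """List configuration files in order of precedence (highest first)."""
    environ = os.environ if environ is None else environ
    paths = []
    if language_file:
        paths.append(language_file)
    # CODE2HTML_CONFIG holds a colon-separated list of files.
    paths.extend(p for p in environ.get("CODE2HTML_CONFIG", "").split(":") if p)
    home = environ.get("HOME")
    if home:
        paths.append(posixpath.join(home, ".code2html.config"))
    paths.append("/etc/code2html.config")
    return paths


def merge_rules(rule_sets):
    """Merge rule dictionaries given in order of precedence.

    Entries from earlier dictionaries override entries from later ones.
    """
    merged = {}
    for rules in reversed(rule_sets):
        merged.update(rules)
    return merged
```

The built-in default languages would form the last, lowest-priority
entry in the list passed to merge_rules.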
The file structure must be valid Perl code.
The global variables %LANGUAGE and %STYLESHEET are already defined, so
you should not redeclare them using “my”.
When you are looking for a model configuration to serve as a basis for
your own configuration file, it is probably best to start out by
checking the built-in definitions at the bottom of code2html.
If your pattern includes back references, as many Perl patterns do,
then you have to use \2 instead of \1, \3 instead of \2, and so on.
I really don’t like this hack but it is a lot faster.
Example:
<<([^\n]*).*?^\2$
In this example the Perl << construct (a here-document) is matched,
i.e. everything from a << up to a line that consists of exactly the
same string that followed the <<. The \2 references the characters
matched in the parentheses.
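One way such a shift in group numbers comes about is when a pattern is
wrapped in an outer capturing group before it is compiled. Whether
code2html does exactly this is an assumption; the Python sketch below
only demonstrates the numbering effect with the example pattern, using
a made-up function name.

```python
import re

# The rule as it appears in a language file: \2 instead of \1.
RULE = r"<<([^\n]*).*?^\2$"

# Assumption for illustration: the rule is wrapped in one outer group,
# which becomes group 1 and pushes the rule's own group to number 2.
WRAPPED = re.compile("(" + RULE + ")", re.DOTALL | re.MULTILINE)


def find_heredoc(text):
    """Return (whole_match, delimiter) for the first here-document,
    or None if there is none."""
    match = WRAPPED.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Applied to the text "print <<EOF", "hello", "EOF", "rest" (one item per
line), the match runs from << through the closing EOF line and the
delimiter is EOF.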
If you ever write language specific rule files yourself, I’d be
grateful if you could send those to me, so I could make them available
(with full credits of course) on my homepage for anyone to grab,
whenever some of those files suit someone else’s needs. Before you do
so you might also have a look at my site to check whether someone has
already written a rule file for your favourite language.
NOTES
The language recognition mechanism relies on specific patterns within
the file name and the content of the processed file, such as file name
extensions and shebangs (#!). This means that if the input is a pipe
or a socket, the file name does not follow traditional naming
conventions, or the content of the processed file is incomplete, the
input language name should be specified using the --language-mode
command line parameter.
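The kind of heuristic described here can be sketched in Python as shown
below. The extension and shebang tables are hypothetical and far
smaller than anything a real language file would define; only the mode
names c, pas, perl and plain are taken from this page.

```python
import os
import re

# Hypothetical lookup tables, for illustration only.
EXTENSIONS = {".c": "c", ".h": "c", ".pas": "pas", ".pl": "perl"}
SHEBANGS = [(re.compile(r"^#!.*\bperl\b"), "perl")]


def guess_language(filename, first_line=""):
    """Guess a language mode from the file name, then from the shebang.

    Returns None when neither gives a hint, e.g. for piped input with
    no shebang line.
    """
    extension = os.path.splitext(filename or "")[1].lower()
    if extension in EXTENSIONS:
        return EXTENSIONS[extension]
    for pattern, mode in SHEBANGS:
        if pattern.search(first_line):
            return mode
    return None
```

A caller can combine this with a fallback, for example
guess_language(name, line) or "plain", which corresponds to passing
--fallback plain on the command line.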
BUGS
Please report bugs to code2html@palfrader.org. This program is still a
beta release, so you should expect to find some.
Also have a look at my website; perhaps a new version is already
available at http://www.palfrader.org/code2html/.
AUTHOR
Peter Palfrader <code2html@palfrader.org>, and a lot of other people.
See the list of contributors in the file itself.
LICENSE
Copyright (c) 1999, 2000 by Peter Palfrader & others.
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
“Software”), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.