NAME
hadoop - software platform to process vast amounts of data
SYNOPSIS
Usage: hadoop [--config confdir] COMMAND
DESCRIPTION
Here's what makes Hadoop especially useful:
Scalable
Hadoop can reliably store and process petabytes.
Economical
It distributes the data and processing across clusters of
commonly available computers. These clusters can number into the
thousands of nodes.
Efficient
By distributing the data, Hadoop can process it in parallel on
the nodes where the data is located. This makes it extremely
rapid.
Reliable
Hadoop automatically maintains multiple copies of data and
automatically redeploys computing tasks based on failures.
MapReduce divides applications into many small blocks of work. For
more information about Hadoop, see http://wiki.apache.org/hadoop/.
OPTIONS
--config configdir
Overrides the "HADOOP_CONF_DIR" environment variable. See
"ENVIRONMENT" section below.
COMMANDS
namenode -format
format the DFS filesystem
secondarynamenode
run the DFS secondary namenode
namenode
run the DFS namenode
datanode
run a DFS datanode
dfsadmin
run a DFS admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer
run a cluster balancing utility
jobtracker
run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker
run a MapReduce task Tracker node
job manipulate MapReduce jobs
version
print the version
jar <jar>
run a jar file
distcp <srcurl> <desturl>
copy file or directories recursively
archive -archiveName NAME <src>* <dest>
create a hadoop archive
daemonlog
get/set the log level for each daemon
CLASSNAME
run the class named CLASSNAME
Most commands print help when invoked without parameters.
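For example, to print the installed version or a basic report on the
DFS (the output will vary with your installation):
$ hadoop version
$ hadoop dfsadmin -report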
FILESYSTEM COMMANDS
The following commands can be used with the fs command like hadoop fs
[filesystem command]
· -ls <path>
· -lsr <path>
· -du <path>
· -dus <path>
· -count[-q] <path>
· -mv <src> <dst>
· -cp <src> <dst>
· -rm [-skipTrash] <path>
· -rmr [-skipTrash] <path>
· -expunge
· -put <localsrc> ... <dst>
· -copyFromLocal <localsrc> ... <dst>
· -moveFromLocal <localsrc> ... <dst>
· -get [-ignoreCrc] [-crc] <src> <localdst>
· -getmerge <src> <localdst> [addnl]
· -cat <src>
· -text <src>
· -copyToLocal [-ignoreCrc] [-crc] <src> <localdst>
· -moveToLocal [-crc] <src> <localdst>
· -mkdir <path>
· -setrep [-R] [-w] <rep> <path/file>
· -touchz <path>
· -test -[ezd] <path>
· -stat [format] <path>
· -tail [-f] <file>
· -chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...
· -chown [-R] [OWNER][:[GROUP]] PATH...
· -chgrp [-R] GROUP PATH...
· -help [cmd]
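A typical session with the fs subcommands might look like the
following. The user directory and file names here are examples only:
$ hadoop fs -mkdir /user/joe/input
$ hadoop fs -put report.txt /user/joe/input
$ hadoop fs -ls /user/joe/input
$ hadoop fs -cat /user/joe/input/report.txt
$ hadoop fs -rmr /user/joe/input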
The following generic options are supported:
-conf <configuration file>
specify an application configuration file
-D <property=value>
use value for given property
-fs <local|namenode:port>
specify a namenode
-jt <local|jobtracker:port>
specify a job tracker
-files <comma separated list of files>
specify comma separated files to be copied to the MapReduce
cluster
-libjars <comma separated list of jars>
specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>
specify comma separated archives to be unarchived on the compute
machines.
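Generic options follow the command name and precede the
command-specific arguments. For example, to list the root of a
particular namenode, or to override a configuration property for a
single command (the host name and property value below are
illustrative):
$ hadoop fs -fs hdfs://namenode.example.com:8020 -ls /
$ hadoop fs -D dfs.replication=2 -put data.log /data/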
FILES
/etc/hadoop/conf
This symbolic link points to the currently active Hadoop
configuration directory.
Note to Hadoop System Admins
The "/etc/hadoop/conf" link is managed by the alternatives(8) command,
so you should not change this symlink directly.
To see which alternatives(8) Hadoop configurations you currently have, run
the following command:
# alternatives --display hadoop
hadoop - status is auto.
link currently points to /etc/hadoop/conf.pseudo
/etc/hadoop/conf.empty - priority 10
/etc/hadoop/conf.pseudo - priority 30
Current 'best' version is /etc/hadoop/conf.pseudo.
This shows that the link points to "/etc/hadoop/conf.pseudo" (for the
Hadoop Pseudo-Distributed configuration).
To add a new custom configuration, run the following commands as root:
# cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my
This will create a new configuration directory, "/etc/hadoop/conf.my",
that serves as a starting point for a new configuration. Edit the
configuration files in "/etc/hadoop/conf.my" until you have the
configuration you want.
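For example, a minimal "hadoop-site.xml" in the new directory might
point the cluster at your own NameNode and JobTracker. The host names
and ports below are placeholders; substitute your own:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>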
To activate your new configuration and see the new configuration list:
# alternatives --install /etc/hadoop/conf hadoop /etc/hadoop/conf.my 90
You can verify your new configuration is active by running the
following:
# alternatives --display hadoop
hadoop - status is auto.
link currently points to /etc/hadoop/conf.my
/etc/hadoop/conf.empty - priority 10
/etc/hadoop/conf.pseudo - priority 30
/etc/hadoop/conf.my - priority 90
Current 'best' version is /etc/hadoop/conf.my.
At this point, it might be a good idea to restart your services with
the new configuration, e.g.,
# /etc/init.d/hadoop-namenode restart
/etc/hadoop/conf/hadoop-site.xml
This is the path to the currently deployed Hadoop site
configuration. See "/etc/hadoop/conf" above.
/usr/bin/hadoop-config.sh
This script searches for a usable "JAVA_HOME" location if
"JAVA_HOME" is not already set. It also sets up the environment
variables that Hadoop components need at startup (see the
"ENVIRONMENT" section).
/etc/init.d/hadoop-namenode
Service script for starting and stopping the Hadoop NameNode
/etc/init.d/hadoop-datanode
Service script for starting and stopping the Hadoop DataNode
/etc/init.d/hadoop-secondarynamenode
Service script for starting and stopping the Hadoop Secondary
NameNode
/etc/init.d/hadoop-jobtracker
Service script for starting and stopping the Hadoop JobTracker
/etc/init.d/hadoop-tasktracker
Service script for starting and stopping the Hadoop TaskTracker
ENVIRONMENT
HADOOP_CONF_DIR
The location of the Hadoop configuration files. Defaults to
"/etc/hadoop/conf". For more details, see the "FILES" section.
HADOOP_LOG_DIR
All Hadoop services log to "/var/log/hadoop" by default. You
can change the location with this environment variable.
HADOOP_ROOT_LOGGER
Setting for log4j. Defaults to "ERROR,console". You can try
"INFO,console" for more verbose output.
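For example, to run a single command against an alternate
configuration directory with more verbose logging (the directory shown
is an example):
$ HADOOP_CONF_DIR=/etc/hadoop/conf.my HADOOP_ROOT_LOGGER=INFO,console hadoop fs -ls /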
EXAMPLES
$ mkdir input
$ cp <txt files> input
$ hadoop jar /usr/lib/hadoop/*example*.jar grep input output 'string'
$ cat output/*
BUGS
The Debian package of Hadoop is still in beta state. Use it at your own
risk!
SEE ALSO
alternatives(8)
AUTHOR
Cloudera, Thomas Koch <thomas.koch@ymc.ch>
COPYRIGHT
2008 The Apache Software Foundation. All rights reserved.