pdsh - issue commands to groups of hosts in parallel

NAME

       pdsh - issue commands to groups of hosts in parallel

SYNOPSIS

       pdsh [options]... command

DESCRIPTION

       pdsh  is  a  variant  of  the rsh(1) command. Unlike rsh(1), which runs
       commands on a single remote host, pdsh can run multiple remote commands
       in  parallel.  pdsh  uses  a "sliding window" (or fanout) of threads to
       conserve  resources  on  the  initiating  host  while   allowing   some
       connections to time out.
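
       The fanout idea can be pictured with ordinary shell tools. The
       following is only an analogy, not how pdsh is implemented: it uses
       local processes (via xargs -P) rather than threads, the host names
       are placeholders, and echo stands in for a remote command.

```shell
# Illustration only: a sliding window of local processes, not pdsh threads.
fanout=4
hosts="foo0 foo1 foo2 foo3 foo4 foo5 foo6 foo7"

run_with_fanout() {
    # $1 = window size; remaining arguments = host names.
    window=$1
    shift
    # xargs -P keeps at most $window workers running and starts the next
    # one as soon as a slot frees up. `echo` stands in for a remote command.
    printf '%s\n' "$@" |
        xargs -P "$window" -n 1 sh -c 'echo "$1: done"' worker
}

# $hosts is left unquoted on purpose so it splits into one argument per host.
results=$(run_with_fanout "$fanout" $hosts | sort)
printf '%s\n' "$results"
```

       Completion order is not deterministic, which is why the sketch
       sorts the collected lines before printing them.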

       When  pdsh  receives  SIGINT  (ctrl-C),  it lists the status of current
       threads. A second SIGINT within  one  second  terminates  the  program.
       Pending  threads may be canceled by issuing ctrl-Z within one second of
       ctrl-C.  Pending threads are those that have not yet been initiated, or
       are still in the process of connecting to the remote host.

       If  a  remote  command  is not specified on the command line, pdsh runs
       interactively,  prompting  for  commands  and   executing   them   when
       terminated  with  a  carriage return. In interactive mode, target nodes
       that time out on the first command are  not  contacted  for  subsequent
       commands,  and  commands  prefixed  with  an  exclamation point will be
       executed on the local system.

       The core functionality of  pdsh  may  be  supplemented  by  dynamically
       loadable  modules.  The  modules  may provide a new connection protocol
       (replacing the standard rcmd(3) protocol  used  by  rsh(1)),  filtering
       options  (e.g.  removing  hosts  that are "down" from the target list),
       and/or host selection options (e.g., -a selects all hosts from a
       configuration file). By default, pdsh must have at least one "rcmd"
       module loaded. See the RCMD MODULES section for more information.

RCMD MODULES

       The method by which pdsh runs commands on remote hosts may be  selected
       at runtime using the -R option (See OPTIONS below).  This functionality
       is ultimately implemented via dynamically loadable modules, and so  the
       list  of  available  options  may  be  different  from  installation to
       installation. A list of currently available  rcmd  modules  is  printed
       when  using  any  of the -h, -V, or -L options. The default rcmd module
       will also be displayed with the -h and -V options.

       A list of rcmd modules currently distributed with pdsh follows.

       rsh     Uses an internal, thread-safe implementation of BSD rcmd(3)  to
               run commands using the standard rsh(1) protocol.

       exec    Executes  an  arbitrary command for each target host. The first
               of the pdsh remote arguments is the local command  to  execute,
               followed by any further arguments. Some simple parameters are
               substituted on the command line, including %h for the target
               hostname, %u for the remote username, and %n for the remote
               rank [0-n] (to get a literal % use %%). For example, the
               following   would   duplicate  using  the  ssh  module  to  run
               hostname(1) across the hosts foo[0-10]:

                  pdsh -R exec -w foo[0-10] ssh -x -l %u %h hostname

               and this command line would run grep(1) in parallel across  the
               files console.foo[0-10]:

                  pdsh -R exec -w foo[0-10] grep BUG console.%h

       ssh     Uses a variant of popen(3) to run multiple copies of the ssh(1)
               command.

       mrsh    This module uses the mrsh(1) protocol to execute jobs on remote
               hosts.    The   mrsh   protocol   uses   a   credential   based
               authentication, forgoing the need to allocate  reserved  ports.
               In  other  aspects, it acts just like rsh. Remote nodes must be
               running mrshd(8) in order for the mrsh module to work.

       qsh     Allows pdsh to execute MPI jobs over QsNet.  Qshell  propagates
               the  current  working  directory,  pdsh  environment,  and Elan
               capabilities to the remote process. The following environment
               variables are also appended to the environment: RMS_RANK,
               RMS_NODEID, RMS_PROCID, RMS_NNODES, and RMS_NPROCS. Since pdsh
               needs to run setuid root for qshell support, qshell does not
               directly support propagation of LD_LIBRARY_PATH and LD_PREOPEN.
               Instead, the QSHELL_REMOTE_LD_LIBRARY_PATH and
               QSHELL_REMOTE_LD_PREOPEN environment variables may be used
               and will be remapped to LD_LIBRARY_PATH and LD_PREOPEN by the
               qshell daemon if set.

       mqsh    Similar to qshell, but uses the mrsh protocol  instead  of  the
               rsh protocol.

       krb4    The krb4 module allows users to execute remote commands after
               authenticating with Kerberos. Of course, the remote rshd
               daemons must be kerberized.

       xcpu    The  xcpu  module  uses  the  xcpu  service  to  execute remote
               commands.
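
       The parameter substitution described for the exec module can be
       pictured with a small shell function. This is a sketch of the
       documented %h, %u, %n and %% rules only, not pdsh's own code; the
       function name expand_template and the sample host, user and rank
       are made up for illustration.

```shell
expand_template() (
    # $1 = template, $2 = target host, $3 = remote user, $4 = rank
    tmpl=$1
    out=
    while [ -n "$tmpl" ]; do
        case $tmpl in
            %h*) out="${out}$2"; tmpl=${tmpl#"%h"} ;;
            %u*) out="${out}$3"; tmpl=${tmpl#"%u"} ;;
            %n*) out="${out}$4"; tmpl=${tmpl#"%n"} ;;
            %%*) out="${out}%";  tmpl=${tmpl#"%%"} ;;
            *)   # copy one ordinary character through unchanged
                 rest=${tmpl#?}
                 out="${out}${tmpl%"$rest"}"
                 tmpl=$rest ;;
        esac
    done
    printf '%s\n' "$out"
)

expand_template 'ssh -x -l %u %h hostname' foo3 alice 3
expand_template 'grep BUG console.%h' foo0 alice 0
```

       Scanning left to right in a single pass is what lets %% produce a
       literal percent sign without being mistaken for the start of another
       parameter.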

OPTIONS

       The list of available options is determined at runtime by supplementing
       the  list  of standard pdsh options with any options provided by loaded
       rcmd and misc modules.  In some cases, options provided by modules  may
       conflict  with each other. In these cases, the modules are incompatible
       and the first module loaded wins.

Standard target nodelist options

       -w [rcmd_type:][user@]host,host,...
              Target the specified list of hosts. Do not use  with  any  other
              node  selection  options (e.g. -a, -g if they are available). No
              spaces  are  allowed  in  the  comma-separated  list.   A   list
              consisting  of a single ‘-’ character causes the target hosts to
              be read from stdin, one per line.  The  host  list  may  contain
              hostlist expressions of the form "host[1-5,7]". For more
              information  about  the  hostlist  format,  see   the   HOSTLIST
              EXPRESSIONS  section below. A list of hosts may also be preceded
              by "user@" to specify a remote username other than the  default,
              or "rcmd_type:" to specify an alternate rcmd connection type for
              these hosts. When used together, the rcmd type must be specified
              first,  e.g. "ssh:user1@host0" would use ssh to connect to host0
              as user "user1."

       -x host,host,...
              Exclude the specified hosts. May  be  specified  in  conjunction
              with  other  target  node  list  options such as -a and -g (when
              available). Hostlists may also be specified  to  the  -x  option
              (see the HOSTLIST EXPRESSIONS section below).
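
       The "[rcmd_type:][user@]host,..." form accepted by -w can be taken
       apart with plain parameter expansion. The function below is a
       hypothetical helper written for this page, not part of pdsh, and it
       assumes at most one "rcmd_type:" and one "user@" prefix in front of
       the whole list.

```shell
parse_target() (
    # $1 = a -w argument such as "ssh:user1@host0"
    spec=$1
    rcmd=
    user=
    case $spec in
        *:*) rcmd=${spec%%:*}; spec=${spec#*:} ;;   # rcmd type comes first
    esac
    case $spec in
        *@*) user=${spec%%@*}; spec=${spec#*@} ;;   # then the remote user
    esac
    printf 'rcmd=%s user=%s hosts=%s\n' \
        "${rcmd:-default}" "${user:-default}" "$spec"
)

parse_target 'ssh:user1@host0'
parse_target 'host[1-5,7]'
```

       The rcmd type is stripped before the user name because, as stated
       above, the rcmd type must be specified first when both are used.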

Standard pdsh options

       -S     Return the largest of the remote command return values.

       -h     Output  usage  menu  and  quit. A list of available rcmd modules
              will also be printed at the end of the usage message.

       -s     Only on AIX, separate remote command stderr and stdout into  two
              sockets.

       -q     List  option  values  and  the  target nodelist and exit without
              action.

       -b     Disable the ctrl-C status feature so that a single ctrl-C kills
              the parallel job (batch mode).

       -l user
              This  option may be used to run remote commands as another user,
              subject to authorization. For BSD rcmd, this means the  invoking
              user and system must be listed in the user's .rhosts file (even
              for root).

       -t seconds
              Set the connect timeout. Default is 10 seconds.

       -u seconds
              Set a limit on the amount of time a remote command is allowed to
              execute.   Default is no limit. See note in LIMITATIONS if using
              -u with ssh.

       -f number
              Set the  maximum  number  of  simultaneous  remote  commands  to
              number.  The default is 32.

       -R name
              Set  rcmd  module  to  name. This option may also be set via the
              PDSH_RCMD_TYPE environment variable. A list  of  available  rcmd
              modules  may  be  obtained  via  the -h, -V, or -L options.  The
              default will be listed with -h or -V.

       -L     List info on all loaded pdsh modules and quit.

       -N     Disable hostname: prefix on lines of output.

       -d     Include more complete thread status when SIGINT is received, and
              display connect and command time statistics on stderr when done.

       -V     Output pdsh version information, along with  list  of  currently
              loaded modules, and exit.
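
       The reduction performed by -S amounts to taking a maximum over the
       remote return values. As a sketch (the helper name max_status and
       the sample values are invented here; pdsh does this internally):

```shell
max_status() (
    # Print the largest of the return values given as arguments.
    max=0
    for rc in "$@"; do
        if [ "$rc" -gt "$max" ]; then
            max=$rc
        fi
    done
    echo "$max"
)

# Three hosts succeeded and one remote command returned 2.
max_status 0 0 2 0
```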

qsh/mqsh module options

       -n tasks_per_node
              Set the number of tasks spawned per node. Default is 1.

       -m block | cyclic
              Set  block  versus  cyclic  allocation  of  processes  to nodes.
              Default is block.

       -r railmask
              Set the rail bitmask for  a  job  on  a  multirail  system.  The
              default  railmask  is  1, which corresponds to rail 0 only. Each
              bit set in the argument to -r  corresponds  to  a  rail  on  the
              system, so a value of 2 would correspond to rail 1 only, and 3
              would select both rail 0 and rail 1.
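
       The railmask arithmetic can be checked with shell arithmetic. The
       helper name railmask is made up for this sketch; it simply sets one
       bit per rail number given.

```shell
railmask() (
    # Build the -r bitmask from a list of rail numbers.
    mask=0
    for rail in "$@"; do
        mask=$(( mask | (1 << rail) ))
    done
    echo "$mask"
)

railmask 0      # rail 0 only
railmask 1      # rail 1 only
railmask 0 1    # both rails
```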

machines module options

       -a     Target all nodes from machines file.

genders module options

       In addition to the genders options presented below, the genders
       attribute pdsh_rcmd_type may also be used in the genders database to
       specify an rcmd connect type other than the pdsh default for hosts
       with this attribute. For example, the following line in the genders
       file

         host0 pdsh_rcmd_type=ssh

       would cause pdsh to use ssh to connect to host0, even if rsh were the
       default. This can be overridden on the command line with the
       "rcmd_type:host0" syntax.

       -A     Target all nodes in genders database. The -A option will  target
              every  host  listed in genders -- if you want to omit some hosts
              by default, see the -a option below.

       -a     Target all nodes in  genders  database  except  those  with  the
              "pdsh_all_skip"  attribute.  This is shorthand for running "pdsh
              -A -X pdsh_all_skip ..."

       -g attr[=val][,attr[=val],...]
              Target nodes that match any of the specified genders  attributes
              (with  optional  values). Conflicts with -a and -w options. This
              option targets the alternate hostnames in the  genders  database
              by  default. The -i option provided by the genders module may be
              used to translate these to the canonical genders  hostnames.  If
              the   installed  version  of  genders  supports  it,  attributes
              supplied to -g may  also  take  the  form  of  genders  queries.
              Genders  queries  will query the genders database for the union,
              intersection, difference, or complement  of  genders  attributes
              and values. The set operation union is represented by two pipe
              symbols ('||'), intersection by two ampersand symbols ('&&'),
              difference by two minus symbols ('--'), and complement by a
              tilde ('~'). Parentheses may be used to change the order of
              operations.  See the nodeattr(1) manpage for examples of genders
              queries.

       -X attr[=val][,attr[=val],...]
              Exclude nodes that match any of the specified genders attributes
              (optionally with values). This option may be used in
              combination with any other of the node selection options (e.g.
              -w, -g, -a). Arguments to -X may also take the form of genders
              queries. Please see the documentation for the genders -g option
              for more information about genders queries.

       -i     Request translation between canonical and alternate hostnames.

       -F filename
              Read  genders  information  from  filename instead of the system
              default genders file.

nodeupdown module options

       -v     Eliminate  target  nodes   that   are   considered   "down"   by
              libnodeupdown.

slurm module options

       The slurm module allows pdsh to target nodes based on currently running
       SLURM jobs. The slurm module is typically called after all  other  node
       selection  options  have  been  processed,  and  if  no nodes have been
       selected, the module will attempt to read  a  running  jobid  from  the
       SLURM_JOBID  environment  variable  (which  is set when running under a
       SLURM allocation). If SLURM_JOBID references an invalid job, it will be
       silently ignored.

       -j jobid[,jobid,...]
              Target  list  of  nodes  allocated  to the SLURM job jobid. This
              option may be used multiple times to target multiple SLURM jobs.
              The  special  argument  "all"  can  be  used to target all nodes
              running SLURM jobs, e.g.  -j all.

rms module options

       The rms module allows pdsh to target nodes based on  an  RMS  resource.
       The  rms  module  is  typically  called  after all other node selection
       options, and if no nodes have been selected, the  module  will  examine
       the  RMS_RESOURCEID  environment variable and attempt to set the target
       list of hosts to the nodes in the RMS resource. If an invalid  resource
       is denoted, the variable is silently ignored.

SDR module options

       The  SDR module supports targeting hosts via the System Data Repository
       on IBM SPs.

       -a     Target all nodes in the SDR. The  list  is  generated  from  the
              "reliable hostname" in the SDR by default.

       -i     Translate  hostnames  between  reliable  and initial in the SDR,
              when applicable. If a target hostname matches either the
              initial or reliable hostname in the SDR, the alternate name will
              be substituted. Thus a list composed of initial hostnames will
              instead  be  replaced  with  a  list of reliable hostnames.  For
              example, when used with -a above, all initial hostnames  in  the
              SDR are targeted.

       -v     Do not target nodes that are marked as not responding in the SDR
              on the targeted interface. (If a hostname does not appear in the
              SDR, then that name will remain in the target hostlist.)

       -G     In combination with -a, include all partitions.

nodeattr module options

       The  nodeattr  module  supports  access to the genders database via the
       nodeattr(1) command. See the genders section above for a list of
       supported options with this module. The option usage with the nodeattr
       module is the same as genders, above, with the exception  that  the  -i
       option may only be used with -a or -g. NOTE: This module will only work
       with very  old  releases  of  genders  where  the  nodeattr(1)  command
       supports  the  -r  option, and before the libgenders API was available.
       Users running newer versions of genders will need to  use  the  genders
       module instead.

dshgroup module options

       The  dshgroup  module  allows pdsh to use dsh (or Dancer’s shell) style
       group files from /etc/dsh/group/ or ~/.dsh/group/.

       -g groupname,...
              Target nodes in dsh  group  file  "groupname"  found  in  either
              ~/.dsh/group/groupname or /etc/dsh/group/groupname.

       -X groupname,...
              Exclude nodes in dsh group file "groupname."
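
       The group file lookup can be sketched as follows. Two things here
       are assumptions rather than statements from this page: that the
       per-user directory is consulted before /etc/dsh/group, and that a
       group file lists one host per line. The helper find_group_file, the
       group name "web" and its hosts are invented, and a temporary
       directory stands in for the home directory.

```shell
demo_home=$(mktemp -d)
mkdir -p "$demo_home/.dsh/group"
printf '%s\n' web1 web2 web3 > "$demo_home/.dsh/group/web"

find_group_file() {
    # $1 = group name, $2 = home directory to search first (assumed order)
    for dir in "$2/.dsh/group" /etc/dsh/group; do
        if [ -r "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            return 0
        fi
    done
    return 1
}

group_file=$(find_group_file web "$demo_home")
# Join the one-host-per-line file into a comma-separated target list.
group_hosts=$(paste -s -d , "$group_file")
echo "$group_hosts"

rm -rf "$demo_home"
```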

netgroup module options

       The  netgroup  module  allows  pdsh to use standard netgroup entries to
       build lists of target hosts. (/etc/netgroup or NIS)

       -g groupname,...
              Target nodes in netgroup "groupname."

       -X groupname,...
              Exclude nodes in netgroup "groupname."

ENVIRONMENT VARIABLES

       PDSH_RCMD_TYPE
              Equivalent to the -R  option,  the  value  of  this  environment
              variable will be used to set the default rcmd module for pdsh to
              use (e.g. ssh, rsh).

       PDSH_SSH_ARGS
              Override the standard arguments that pdsh passes to  the  ssh(1)
              command ("-2 -a -x").

       PDSH_SSH_ARGS_APPEND
              Append additional options to the ssh(1) command invoked by pdsh.
              For example, PDSH_SSH_ARGS_APPEND="-q" would run  ssh  in  quiet
              mode, or "-v" would increase the verbosity of ssh.

       WCOLL  If no other node selection option is used, the WCOLL environment
              variable may be set to a filename from which a  list  of  target
              hosts will be read. The file should contain a list of hosts, one
              per line, though each line may contain a hostlist expression
              (see the HOSTLIST EXPRESSIONS section below).

       DSHPATH
              If  set,  the  path  in DSHPATH will be used as the PATH for the
              remote processes.

       FANOUT Set the pdsh fanout (See description of -f above).
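
       A session that relies on these variables instead of command line
       options might be set up as below. The host names and the values
       chosen are placeholders; the pdsh call is guarded so the sketch
       does nothing harmful where pdsh is not installed, and it uses -q so
       that pdsh only lists its settings and target nodes.

```shell
hostfile=$(mktemp)
printf '%s\n' 'foo[0-3]' 'bar1' > "$hostfile"   # one entry per line

export WCOLL="$hostfile"            # used when no -w, -a, -g, ... is given
export PDSH_RCMD_TYPE=ssh           # same effect as -R ssh
export PDSH_SSH_ARGS_APPEND="-q"    # extra option handed to ssh(1)
export FANOUT=16                    # same effect as -f 16

wcoll_hosts=$(paste -s -d ' ' "$WCOLL")
echo "WCOLL entries: $wcoll_hosts"

if command -v pdsh >/dev/null 2>&1; then
    pdsh -q || true                 # list option values and targets, then exit
fi

rm -f "$hostfile"
```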

HOSTLIST EXPRESSIONS

       As noted in sections above, pdsh accepts lists of hosts in the general
       form prefix[n-m,l-k,...], where n < m and l < k, etc., as an
       alternative to explicit lists of hosts. This form should not be
       confused with regular expression character classes (also denoted by
       "[]"). For example, foo[19] does not represent an expression matching
       foo1 or foo9, but rather represents the degenerate hostlist: foo19.

       The hostlist syntax is meant only as a convenience on clusters with a
       "prefixNNN" naming convention, and specification of ranges should not
       be considered necessary; thus foo1,foo9 could be specified as such, or
       by the hostlist foo[1,9].
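
       To make the notation concrete, the following shell function expands
       a hostlist expression into explicit names. It is an illustrative
       re-implementation, not pdsh's parser: the names expand_hostlist and
       strip_zeros are invented, only a single bracketed group is handled,
       and zero padding is taken from the width of each range's lower
       bound.

```shell
strip_zeros() (
    # Drop leading zeros so shell arithmetic does not read "08" as octal.
    n=${1#"${1%%[!0]*}"}
    printf '%s\n' "${n:-0}"
)

expand_hostlist() (
    expr=$1
    case $expr in
        *\[*\]*) ;;                            # has a bracketed range list
        *) printf '%s\n' "$expr"; exit 0 ;;    # plain host name
    esac
    prefix=${expr%%\[*}      # text before the first "["
    rest=${expr#*\[}
    ranges=${rest%%\]*}      # e.g. "7,9-10"
    suffix=${rest#*\]}       # text after the first "]"

    IFS=,
    set -- $ranges           # split the range list on commas
    for r in "$@"; do
        lo=${r%%-*}
        hi=${r##*-}
        fmt="%s%0${#lo}d%s\n"    # keep the zero padding of the lower bound
        i=$(strip_zeros "$lo")
        j=$(strip_zeros "$hi")
        while [ "$i" -le "$j" ]; do
            printf "$fmt" "$prefix" "$i" "$suffix"
            i=$((i + 1))
        done
    done
)

expand_hostlist 'foo[1,9]'   # prints foo1 and foo9, one per line
expand_hostlist 'foo[19]'    # prints foo19
```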

       Some examples of usage follow:

       Run command on foo01,foo02,...,foo05
            pdsh -w foo[01-05] command

       Run command on foo7,foo9,foo10
            pdsh -w foo[7,9-10] command

       Run command on foo0,foo4,foo5
            pdsh -w foo[0-5] -x foo[1-3] command

       A suffix on the hostname is also supported:

       Run command on foo0-eth0,foo1-eth0,foo2-eth0,foo3-eth0
            pdsh -w foo[0-3]-eth0 command
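
       The effect of -x in the third example above is a set difference.
       Working on already expanded, comma-separated lists, it can be
       sketched like this (exclude_hosts is a made-up helper, not a pdsh
       command):

```shell
exclude_hosts() (
    # $1 = comma-separated target hosts, $2 = comma-separated exclusions
    result=
    IFS=,
    for host in $1; do
        case ",$2," in
            *",$host,"*) ;;                            # excluded: skip it
            *) result="${result:+$result,}$host" ;;    # keep, comma-joined
        esac
    done
    printf '%s\n' "$result"
)

# Equivalent of: -w foo[0-5] -x foo[1-3]
exclude_hosts foo0,foo1,foo2,foo3,foo4,foo5 foo1,foo2,foo3
```

       Wrapping the exclusion list in commas keeps foo1 from matching
       longer names such as foo10.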

       As a reminder to the reader, some shells will interpret brackets ('['
       and ']') for pattern matching. Depending on your shell, it may be
       necessary to enclose ranged lists within quotes. For example, in tcsh,
       the first example above should be executed as:

            pdsh -w "foo[01-05]" command
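
       The difference quoting makes can be seen without pdsh. In the
       sketch below, run in a Bourne-style shell, a scratch directory
       contains a file named foo3, so the unquoted pattern foo[1-5] is
       replaced by that file name before any command sees it. The
       simpler pattern foo[1-5] is used here instead of foo[01-05] purely
       to keep the glob unambiguous.

```shell
workdir=$(mktemp -d)
: > "$workdir/foo3"      # a file whose name matches the pattern foo[1-5]

# Unquoted: the shell expands the brackets as a file name pattern.
unquoted=$(cd "$workdir" && echo foo[1-5])
# Quoted: the brackets reach the command untouched.
quoted=$(cd "$workdir" && echo "foo[1-5]")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

rm -rf "$workdir"
```

       Bourne-style shells pass an unmatched pattern through unchanged,
       which is why the problem can go unnoticed until a matching file
       happens to exist in the current directory.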

ORIGIN

       Originally a rewrite of IBM dsh(1) by Jim Garlick <garlick@llnl.gov> on
       LLNL’s ASCI Blue-Pacific IBM  SP  system.  It  is  now  used  on  Linux
       clusters at LLNL.

LIMITATIONS

       When using ssh for remote execution, expect the stderr of ssh to be
       folded in with that of the remote command. When invoked by pdsh, ssh
       cannot prompt interactively (for example, for passwords), so RSA/DSA
       keys must be configured properly. For ssh implementations that
       support a connect timeout option, pdsh attempts to use that option to
       enforce the timeout (e.g. -oConnectTimeout=T for OpenSSH); otherwise
       connect timeouts are not supported when using ssh. Finally, there is
       no reliable way for pdsh to ensure that remote commands are actually
       terminated when using a command timeout. Thus, if -u is used with
       ssh, commands may be left running on remote hosts even after the
       timeout has killed the local ssh processes.

       Output  from multiple processes per node may be interspersed when using
       qshell or mqshell rcmd modules.

       The number of nodes that pdsh can simultaneously execute remote jobs on
       is  limited  by  the  maximum  number  of  threads  that can be created
       concurrently, as well as the availability of reserved ports in the  rsh
       and qshell rcmd modules. On systems that implement POSIX threads, the
       limit is typically defined by the constant PTHREAD_THREADS_MAX.
