NAME

       srun - Run parallel jobs

SYNOPSIS

       srun            [OPTIONS...]  executable [args...]

DESCRIPTION

       Run a parallel job on a cluster managed by SLURM.  If necessary,  srun
       will first create a resource allocation in which to  run  the  parallel
       job.

OPTIONS

       -A, --account=<account>
              Charge  resources  used  by  this job to specified account.  The
              account is an arbitrary string. The account name may be  changed
              after job submission using the scontrol command.

       --acctg-freq=<seconds>
              Define  the  job accounting sampling interval.  This can be used
              to override  the  JobAcctGatherFrequency  parameter  in  SLURM’s
               configuration  file,  slurm.conf.  A value of zero disables the
               periodic job sampling and  provides  accounting  information
              only  on  job  termination (reducing SLURM interference with the
              job).

        -B, --extra-node-info=<sockets[:cores[:threads]]>
              Request a specific allocation of resources with  details  as  to
              the number and type of computational resources within a cluster:
              number of sockets (or physical processors) per node,  cores  per
              socket,  and  threads  per  core.  The total amount of resources
              being requested is the product of all of the terms.  Each  value
              specified  is considered a minimum.  An asterisk (*) can be used
              as a placeholder indicating that all available resources of that
              type  are  to be utilized.  As with nodes, the individual levels
              can also be specified in separate options if desired:
                  --sockets-per-node=<sockets>
                  --cores-per-socket=<cores>
                  --threads-per-core=<threads>
              If  task/affinity  plugin  is  enabled,   then   specifying   an
              allocation  in this manner also sets a default --cpu_bind option
              of threads if the -B option specifies a thread count,  otherwise
              an  option  of  cores if a core count is specified, otherwise an
              option   of   sockets.    If   SelectType   is   configured   to
              select/cons_res,   it   must   have   a  parameter  of  CR_Core,
              CR_Core_Memory, CR_Socket, or CR_Socket_Memory for  this  option
              to be honored.  This option is not supported on BlueGene systems
              (select/bluegene plugin is configured).
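
               For example, the following two invocations  (task  count  and
               program name are illustrative) make equivalent  requests  for
               nodes with at least 2 sockets and 4 cores per socket:

```shell
srun -n16 -B 2:4 a.out
srun -n16 --sockets-per-node=2 --cores-per-socket=4 a.out
```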

       --begin=<time>
              Defer initiation of this  job  until  the  specified  time.   It
              accepts  times  of  the form HH:MM:SS to run a job at a specific
              time of day (seconds are optional).  (If that  time  is  already
              past,  the next day is assumed.)  You may also specify midnight,
              noon, or teatime (4pm) and you can have a  time-of-day  suffixed
              with  AM  or  PM for running in the morning or the evening.  You
               can also say what day the job will be run, by specifying a  date
               of the form MMDDYY, MM/DD/YY, or YYYY-MM-DD.  Combine date  and
               time using the format YYYY-MM-DD[THH:MM[:SS]]. You can also
              give times like now + count time-units, where the time-units can
              be seconds (default), minutes, hours, days, or weeks and you can
              tell  SLURM  to  run the job today with the keyword today and to
              run the job tomorrow with the keyword tomorrow.  The  value  may
              be changed after job submission using the scontrol command.  For
              example:
                 --begin=16:00
                 --begin=now+1hour
                 --begin=now+60           (seconds by default)
                 --begin=2010-01-20T12:34:00

              Notes on date/time specifications:
               -  Although  the  ’seconds’  field   of   the   HH:MM:SS   time
              specification is allowed by the code, note that the poll time of
              the SLURM scheduler is not precise enough to guarantee  dispatch
              of  the  job  on  the exact second.  The job will be eligible to
              start on the next poll following the specified time.  The  exact
              poll  interval  depends on the SLURM scheduler (e.g., 60 seconds
              with the default sched/builtin).
               -  If  no  time  (HH:MM:SS)  is  specified,  the   default   is
              (00:00:00).
               -  If a date is specified without a year (e.g., MM/DD) then the
              current year is assumed, unless the  combination  of  MM/DD  and
              HH:MM:SS  has  already  passed  for that year, in which case the
              next year is used.

       --checkpoint=<time>
              Specifies the interval between creating checkpoints of  the  job
              step.   By  default,  the  job  step  will  have  no checkpoints
              created.    Acceptable   time   formats    include    "minutes",
              "minutes:seconds",     "hours:minutes:seconds",    "days-hours",
              "days-hours:minutes" and "days-hours:minutes:seconds".

       --checkpoint-dir=<directory>
              Specifies the  directory  into  which  the  job  or  job  step’s
              checkpoint  should  be  written (used by the checkpoint/blcr and
              checkpoint/xlch plugins only).  The default value is the current
              working  directory.   Checkpoint  files  will  be  of  the  form
              "<job_id>.ckpt" for jobs and "<job_id>.<step_id>.ckpt"  for  job
              steps.

       --comment=<string>
              An arbitrary comment.

       -C, --constraint=<list>
              Specify  a  list  of  constraints.  The constraints are features
              that have been assigned to the nodes by the slurm administrator.
              The  list of constraints may include multiple features separated
              by ampersand (AND) and/or  vertical  bar  (OR)  operators.   For
              example:             --constraint="opteron&video"             or
              --constraint="fast|faster".  In the first  example,  only  nodes
              having  both  the feature "opteron" AND the feature "video" will
              be used.  There is no mechanism to specify  that  you  want  one
              node  with  feature  "opteron"  and  another  node  with feature
              "video" in case no node has both features.  If only one of a set
              of possible options should be used for all allocated nodes, then
              use the OR  operator  and  enclose  the  options  within  square
              brackets.  For example: "--constraint=[rack1|rack2|rack3|rack4]"
              might be used to specify that all nodes must be allocated  on  a
              single  rack  of the cluster, but any of those four racks can be
              used.  A request can also specify the  number  of  nodes  needed
              with  some  feature by appending an asterisk and count after the
              feature     name.      For     example     "srun      --nodes=16
              --constraint=graphics*4 ..."  indicates that the job requires 16
               nodes and that at least four of those nodes must have the feature
              "graphics."   Constraints  with node counts may only be combined
              with AND operators.  If no nodes have  the  requested  features,
              then the job will be rejected by the slurm job manager.

       --contiguous
              If  set,  then  the  allocated nodes must form a contiguous set.
              Not honored with the topology/tree or topology/3d_torus plugins,
              both  of  which can modify the node ordering.  Not honored for a
              job step’s allocation.

       --core=<type>
              Adjust corefile format for parallel job. If possible, srun  will
              set  up  the environment for the job such that a corefile format
              other than full core dumps  is  enabled.  If  run  with  type  =
              "list",  srun  will  print  a  list of supported corefile format
              types to stdout and exit.

       --cpu_bind=[{quiet,verbose},]type
              Bind tasks to CPUs. Used only when the task/affinity  plugin  is
              enabled.    The   configuration  parameter  TaskPluginParam  may
              override these options.   For  example,  if  TaskPluginParam  is
              configured  to  bind to cores, your job will not be able to bind
              tasks to sockets.  NOTE: To have  SLURM  always  report  on  the
              selected  CPU  binding for all commands executed in a shell, you
              can  enable  verbose  mode   by   setting   the   SLURM_CPU_BIND
              environment variable value to "verbose".

              The  following  informational environment variables are set when
              --cpu_bind is in use:
                      SLURM_CPU_BIND_VERBOSE
                      SLURM_CPU_BIND_TYPE
                      SLURM_CPU_BIND_LIST

              See  the  ENVIRONMENT  VARIABLE  section  for  a  more  detailed
              description of the individual SLURM_CPU_BIND* variables.

              When  using --cpus-per-task to run multithreaded tasks, be aware
              that CPU binding is inherited from the parent  of  the  process.
              This  means that the multithreaded task should either specify or
              clear the CPU binding itself to avoid having all threads of  the
              multithreaded   task  use  the  same  mask/CPU  as  the  parent.
              Alternatively, fat masks (masks  which  specify  more  than  one
              allowed  CPU)  could  be  used for the tasks in order to provide
              multiple CPUs for the multithreaded tasks.

              By default, a job step has access to every CPU allocated to  the
              job.   To  ensure  that  distinct CPUs are allocated to each job
              step, use the --exclusive option.

               If the job step allocation includes a number of sockets,  cores,
               or threads equal to the number of tasks to be
              started  then  the  tasks  will  by  default  be  bound  to  the
              appropriate  resources.   Disable  this  mode  of  operation  by
               explicitly setting "--cpu_bind=none".

              Note that a job step can be allocated different numbers of  CPUs
              on each node or be allocated CPUs not starting at location zero.
              Therefore one of the options which  automatically  generate  the
              task  binding  is  recommended.   Explicitly  specified masks or
              bindings are only honored when the job step has  been  allocated
              every available CPU on the node.

              Binding  a task to a NUMA locality domain means to bind the task
              to the set of CPUs that belong to the NUMA  locality  domain  or
              "NUMA  node".   If  NUMA  locality  domain  options  are used on
              systems with no NUMA support, then each socket is  considered  a
              locality domain.

              Supported options include:

              q[uiet]
                     Quietly bind before task runs (default)

              v[erbose]
                     Verbosely report binding before task runs

              no[ne] Do not bind tasks to CPUs (default)

              rank   Automatically  bind  by task rank.  Task zero is bound to
                     socket (or core or  thread)  zero,  etc.   Not  supported
                     unless the entire node is allocated to the job.

              map_cpu:<list>
                     Bind  by  mapping  CPU  IDs  to  tasks as specified where
                     <list> is  <cpuid1>,<cpuid2>,...<cpuidN>.   CPU  IDs  are
                     interpreted  as  decimal  values unless they are preceded
                     with  ’0x’  in  which  case  they  are   interpreted   as
                     hexadecimal values.  Not supported unless the entire node
                     is allocated to the job.

              mask_cpu:<list>
                     Bind by setting CPU masks on  tasks  as  specified  where
                     <list>  is  <mask1>,<mask2>,...<maskN>.   CPU  masks  are
                     always interpreted  as  hexadecimal  values  but  can  be
                     preceded with an optional ’0x’.  Not supported unless the
                     entire node is allocated to the job.

              rank_ldom
                     Bind to a NUMA locality domain by rank

              map_ldom:<list>
                     Bind by mapping NUMA locality  domain  IDs  to  tasks  as
                     specified  where  <list>  is  <ldom1>,<ldom2>,...<ldomN>.
                     The locality domain IDs are interpreted as decimal values
                     unless  they  are  preceded  with ’0x’ in which case they
                      are interpreted  as  hexadecimal  values.   Not  supported
                     unless the entire node is allocated to the job.

              mask_ldom:<list>
                     Bind  by  setting  NUMA locality domain masks on tasks as
                     specified  where  <list>  is  <mask1>,<mask2>,...<maskN>.
                     NUMA  locality  domain  masks  are  always interpreted as
                     hexadecimal values but can be preceded with  an  optional
                     ’0x’.   Not supported unless the entire node is allocated
                     to the job.

              sockets
                     Automatically generate masks binding  tasks  to  sockets.
                     If  the  number  of  tasks  differs  from  the  number of
                     allocated sockets this can result in sub-optimal binding.

              cores  Automatically  generate masks binding tasks to cores.  If
                     the number of tasks differs from the number of  allocated
                     cores this can result in sub-optimal binding.

              threads
                     Automatically  generate  masks  binding tasks to threads.
                     If the  number  of  tasks  differs  from  the  number  of
                     allocated threads this can result in sub-optimal binding.

              ldoms  Automatically  generate  masks  binding  tasks  to   NUMA
                     locality  domains.   If  the number of tasks differs from
                     the number of allocated locality domains this can  result
                     in sub-optimal binding.

              help   Show help message for cpu_bind
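
               For example (task counts, CPU IDs, and the program  name  are
               illustrative):

```shell
srun -n8 --cpu_bind=verbose,cores a.out     # one task per core, report each binding
srun -n4 --cpu_bind=map_cpu:0,2,4,6 a.out   # explicit CPU IDs; whole node required
```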

       -c, --cpus-per-task=<ncpus>
              Request  that ncpus be allocated per process. This may be useful
              if the job is multithreaded and requires more than one  CPU  per
              task  for  optimal  performance.  The  default  is  one  CPU per
              process.  If -c is specified without -n, as many tasks  will  be
              allocated   per   node  as  possible  while  satisfying  the  -c
              restriction. For instance on a cluster with 8 CPUs per  node,  a
              job  request  for 4 nodes and 3 CPUs per task may be allocated 3
              or 6 CPUs per node (1  or  2  tasks  per  node)  depending  upon
              resource  consumption by other jobs. Such a job may be unable to
              execute more than a total of 4 tasks.  This option may  also  be
              useful  to  spawn  tasks without allocating resources to the job
              step from the job’s allocation when running multiple  job  steps
              with the --exclusive option.
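
               For example, on  the  8-CPU-per-node  cluster  described  above
               (node counts and program name are illustrative):

```shell
srun -N4 -c3 a.out     # as many tasks per node as fit, 3 CPUs each
srun -n4 -c2 a.out     # exactly 4 tasks, 2 CPUs per task
```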

       -d, --dependency=<dependency_list>
              Defer  the  start  of  this job until the specified dependencies
               have been satisfied.  <dependency_list> is of the form
              <type:job_id[:job_id][,type:job_id[:job_id]]>.   Many  jobs  can
              share the same dependency and these  jobs  may  even  belong  to
              different  users. The  value may be changed after job submission
              using the scontrol command.

              after:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have begun execution.

              afterany:job_id[:jobid...]
                     This  job  can  begin  execution after the specified jobs
                     have terminated.

              afternotok:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                     have terminated in some failed state (non-zero exit code,
                     node failure, timed out, etc).

              afterok:job_id[:jobid...]
                     This job can begin execution  after  the  specified  jobs
                      have  successfully  executed  (ran  to  completion  with
                      a zero exit code).

              singleton
                     This  job  can  begin  execution  after  any   previously
                     launched  jobs  sharing  the  same job name and user have
                     terminated.
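
               For example (job IDs and program names are illustrative):

```shell
srun --dependency=afterok:1234 a.out        # start only if job 1234 succeeded
srun -d afterany:1234:1235 postprocess.sh   # start after both jobs terminate
```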

       -D, --chdir=<path>
               Have the remote processes do a chdir to  path  before  beginning
              execution.  The  default  is  to  chdir  to  the current working
              directory of the srun process.

       -e, --error=<mode>
              Specify  how  stderr  is  to  be  redirected.  By   default   in
              interactive  mode,  srun  redirects  stderr  to the same file as
              stdout, if one is specified. The --error option is  provided  to
              allow stdout and stderr to be redirected to different locations.
              See IO Redirection below for more  options.   If  the  specified
              file already exists, it will be overwritten.

       -E, --preserve-env
              Pass  the  current  values of environment variables SLURM_NNODES
              and  SLURM_NPROCS  through  to  the  executable,   rather   than
              computing them from commandline parameters.

       --epilog=<executable>
              srun will run executable just after the job step completes.  The
              command line arguments for executable will be  the  command  and
              arguments  of  the  job  step.  If executable is "none", then no
              epilog will be run.  This  parameter  overrides  the  SrunEpilog
              parameter in slurm.conf.

       --exclusive
              When  used  to  initiate  a job, the job allocation cannot share
               nodes with other running jobs.  This is the opposite of --share;
               whichever  option  is  seen  last  on the command line will win.
              (The  default  shared/exclusive  behaviour  depends  on   system
              configuration.)

               This option can also be used when initiating more than one job
               step within an existing resource allocation and you want
               separate processors to be dedicated to each job step. If
               sufficient processors are not available to initiate the job
               step, it will be deferred. This can be thought of as providing
               resource management for the job within its allocation. Note
               that all
              CPUs  allocated  to  a job are available to each job step unless
              the --exclusive option is used plus task affinity is configured.
              Since resource management is provided by processor, the --ntasks
              option must be specified, but the following options  should  NOT
              be specified --nodes, --relative, --distribution=arbitrary.  See
              EXAMPLE below.
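
               A sketch of the job-step usage described above (node counts and
               script names are illustrative): within an existing  allocation,
               each step requests dedicated processors, and steps  are  queued
               when no free CPUs remain.

```shell
salloc -N2 bash                   # obtain an allocation, then within it:
srun -n4 --exclusive step1.sh &
srun -n4 --exclusive step2.sh &   # deferred until 4 CPUs are free
wait
```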

       --gid=<group>
              If srun is run as root, and the --gid option is used, submit the
              job  with  group’s  group  access permissions.  group may be the
              group name or the numerical group ID.

       -h, --help
              Display help information and exit.

       --hint=<type>
              Bind tasks according to application hints

              compute_bound
                     Select settings for compute bound applications:  use  all
                     cores in each socket, one thread per core

              memory_bound
                     Select  settings  for memory bound applications: use only
                     one core in each socket, one thread per core

              [no]multithread
                     [don’t] use extra threads  with  in-core  multi-threading
                     which can benefit communication intensive applications

              help   show this help message

       -I, --immediate[=<seconds>]
               Exit  if  resources  are  not  available  within the time period
              specified.  If no argument is given, resources must be available
              immediately for the request to succeed.  By default, --immediate
              is off, and  the  command  will  block  until  resources  become
              available.

       -i, --input=<mode>
               Specify how stdin is to be redirected.  By  default,  srun
               redirects stdin from the terminal to all tasks.  See IO
               Redirection below for more options.  On OS X, the poll()
               function does not support stdin, so input from a terminal is
               not possible.

       -J, --job-name=<jobname>
              Specify a name for the job. The specified name will appear along
              with the job id number when querying running jobs on the system.
              The default is the supplied  executable  program’s  name.  NOTE:
              This  information  may be written to the slurm_jobacct.log file.
              This file is space delimited so  if  a  space  is  used  in  the
              jobname  name  it will cause problems in properly displaying the
              contents of the slurm_jobacct.log file when the sacct command is
              used.

       --jobid=<jobid>
               Initiate a job step under an already allocated job with job  id
               jobid.  Using this option will cause srun to behave exactly  as
               if the SLURM_JOB_ID environment variable was set.

       -K, --kill-on-bad-exit
              Immediately  terminate  a  job if any task exits with a non-zero
              exit  code.   Note:  The  -K,  --kill-on-bad-exit  option  takes
              precedence over -W, --wait to terminate the job immediately if a
              task exits with a non-zero exit code.

       -k, --no-kill
               Do not automatically terminate a job if one of the nodes it  has
              been  allocated  fails.  This option is only recognized on a job
              allocation, not for the submission of individual job steps.  The
              job will assume all responsibilities for fault-tolerance.  Tasks
               launched using this option will not be considered terminated (e.g.
              -K,  --kill-on-bad-exit  and  -W,  --wait  options  will have no
              effect upon the job step).  The active job step (MPI  job)  will
              likely suffer a fatal error, but subsequent job steps may be run
              if this option is specified.  The default action is to terminate
              the job upon node failure.

       -l, --label
               Prepend task number to lines of stdout/err. Normally, stdout and
              stderr from remote tasks is line-buffered directly to the stdout
              and  stderr  of  srun.  The --label option will prepend lines of
              output with the remote task id.

       -L, --licenses=<license>
              Specification of licenses (or other resources available  on  all
              nodes  of  the  cluster)  which  must  be allocated to this job.
              License names can be followed by  an  asterisk  and  count  (the
              default  count  is one).  Multiple license names should be comma
              separated (e.g.  "--licenses=foo*4,bar").

       -m, --distribution=
              <block|cyclic|arbitrary|plane=<options>>  Specify  an  alternate
              distribution method for remote processes.

              block  The  block distribution method will distribute tasks to a
                     node such  that  consecutive  tasks  share  a  node.  For
                     example,  consider an allocation of three nodes each with
                     two cpus. A four-task  block  distribution  request  will
                     distribute  those  tasks  to the nodes with tasks one and
                     two on the first node, task three on the second node, and
                     task  four  on the third node.  Block distribution is the
                     default behavior if  the  number  of  tasks  exceeds  the
                     number of allocated nodes.

              cyclic The cyclic distribution method will distribute tasks to a
                     node such that consecutive  tasks  are  distributed  over
                     consecutive   nodes   (in  a  round-robin  fashion).  For
                     example, consider an allocation of three nodes each  with
                     two  cpus.  A  four-task cyclic distribution request will
                     distribute those tasks to the nodes with  tasks  one  and
                     four  on the first node, task two on the second node, and
                     task three on the third node. Cyclic distribution is  the
                     default behavior if the number of tasks is no larger than
                     the number of allocated nodes.

              plane  The tasks are distributed in blocks of a specified  size.
                     The options include a number representing the size of the
                     task  block.    This   is   followed   by   an   optional
                     specification  of  the  task distribution scheme within a
                     block of tasks and between the blocks of tasks.  For more
                     details (including examples and diagrams), please see
                     https://computing.llnl.gov/linux/slurm/mc_support.html
                     and
                     https://computing.llnl.gov/linux/slurm/dist_plane.html.

              arbitrary
                     The   arbitrary  method  of  distribution  will  allocate
                     processes in-order as listed in file  designated  by  the
                      environment variable SLURM_HOSTFILE.  If this variable is
                      set, it will override any other method specified.  If not
                      set, the method will default to block.  The hostfile must
                      contain at a minimum the number of hosts requested,  one
                      per line or comma separated.  If
                     specifying a task  count  (-n,  --ntasks=<number>),  your
                     tasks  will  be laid out on the nodes in the order of the
                     file.
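
               A sketch of the hostfile mechanism described above;  the  node
               names are illustrative, and the srun line is  commented  out
               because it requires a SLURM allocation:

```shell
# Hostfile listing one host per task, in launch order (node names hypothetical).
cat > hosts.txt <<'EOF'
node1
node1
node2
node3
EOF
export SLURM_HOSTFILE=$PWD/hosts.txt
# srun -n4 -m arbitrary a.out    # tasks 0-3 land on the hosts in file order
```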

       --mail-type=<type>
              Notify user by email when certain event types occur.  Valid type
              values  are  BEGIN, END, FAIL, ALL (any state change).  The user
              to be notified is indicated with --mail-user.

       --mail-user=<user>
              User to receive email notification of state changes  as  defined
              by --mail-type.  The default value is the submitting user.

       --mem=<MB>
              Specify the real memory required per node in MegaBytes.  Default
              value is DefMemPerNode and the maximum value  is  MaxMemPerNode.
               If configured, both parameters can be seen using the  scontrol
              show config command.  This parameter would generally be used  if
              whole  nodes  are  allocated to jobs (SelectType=select/linear).
              Also see --mem-per-cpu.  --mem and  --mem-per-cpu  are  mutually
              exclusive.

       --mem-per-cpu=<MB>
               Minimum memory required per allocated CPU in MegaBytes.  Default
               value is DefMemPerCPU and the maximum value is MaxMemPerCPU.  If
               configured,  both  parameters  can  be  seen  using the scontrol
              show config command.  This parameter would generally be used  if
              individual      processors     are     allocated     to     jobs
              (SelectType=select/cons_res).   Also  see  --mem.    --mem   and
              --mem-per-cpu are mutually exclusive.
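
               For example (sizes, counts, and program name are illustrative):

```shell
srun -N2 --mem=4096 a.out             # 4 GB per node (whole-node allocation)
srun -n16 --mem-per-cpu=1024 a.out    # 1 GB per allocated CPU (per-CPU allocation)
```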

       --mem_bind=[{quiet,verbose},]type
              Bind tasks to memory. Used only when the task/affinity plugin is
              enabled and the NUMA memory functions are available.  Note  that
              the  resolution  of  CPU  and  memory binding may differ on some
              architectures. For example, CPU binding may be performed at  the
              level  of the cores within a processor while memory binding will
              be performed at the level of  nodes,  where  the  definition  of
              "nodes"  may  differ  from system to system. The use of any type
              other than "none" or "local" is not recommended.   If  you  want
              greater control, try running a simple test code with the options
              "--cpu_bind=verbose,none --mem_bind=verbose,none"  to  determine
              the specific configuration.

              NOTE: To have SLURM always report on the selected memory binding
              for all commands executed in a shell,  you  can  enable  verbose
              mode by setting the SLURM_MEM_BIND environment variable value to
              "verbose".

              The following informational environment variables are  set  when
               --mem_bind is in use:

                      SLURM_MEM_BIND_VERBOSE
                      SLURM_MEM_BIND_TYPE
                      SLURM_MEM_BIND_LIST

              See  the  ENVIRONMENT  VARIABLES  section  for  a  more detailed
              description of the individual SLURM_MEM_BIND* variables.

              Supported options include:

              q[uiet]
                     quietly bind before task runs (default)

              v[erbose]
                     verbosely report binding before task runs

              no[ne] don’t bind tasks to memory (default)

              rank   bind by task rank (not recommended)

              local  Use memory local to the processor in use

              map_mem:<list>
                     bind by mapping a node’s memory  to  tasks  as  specified
                     where  <list>  is <cpuid1>,<cpuid2>,...<cpuidN>.  CPU IDs
                      are  interpreted  as  decimal  values  unless  they  are
                      preceded with ’0x’ in which case they are interpreted as
                      hexadecimal values (not recommended)

              mask_mem:<list>
                     bind by setting memory masks on tasks as specified  where
                     <list>  is  <mask1>,<mask2>,...<maskN>.  memory masks are
                     always interpreted  as  hexadecimal  values.   Note  that
                     masks  must  be  preceded with a ’0x’ if they don’t begin
                     with [0-9] so they are seen as numerical values by  srun.

              help   show this help message

       --mincores=<n>
              Specify a minimum number of cores per socket.

       --mincpus=<n>
              Specify a minimum number of logical cpus/processors per node.

       --minsockets=<n>
              Specify  a  minimum  number of sockets (physical processors) per
              node.

       --minthreads=<n>
              Specify a minimum number of threads per core.

       --msg-timeout=<seconds>
              Modify the job launch message timeout.   The  default  value  is
              MessageTimeout  in  the  SLURM  configuration  file  slurm.conf.
              Changes to this are typically  not  recommended,  but  could  be
              useful to diagnose problems.

       --mpi=<mpi_type>
              Identify  the  type  of  MPI  to  be  used. May result in unique
              initiation procedures.

              list   Lists available mpi types to choose from.

              lam    Initiates one ’lamd’ process  per  node  and  establishes
                     necessary environment variables for LAM/MPI.

              mpich1_shmem
                     Initiates  one process per node and establishes necessary
                     environment variables for  mpich1  shared  memory  model.
                     This also works for mvapich built for shared memory.

              mpichgm
                     For use with Myrinet.

              mvapich
                     For use with Infiniband.

              openmpi
                     For use with OpenMPI.

              none   No  special MPI processing. This is the default and works
                     with many other versions of MPI.

       --multi-prog
              Run a job with different programs and  different  arguments  for
              each  task.  In  this  case, the executable program specified is
              actually a configuration  file  specifying  the  executable  and
              arguments  for  each  task.  See  MULTIPLE PROGRAM CONFIGURATION
              below for details on the configuration file contents.
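
        As an illustration (the file name multi.conf and the programs shown
        are hypothetical; the authoritative syntax is described in MULTIPLE
        PROGRAM CONFIGURATION below), such a configuration file might look
        like:

        ```
        # Task 0 runs the master; tasks 1-3 run workers.
        # %t expands to the task's rank.
        0    ./master
        1-3  ./worker --rank=%t
        ```

        launched with a command such as "srun -n4 --multi-prog multi.conf".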

       -N, --nodes=<minnodes[-maxnodes]>
              Request that a minimum of minnodes nodes be  allocated  to  this
              job.   The  scheduler  may decide to launch the job on more than
              minnodes nodes.  A limit  on  the  maximum  node  count  may  be
              specified  with  maxnodes (e.g. "--nodes=2-4").  The minimum and
              maximum node count may be the same to specify a specific  number
              of  nodes  (e.g.  "--nodes=2-2"  will  ask  for two and ONLY two
              nodes).  The partition’s node limits supersede those of the job.
              If  a  job’s  node limits are outside of the range permitted for
              its associated partition, the job will  be  left  in  a  PENDING
              state.   This  permits  possible execution at a later time, when
              the partition limit is changed.  If a job node limit exceeds the
              number  of  nodes  configured  in the partition, the job will be
              rejected.  Note that the environment variable SLURM_NNODES  will
              be  set to the count of nodes actually allocated to the job. See
              the ENVIRONMENT VARIABLES  section for more information.  If  -N
              is  not  specified,  the  default behavior is to allocate enough
              nodes to satisfy the requirements of the -n and -c options.  The
              job will be allocated as many nodes as possible within the range
              specified and without delaying the initiation of the job.

       -n, --ntasks=<number>
              Specify the number of tasks to run. Request that  srun  allocate
              resources  for  ntasks tasks.  The default is one task per node,
              but note  that  the  --cpus-per-task  option  will  change  this
              default.

       --network=<type>
              Specify  the  communication protocol to be used.  This option is
              supported on AIX systems.  Since POE is used  to  launch  tasks,
              this  option  is  not  normally  used  or is specified using the
              SLURM_NETWORK environment variable.  The interpretation of  type
              is system dependent.  For systems with an IBM Federation switch,
              the following comma-separated and  case  insensitive  types  are
              recognized:  IP  (the default is user-space), SN_ALL, SN_SINGLE,
              BULK_XFER and adapter names  (e.g. SNI0  and  SNI1).   For  more
               information on IBM systems, see the poe  documentation  on  the
               environment variables MP_EUIDEVICE and  MP_USE_BULK_XFER.  Note
               that only four job steps may be active at once on a  node  with
              the BULK_XFER option due to limitations in the Federation switch
              driver.

       --nice[=adjustment]
              Run  the  job with an adjusted scheduling priority within SLURM.
              With no adjustment value the scheduling priority is decreased by
              100.  The  adjustment range is from -10000 (highest priority) to
              10000 (lowest priority). Only privileged  users  can  specify  a
              negative  adjustment.  NOTE: This option is presently ignored if
              SchedulerType=sched/wiki or SchedulerType=sched/wiki2.

       --ntasks-per-core=<ntasks>
              Request that ntasks be invoked on each core.  Meant to  be  used
              with  the  --ntasks option.  Related to --ntasks-per-node except
               at the core  level  instead  of  the  node  level.   Masks  will
               automatically be generated to bind the tasks to  specific  cores
              unless --cpu_bind=none is specified.  NOTE: This option  is  not
              supported       unless      SelectTypeParameters=CR_Core      or
              SelectTypeParameters=CR_Core_Memory is configured.

       --ntasks-per-socket=<ntasks>
              Request that ntasks be invoked on each socket.  Meant to be used
              with  the  --ntasks option.  Related to --ntasks-per-node except
              at the socket level instead  of  the  node  level.   Masks  will
              automatically be generated to bind the tasks to specific sockets
              unless --cpu_bind=none is specified.  NOTE: This option  is  not
              supported      unless      SelectTypeParameters=CR_Socket     or
              SelectTypeParameters=CR_Socket_Memory is configured.

       --ntasks-per-node=<ntasks>
              Request that ntasks be invoked on each node.  Meant to  be  used
              with    the    --nodes    option.     This    is    related   to
              --cpus-per-task=ncpus, but does not  require  knowledge  of  the
              actual  number  of cpus on each node.  In some cases, it is more
              convenient to be able to request that no more  than  a  specific
              number  of  tasks  be  invoked  on  each node.  Examples of this
              include submitting a hybrid MPI/OpenMP app where  only  one  MPI
              "task/rank"  should  be assigned to each node while allowing the
              OpenMP portion to utilize all of the parallelism present in  the
              node,  or  submitting  a  single setup/cleanup/monitoring job to
              each node of a pre-existing allocation as one step in  a  larger
              job script.

       -O, --overcommit
              Overcommit resources. Normally, srun will not allocate more than
              one  process  per  CPU.  By  specifying  --overcommit  you   are
              explicitly  allowing  more  than one process per CPU. However no
              more than MAX_TASKS_PER_NODE tasks are permitted to execute  per
               node.   NOTE:  MAX_TASKS_PER_NODE is defined in the file slurm.h
               and is not a variable; it is set at SLURM build time.

       -o, --output=<mode>
              Specify  the  mode  for  stdout  redirection.  By   default   in
              interactive  mode,  srun collects stdout from all tasks and line
              buffers this output to  the  attached  terminal.  With  --output
              stdout  may be redirected to a file, to one file per task, or to
              /dev/null. See section IO  Redirection  below  for  the  various
              forms of mode.  If the specified file already exists, it will be
              overwritten.

              If --error is not also  specified  on  the  command  line,  both
               stdout  and  stderr  will  be directed to the file specified by
              --output.

       --open-mode=<append|truncate>
              Open the output and error files using append or truncate mode as
              specified.   The  default  value  is  specified  by  the  system
              configuration parameter JobFileAppend.

       -p, --partition=<partition name>
              Request a specific partition for the  resource  allocation.   If
               not  specified,  the  default  behavior  is  to allow the slurm
              controller to select the default partition as designated by  the
              system administrator.

       --prolog=<executable>
              srun  will  run  executable  just before launching the job step.
              The command line arguments for executable will  be  the  command
              and arguments of the job step.  If executable is "none", then no
              prolog will be run.  This  parameter  overrides  the  SrunProlog
              parameter in slurm.conf.

       --propagate[=rlimits]
              Allows  users to specify which of the modifiable (soft) resource
              limits to propagate to the compute  nodes  and  apply  to  their
              jobs.   If  rlimits  is  not specified, then all resource limits
              will be propagated.  The following rlimit names are supported by
              Slurm  (although  some  options  may  not  be  supported on some
              systems):

              ALL       All limits listed below

               AS        The maximum address space for a process

               CORE      The maximum size of a core file

              CPU       The maximum amount of CPU time

              DATA      The maximum size of a process’s data segment

              FSIZE     The maximum size of files created

              MEMLOCK   The maximum size that may be locked into memory

              NOFILE    The maximum number of open files

              NPROC     The maximum number of processes available

              RSS       The maximum resident set size

              STACK     The maximum stack size
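
        The limits above are ordinary POSIX resource limits.  As a sketch
        (not SLURM code), a task can inspect the soft and hard limits it
        inherited using Python's resource module:

        ```python
        import resource

        # getrlimit returns a (soft, hard) pair; RLIM_INFINITY means unlimited.
        # These names correspond to the NOFILE, STACK and CORE rlimits above.
        for name in ("RLIMIT_NOFILE", "RLIMIT_STACK", "RLIMIT_CORE"):
            soft, hard = resource.getrlimit(getattr(resource, name))
            print(name, soft, hard)
        ```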

        --pty  Execute  task  zero  in  a  pseudo  terminal.  Implicitly  sets
              --unbuffered.  Implicitly sets --error and --output to /dev/null
              for all tasks except task zero, which may cause those  tasks  to
              exit immediately (e.g. shells will typically exit immediately in
              that situation).  Not currently supported on AIX platforms.

       -Q, --quiet
              Suppress informational messages from srun. Errors will still  be
              displayed.

       -q, --quit-on-interrupt
              Quit  immediately  on single SIGINT (Ctrl-C). Use of this option
              disables  the  status  feature  normally  available  when   srun
              receives  a single Ctrl-C and causes srun to instead immediately
              terminate the running job.

       --qos=<qos>
              Request a quality of service for the job.   QOS  values  can  be
              defined  for  each user/cluster/account association in the SLURM
              database.  Users will be limited to their association’s  defined
              set   of   qos’s   when   the   SLURM  configuration  parameter,
               AccountingStorageEnforce, includes "qos" in its definition.

       -r, --relative=<n>
              Run a job step relative to node n  of  the  current  allocation.
              This  option  may  be used to spread several job steps out among
              the nodes of the current job. If -r is  used,  the  current  job
              step  will  begin at node n of the allocated nodelist, where the
              first node is considered node 0.  The -r option is not permitted
              along with -w or -x, and will be ignored when not running within
              a prior allocation (i.e. when  SLURM_JOB_ID  is  not  set).  The
              default  for  n is 0. If the value of --nodes exceeds the number
              of nodes  identified  with  the  --relative  option,  a  warning
              message  will  be  printed  and  the --relative option will take
              precedence.

       --resv-ports
              Reserve communication ports for this job.  Used for OpenMPI.

       --reservation=<name>
              Allocate resources for the job from the named reservation.

       --restart-dir=<directory>
              Specifies the  directory  from  which  the  job  or  job  step’s
               checkpoint  should  be  read  (used  by  the  checkpoint/blcr and
              checkpoint/xlch plugins only).

       -s, --share
              The job can share nodes with other running jobs. This may result
              in  faster  job  initiation  and  higher system utilization, but
              lower application performance.

       --signal=<sig_num>[@<sig_time>]
              When a job is within sig_time seconds of its end time,  send  it
              the  signal sig_num.  Due to the resolution of event handling by
              SLURM, the signal may be sent up  to  60  seconds  earlier  than
              specified.   sig_num may either be a signal number or name (e.g.
              "10" or "USR1").  sig_time must have integer value between  zero
              and  65535.   By default, no signal is sent before the job’s end
              time.  If a sig_num  is  specified  without  any  sig_time,  the
              default time will be 60 seconds.

       --slurmd-debug=<level>
              Specify  a  debug  level  for slurmd(8). level may be an integer
              value between  0  [quiet,  only  errors  are  displayed]  and  4
              [verbose  operation].   The  slurmd  debug information is copied
              onto  the  stderr  of  the  job.  By  default  only  errors  are
              displayed.

       -T, --threads=<nthreads>
              Allows  limiting  the  number of concurrent threads used to send
              the job request from the srun process to the slurmd processes on
              the  allocated nodes. Default is to use one thread per allocated
              node up to a maximum of 60 concurrent threads.  Specifying  this
              option limits the number of concurrent threads to nthreads (less
              than or equal to 60).  This should only be used  to  set  a  low
              thread count for testing on very small memory computers.

       -t, --time=<time>
              Set  a  limit  on the total run time of the job or job step.  If
              the requested time limit for a job exceeds the partition’s  time
              limit,  the  job  will  be  left  in  a  PENDING state (possibly
              indefinitely).  If the requested  time  limit  for  a  job  step
              exceeds  the  partition’s  time  limit, the job step will not be
              initiated.  The default  time  limit  is  the  partition’s  time
              limit.  When the time limit is reached, the job’s tasks are sent
              SIGTERM followed by SIGKILL. If the time limit is for  the  job,
              all  job  steps  are signaled. If the time limit is for a single
              job step within an existing job allocation, only that  job  step
               will  be affected. A job time limit supersedes all job step time
              limits. The interval between SIGTERM and SIGKILL is specified by
              the  SLURM  configuration  parameter  KillWait.  A time limit of
              zero requests that no time limit be  imposed.   Acceptable  time
              formats        include       "minutes",       "minutes:seconds",
              "hours:minutes:seconds", "days-hours", "days-hours:minutes"  and
              "days-hours:minutes:seconds".

       --task-epilog=<executable>
              The  slurmstepd  daemon will run executable just after each task
              terminates.  This  will  be  executed  before   any   TaskEpilog
              parameter  in slurm.conf is executed. This is meant to be a very
              short-lived program. If it  fails  to  terminate  within  a  few
              seconds,  it will be killed along with any descendant processes.

       --task-prolog=<executable>
              The slurmstepd daemon will run executable just before  launching
              each  task. This will be executed after any TaskProlog parameter
              in slurm.conf  is  executed.   Besides  the  normal  environment
              variables,  this  has  SLURM_TASK_PID  available to identify the
              process ID of the task being started.  Standard output from this
              program  of  the  form  "export  NAME=value" will be used to set
              environment variables for the task being spawned.
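
        For illustration only (this mimics the described behavior; it is not
        slurmstepd's actual parser), the "export NAME=value" convention can
        be sketched as:

        ```python
        def parse_task_prolog_output(text):
            """Collect NAME=value pairs from lines of the form 'export NAME=value'."""
            env = {}
            for line in text.splitlines():
                line = line.strip()
                if line.startswith("export ") and "=" in line:
                    name, _, value = line[len("export "):].partition("=")
                    env[name.strip()] = value
            return env

        # Lines that do not match the convention are ignored.
        print(parse_task_prolog_output("export MY_VAR=/tmp/data\nsome log line\n"))
        # -> {'MY_VAR': '/tmp/data'}
        ```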

       --tmp=<MB>
              Specify a minimum amount of temporary disk space.

       -u, --unbuffered
              Do not line buffer stdout from remote tasks. This option  cannot
              be used with --label.

       --usage
              Display brief help message and exit.

       --uid=<user>
              Attempt  to  submit  and/or  run  a  job  as user instead of the
              invoking user id. The invoking user’s credentials will  be  used
              to  check access permissions for the target partition. User root
              may use this option to run jobs as a normal user in  a  RootOnly
              partition  for  example.  If  run  as  root,  srun will drop its
              permissions to  the  uid  specified  after  node  allocation  is
              successful. user may be the user name or numerical user ID.

       -V, --version
              Display version information and exit.

       -v, --verbose
              Increase   the   verbosity  of  srun’s  informational  messages.
              Multiple  -v’s  will  further  increase  srun’s  verbosity.   By
              default only errors will be displayed.

       -W, --wait=<seconds>
              Specify  how long to wait after the first task terminates before
              terminating all remaining tasks.  A  value  of  0  indicates  an
              unlimited  wait (a warning will be issued after 60 seconds). The
              default value is set by the  WaitTime  parameter  in  the  slurm
              configuration  file  (see  slurm.conf(5)).  This  option  can be
               useful to ensure that a job is terminated in a timely fashion in
              the  event  that one or more tasks terminate prematurely.  Note:
              The -K, --kill-on-bad-exit  option  takes  precedence  over  -W,
              --wait  to  terminate the job immediately if a task exits with a
              non-zero exit code.

       -w, --nodelist=<host1,host2,... or filename>
              Request a specific list of hosts. The job will contain at  least
              these hosts. The list may be specified as a comma-separated list
              of hosts, a range of hosts (host[1-5,7,...] for example),  or  a
              filename.   The host list will be assumed to be a filename if it
               contains a "/" character.  If you specify a maximum node  count
               (e.g. -N1-2) and there are more than 2 hosts in the file,  only
               the first 2 nodes will be used in the request list.

       --wckey=<wckey>
              Specify wckey to be used with job.  If  TrackWCKey=no  (default)
              in the slurm.conf this value is ignored.

       -X, --disable-status
              Disable  the  display of task status when srun receives a single
              SIGINT (Ctrl-C). Instead immediately forward the SIGINT  to  the
              running  job.  Without this option a second Ctrl-C in one second
              is  required  to  forcibly  terminate  the  job  and  srun  will
              immediately  exit.  May also be set via the environment variable
              SLURM_DISABLE_STATUS.

       -x, --exclude=<host1,host2,... or filename>
              Request that a specific list of hosts not  be  included  in  the
              resources  allocated  to this job. The host list will be assumed
               to be a filename if it contains a "/" character.

       -Z, --no-allocate
              Run the specified tasks on a set of  nodes  without  creating  a
              SLURM  "job"  in the SLURM queue structure, bypassing the normal
              resource allocation step.  The list of nodes must  be  specified
              with  the  -w,  --nodelist  option.  This is a privileged option
              only available for the users "SlurmUser" and "root".

       The following options support Blue Gene systems, but may be  applicable
       to other systems as well.

       --blrts-image=<path>
               Path to blrts image for bluegene block.  BGL only.  Default from
               bluegene.conf if not set.

       --cnload-image=<path>
              Path to compute  node  image  for  bluegene  block.   BGP  only.
               Default from bluegene.conf if not set.

       --conn-type=<type>
              Require  the  partition connection type to be of a certain type.
               On Blue Gene the acceptable values of type are MESH, TORUS  and
               NAV.  If NAV, or if not set, then SLURM will try to fit a TORUS
               else MESH.  You should not normally  set  this  option.   SLURM
               will normally allocate a TORUS if possible for a given geometry.
               If running on a BGP system and wanting to run in HTC mode (only
               for 1 midplane and below), you can use HTC_S for SMP, HTC_D for
               Dual, HTC_V for virtual node mode, and HTC_L for Linux mode.

       -g, --geometry=<XxYxZ>
              Specify the geometry requirements for the job. The three numbers
              represent  the  required  geometry giving dimensions in the X, Y
              and Z directions. For example  "--geometry=2x3x4",  specifies  a
              block  of  nodes  having  2  x  3  x 4 = 24 nodes (actually base
              partitions on Blue Gene).

       --ioload-image=<path>
               Path to io image for bluegene block.  BGP  only.   Default  from
               bluegene.conf if not set.

       --linux-image=<path>
               Path to linux image for bluegene block.  BGL only.  Default from
               bluegene.conf if not set.

       --mloader-image=<path>
               Path  to  mloader  image  for  bluegene  block.   Default   from
               bluegene.conf if not set.

       -R, --no-rotate
              Disables  rotation  of  the job’s requested geometry in order to
              fit an appropriate partition.  By default the specified geometry
              can rotate in three dimensions.

       --ramdisk-image=<path>
               Path  to  ramdisk  image for bluegene block.  BGL only.  Default
               from bluegene.conf if not set.

       --reboot
              Force the allocated nodes to reboot before starting the job.

       srun will submit the job request to  the  slurm  job  controller,  then
       initiate  all  processes  on the remote nodes. If the request cannot be
       met immediately, srun will block until the resources are  free  to  run
       the  job.  If  the  -I  (--immediate)  option  is  specified  srun will
       terminate if resources are not immediately available.

       When initiating  remote  processes  srun  will  propagate  the  current
       working  directory,  unless  --chdir=<path> is specified, in which case
       path will become the working directory for the remote processes.

       The -n, -c, and  -N  options  control  how  CPUs   and  nodes  will  be
       allocated  to  the job. When specifying only the number of processes to
       run with -n, a  default  of  one  CPU  per  process  is  allocated.  By
       specifying the number of CPUs required per task (-c), more than one CPU
       may be allocated per process. If the number of nodes is specified  with
       -N,  srun  will  attempt  to  allocate  at  least  the  number of nodes
       specified.

       Combinations of the above three options  may  be  used  to  change  how
       processes  are  distributed  across  nodes  and  cpus. For instance, by
       specifying both the number of processes and number of nodes on which to
        run,  the  number  of  processes  per  node  is  implied. However, if
        the number of CPUs per process is more important, then the number  of
        processes (-n) and the number of CPUs per process  (-c)  should  be
        specified.

       srun  will  refuse  to   allocate  more than one process per CPU unless
       --overcommit (-O) is also specified.

       srun will attempt to meet the above specifications "at a minimum." That
       is,  if  16 nodes are requested for 32 processes, and some nodes do not
       have 2 CPUs, the allocation of nodes will be increased in order to meet
       the  demand  for  CPUs. In other words, a minimum of 16 nodes are being
       requested. However, if 16 nodes are requested for  15  processes,  srun
       will  consider  this  an  error,  as  15 processes cannot run across 16
       nodes.
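
        The node-count arithmetic in the paragraph above can be sketched as
        follows (an illustration of the rule only, not SLURM's allocator; it
        assumes a uniform, hypothetical cpus_per_node value):

        ```python
        import math

        def min_nodes_needed(ntasks, cpus_per_task, cpus_per_node):
            """Minimum node count so that every task receives its CPUs."""
            return math.ceil(ntasks * cpus_per_task / cpus_per_node)

        # 32 single-CPU tasks on nodes with only 1 CPU each: a request for
        # 16 nodes would be grown to 32 to satisfy the CPU demand.
        print(min_nodes_needed(32, 1, 1))  # 32
        # On 2-CPU nodes, 32 tasks fit on the 16 requested nodes.
        print(min_nodes_needed(32, 1, 2))  # 16
        ```

        Requesting more nodes than tasks (e.g. 16 nodes for 15 processes)
        remains an error, as described above.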

       IO Redirection

       By default, stdout and stderr will be redirected from all tasks to  the
       stdout  and  stderr  of  srun,  and  stdin  will be redirected from the
       standard input of srun to all remote tasks.  If stdin  is  only  to  be
       read  by  a subset of the spawned tasks, specifying a file to read from
       rather than forwarding stdin from the srun command may be preferable as
       it  avoids  moving and storing data that will never be read.  For OS X,
       the poll() function does not support stdin, so input from a terminal is
       not possible.  This behavior may be changed with the --output, --error,
       and --input (-o, -e, -i) options. Valid format specifications for these
       options are

        all       stdout and stderr are redirected from all tasks to srun.  stdin
                 broadcast  to  all  remote  tasks.   (This  is  the   default
                 behavior)

        none      stdout  and  stderr  are not received from any task.  stdin is
                 not sent to any task (stdin is closed).

       taskid    stdout and/or stderr are redirected from only the  task  with
                  relative  id  equal  to  taskid, where 0 <= taskid < ntasks,
                 where ntasks is the total number of tasks in the current  job
                 step.   stdin  is  redirected  from the stdin of srun to this
                 same task.  This file will be written on the  node  executing
                 the task.

       filename  srun  will  redirect  stdout  and/or stderr to the named file
                 from all tasks.  stdin will be redirected from the named file
                 and  broadcast to all tasks in the job.  filename refers to a
                 path on the host that runs srun.  Depending on the  cluster’s
                 file  system  layout, this may result in the output appearing
                 in different places depending on whether the job  is  run  in
                 batch mode.

       format string
                 srun  allows  for  a format string to be used to generate the
                 named IO file described above. The following list  of  format
                 specifiers  may  be  used  in the format string to generate a
                 filename that will be unique to a given jobid, stepid,  node,
                 or  task.  In  each case, the appropriate number of files are
                 opened and associated with the corresponding tasks. Note that
                 any  format  string  containing  %t,  %n,  and/or  %N will be
                 written on the node executing the task rather than  the  node
                 where srun executes.

                 %J     jobid.stepid of the running job. (e.g. "128.0")

                 %j     jobid of the running job.

                 %s     stepid of the running job.

                 %N     short  hostname.  This  will create a separate IO file
                        per node.

                 %n     Node identifier relative to current job (e.g.  "0"  is
                        the  first node of the running job) This will create a
                        separate IO file per node.

                 %t     task identifier (rank) relative to current  job.  This
                        will create a separate IO file per task.

                 A  number  placed  between  the  percent character and format
                 specifier may be used  to  zero-pad  the  result  in  the  IO
                 filename.  This  number  is  ignored  if the format specifier
                 corresponds to  non-numeric data (%N for example).

                 Some examples of how the format string may be used  for  a  4
                 task  job  step  with  a  Job  ID of 128 and step id of 0 are
                 included below:

                 job%J.out      job128.0.out

                 job%4j.out     job0128.out

                 job%j-%2t.out  job128-00.out, job128-01.out, ...
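
                  The expansions above can be emulated with a short sketch
                  (an approximation of the behavior for the numeric
                  specifiers, not srun's actual code):

                  ```python
                  import re

                  def expand_io_format(fmt, jobid, stepid, taskid):
                      """Expand a subset of the format specifiers: %j, %s, %J
                      and %t, honoring an optional zero-pad width between '%'
                      and the letter."""
                      values = {"j": str(jobid), "s": str(stepid),
                                "J": "%d.%d" % (jobid, stepid), "t": str(taskid)}
                      def repl(m):
                          width, spec = m.group(1), m.group(2)
                          value = values[spec]
                          return value.zfill(int(width)) if width else value
                      return re.sub(r"%(\d*)([jsJt])", repl, fmt)

                  print(expand_io_format("job%J.out", 128, 0, 0))      # job128.0.out
                  print(expand_io_format("job%4j.out", 128, 0, 0))     # job0128.out
                  print(expand_io_format("job%j-%2t.out", 128, 0, 1))  # job128-01.out
                  ```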

INPUT ENVIRONMENT VARIABLES

       Some  srun  options  may  be  set  via  environment  variables.   These
       environment  variables,  along  with  their  corresponding options, are
       listed below.  Note: Command line options will  always  override  these
       settings.

       PMI_FANOUT            This  is  used  exclusively  with PMI (MPICH2 and
                             MVAPICH2)  and  controls  the  fanout   of   data
                             communications.  The  srun command sends messages
                             to application programs (via the PMI library) and
                             those  applications may be called upon to forward
                             that data to up  to  this  number  of  additional
                             tasks.  Higher  values offload work from the srun
                             command to the applications and  likely  increase
                             the vulnerability to failures.  The default value
                             is 32.

       PMI_FANOUT_OFF_HOST   This is used exclusively  with  PMI  (MPICH2  and
                             MVAPICH2)   and   controls  the  fanout  of  data
                             communications.  The srun command sends  messages
                             to application programs (via the PMI library) and
                             those applications may be called upon to  forward
                             that  data  to additional tasks. By default, srun
                             sends one message per host and one task  on  that
                             host  forwards  the  data  to other tasks on that
                             host up to PMI_FANOUT.  If PMI_FANOUT_OFF_HOST is
                             defined, the user task may be required to forward
                             the  data  to  tasks  on  other  hosts.   Setting
                             PMI_FANOUT_OFF_HOST   may  increase  performance.
                             Since more work is performed by the  PMI  library
                             loaded by the user application, failures also can
                             be more common and more difficult to diagnose.

       PMI_TIME              This is used exclusively  with  PMI  (MPICH2  and
                             MVAPICH2)    and    controls    how    much   the
                             communications from the tasks  to  the  srun  are
                             spread out in time in order to avoid overwhelming
                             the srun command with work. The default value  is
                             500  (microseconds)  per task. On relatively slow
                             processors or systems with very  large  processor
                             counts  (and  large PMI data sets), higher values
                             may be required.

       SLURM_CONF            The location of the SLURM configuration file.

       SLURM_ACCOUNT         Same as -A, --account

       SLURM_ACCTG_FREQ      Same as --acctg-freq

       SLURM_CHECKPOINT      Same as --checkpoint

       SLURM_CHECKPOINT_DIR  Same as --checkpoint-dir

       SLURM_CONN_TYPE       Same as --conn-type

       SLURM_CORE_FORMAT     Same as --core

       SLURM_CPU_BIND        Same as --cpu_bind

       SLURM_CPUS_PER_TASK   Same as -c, --cpus-per-task

       SLURM_DEBUG           Same as -v, --verbose

       SLURMD_DEBUG          Same as -d, --slurmd-debug

       SLURM_DEPENDENCY      Same as -P, --dependency=<jobid>

       SLURM_DISABLE_STATUS  Same as -X, --disable-status

       SLURM_DIST_PLANESIZE  Same as -m plane

       SLURM_DISTRIBUTION    Same as -m, --distribution

       SLURM_EPILOG          Same as --epilog

       SLURM_EXCLUSIVE       Same as --exclusive

       SLURM_EXIT_ERROR      Specifies the exit code generated  when  a  SLURM
                             error occurs (e.g. invalid options).  This can be
                             used by a script to distinguish application  exit
                             codes  from various SLURM error conditions.  Also
                             see SLURM_EXIT_IMMEDIATE.

       SLURM_EXIT_IMMEDIATE  Specifies  the  exit  code  generated  when   the
                             --immediate  option is used and resources are not
                             currently available.   This  can  be  used  by  a
                             script to distinguish application exit codes from
                             various  SLURM  error   conditions.    Also   see
                             SLURM_EXIT_ERROR.

       SLURM_GEOMETRY        Same as -g, --geometry

       SLURM_JOB_NAME        Same  as -J, --job-name except within an existing
                             allocation, in which case it is ignored to  avoid
                             using  the  batch  job’s name as the name of each
                             job step.

       SLURM_LABELIO         Same as -l, --label

       SLURM_MEM_BIND        Same as --mem_bind

       SLURM_NETWORK         Same as --network

       SLURM_NNODES          Same as -N, --nodes

       SLURM_NTASKS_PER_CORE Same as --ntasks-per-core

       SLURM_NTASKS_PER_NODE Same as --ntasks-per-node

       SLURM_NTASKS_PER_SOCKET
                             Same as --ntasks-per-socket

       SLURM_NO_ROTATE       Same as -R, --no-rotate

       SLURM_NPROCS          Same as -n, --ntasks

       SLURM_OPEN_MODE       Same as --open-mode

       SLURM_OVERCOMMIT      Same as -O, --overcommit

       SLURM_PARTITION       Same as -p, --partition

       SLURM_PROLOG          Same as --prolog

       SLURM_QOS             Same as --qos

       SLURM_REMOTE_CWD      Same as -D, --chdir=

       SLURM_RESTART_DIR     Same as --restart-dir

       SLURM_SIGNAL          Same as --signal

       SLURM_STDERRMODE      Same as -e, --error

       SLURM_STDINMODE       Same as -i, --input

       SLURM_STDOUTMODE      Same as -o, --output

       SLURM_TASK_EPILOG     Same as --task-epilog

       SLURM_TASK_PROLOG     Same as --task-prolog

       SLURM_THREADS         Same as -T, --threads

       SLURM_TIMELIMIT       Same as -t, --time

       SLURM_UNBUFFEREDIO    Same as -u, --unbuffered

       SLURM_WAIT            Same as -W, --wait

       SLURM_WCKEY           Same as -W, --wckey

       SLURM_WORKING_DIR     Same as -D, --chdir
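
       As a sketch of how these input variables behave, the following
       standalone script shows the mapping from environment variables to the
       equivalent command-line options.  The values are hypothetical, and the
       command is only printed rather than executed, so the sketch runs on a
       machine without SLURM installed:

```shell
#!/bin/sh
# Hypothetical sketch: each input environment variable supplies a default
# for the corresponding srun option.  The command is printed, not run.
SLURM_NPROCS=8          # same as: -n8
SLURM_PARTITION=debug   # same as: -p debug
SLURM_TIMELIMIT=10      # same as: -t 10
echo "srun -n$SLURM_NPROCS -p $SLURM_PARTITION -t $SLURM_TIMELIMIT hostname"
# prints: srun -n8 -p debug -t 10 hostname
```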

OUTPUT ENVIRONMENT VARIABLES

       srun will set some environment variables  in  the  environment  of  the
       executing  tasks  on  the  remote  compute  nodes.   These  environment
       variables are:

       BASIL_RESERVATION_ID  The  reservation  ID  on  Cray  systems   running
                             ALPS/BASIL only.

       SLURM_CHECKPOINT_IMAGE_DIR
                             Directory  into which checkpoint images should be
                             written if specified on the execute line.

       SLURM_CPU_BIND_VERBOSE
                             --cpu_bind verbosity (quiet,verbose).

       SLURM_CPU_BIND_TYPE   --cpu_bind type (none,rank,map_cpu:,mask_cpu:)

       SLURM_CPU_BIND_LIST   --cpu_bind map or mask  list  (<list  of  IDs  or
                             masks for this node>)

       SLURM_CPUS_ON_NODE    Count  of processors available to the job on this
                             node.  Note the  select/linear  plugin  allocates
                             entire  nodes to jobs, so the value indicates the
                             total  count  of   CPUs   on   the   node.    The
                             select/cons_res   plugin   allocates   individual
                             processors to jobs, so this number indicates  the
                             number  of  processors  on this node allocated to
                             the job.

       SLURM_GTIDS           Global task  IDs  running  on  this  node.   Zero
                             origin and comma separated.

       SLURM_JOB_DEPENDENCY  Set to value of the --dependency option.

       SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)
                             Job id of the executing job

       SLURM_LAUNCH_NODE_IPADDR
                             IP address of the node from which the task launch
                             was initiated (where the srun command ran from)

       SLURM_LOCALID         Node local task ID for the process within a job

       SLURM_MEM_BIND_VERBOSE
                             --mem_bind verbosity (quiet,verbose).

       SLURM_MEM_BIND_TYPE   --mem_bind type (none,rank,map_mem:,mask_mem:)

       SLURM_MEM_BIND_LIST   --mem_bind map or mask  list  (<list  of  IDs  or
                             masks for this node>)

       SLURM_NNODES          Total  number  of  nodes  in  the  job’s resource
                             allocation

       SLURM_NODEID          The relative node ID of the current node

       SLURM_NODELIST        List of nodes allocated to the job

       SLURM_NPROCS          Total number of processes in the current job

       SLURM_PRIO_PROCESS    The scheduling priority (nice value) at the  time
                             of  job  submission.  This value is propagated to
                             the spawned processes.

       SLURM_PROCID          The MPI rank (or  relative  process  ID)  of  the
                             current process

       SLURM_STEPID          The step ID of the current job

       SLURM_SUBMIT_DIR      The directory from which srun was invoked.

       SLURM_TASK_PID        The process ID of the task being started.

       SLURM_TASKS_PER_NODE  Number  of  tasks  to  be initiated on each node.
                             Values are comma separated and in the same  order
                             as  SLURM_NODELIST.   If  two or more consecutive
                             nodes are to have the same task count, that count
                             is followed by "(x#)" where "#" is the repetition
                             count.                For                example,
                             "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
                             first three nodes will each execute two tasks and
                             the fourth node will execute one task.

       MPIRUN_NOALLOCATE     Do  not  allocate  a  block  on Blue Gene systems
                             only.

       MPIRUN_NOFREE         Do not free a block on Blue Gene systems only.

       MPIRUN_PARTITION      The block name on Blue Gene systems only.
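
       The compressed SLURM_TASKS_PER_NODE notation described above can be
       expanded mechanically.  The following POSIX shell helper is a
       hypothetical illustration (it is not part of SLURM) of the
       "count(x#)" repetition syntax:

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM): expand the compressed
# SLURM_TASKS_PER_NODE notation into one task count per node.
expand_tasks_per_node() {
    echo "$1" | tr ',' '\n' | while IFS= read -r item; do
        case "$item" in
            *"(x"*)
                count=${item%%"("*}     # e.g. "2"  from "2(x3)"
                reps=${item##*"(x"}     # e.g. "3)" from "2(x3)"
                reps=${reps%")"}        # strip trailing ")"
                i=0
                while [ "$i" -lt "$reps" ]; do
                    echo "$count"
                    i=$((i + 1))
                done
                ;;
            *)
                echo "$item"
                ;;
        esac
    done
}

expand_tasks_per_node "2(x3),1"   # prints 2, 2, 2, 1 (one per line)
```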

SIGNALS AND ESCAPE SEQUENCES

       Signals sent to the srun command are  automatically  forwarded  to  the
       tasks  it  is  controlling  with  a few exceptions. The escape sequence
       <control-c> will report the state of all tasks associated with the srun
       command.  If  <control-c>  is entered twice within one second, then the
       associated SIGINT signal will be sent to all tasks  and  a  termination
       sequence  will  be entered sending SIGCONT, SIGTERM, and SIGKILL to all
       spawned tasks.  If a third <control-c> is received,  the  srun  program
       will  be  terminated  without waiting for remote tasks to exit or their
       I/O to complete.

       The escape sequence <control-z> is presently ignored. Our intent is for
       this to put the srun command into a mode where various special actions
       may be invoked.

MPI SUPPORT

       MPI use depends upon the type of  MPI  being  used.   There  are  three
       fundamentally  different  modes  of operation used by these various MPI
       implementations.

       1. SLURM directly launches the tasks  and  performs  initialization  of
       communications  (Quadrics  MPI, MPICH2, MPICH-GM, MVAPICH, MVAPICH2 and
       some MPICH1 modes). For example: "srun -n16 a.out".

       2. SLURM creates a resource allocation for  the  job  and  then  mpirun
       launches  tasks  using SLURM’s infrastructure (OpenMPI, LAM/MPI, HP-MPI
       and some MPICH1 modes).

       3. SLURM creates a resource allocation for  the  job  and  then  mpirun
       launches  tasks  using  some mechanism other than SLURM, such as SSH or
       RSH (BlueGene MPI and some MPICH1 modes).  These tasks  are  initiated
       outside of SLURM’s monitoring or control.  SLURM’s  epilog  should  be
       configured  to  purge  these  tasks  when  the  job’s   allocation   is
       relinquished.

       See   https://computing.llnl.gov/linux/slurm/mpi_guide.html   for  more
       information on use of these various MPI implementations with SLURM.

MULTIPLE PROGRAM CONFIGURATION

       Comments in the configuration file must have a "#" in column one.   The
       configuration  file  contains  the  following fields separated by white
       space:

       Task rank
              One or more task ranks  to  use  this  configuration.   Multiple
              values may be comma separated.  Ranges may be indicated with two
              numbers separated with a ’-’ with the smaller number first (e.g.
              "0-4"  and not "4-0").  To indicate all tasks, specify a rank of
              ’*’ (in which  case  you  probably  should  not  be  using  this
              option).   If an attempt is made to initiate a task for which no
              executable program is defined, the following error message  will
              be produced "No executable program specified for this task".

       Executable
              The  name  of  the  program  to execute.  May be fully qualified
              pathname if desired.

       Arguments
              Program arguments.  The expression "%t" will  be  replaced  with
              the  task’s  number.   The expression "%o" will be replaced with
              the task’s offset within this range (e.g. a configured task rank
              value  of  "1-5"  would  have  offset  values of "0-4").  Single
              quotes  may  be  used  to  avoid  having  the  enclosed   values
              interpreted.  This field is optional.

       For example:
       ###################################################################
       # srun multiple program configuration file
       #
       # srun -n8 -l --multi-prog silly.conf
       ###################################################################
       4-6       hostname
       1,7       echo  task:%t
       0,2-3     echo  offset:%o

       > srun -n8 -l --multi-prog silly.conf
       0: offset:0
       1: task:1
       2: offset:1
       3: offset:2
       4: linux15.llnl.gov
       5: linux16.llnl.gov
       6: linux17.llnl.gov
       7: task:7

EXAMPLES

       This  simple example demonstrates the execution of the command hostname
       in eight tasks. At least eight processors will be allocated to the  job
       (the  same  as  the  task  count) on however many nodes are required to
       satisfy the request. The output of each task will be preceded  by  its
       task  number.   (The  machine "dev" in the example below has a total of
       two CPUs per node)

       > srun -n8 -l hostname
       0: dev0
       1: dev0
       2: dev1
       3: dev1
       4: dev2
       5: dev2
       6: dev3
       7: dev3


       The  srun -r option is used within a job script to run two job steps on
       disjoint nodes in the  following  example.  The  script  is  run  using
       allocate mode instead of as a batch job in this case.

       > cat test.sh
       #!/bin/sh
       echo $SLURM_NODELIST
       srun -lN2 -r2 hostname
       srun -lN2 hostname

       > salloc -N4 test.sh
       dev[7-10]
       0: dev9
       1: dev10
       0: dev7
       1: dev8

       The following script runs two job steps in parallel within an allocated
       set of nodes.

       > cat test.sh
       #!/bin/bash
       srun -lN2 -n4 -r 2 sleep 60 &
       srun -lN2 -r 0 sleep 60 &
       sleep 1
       squeue
       squeue -s
       wait

       > salloc -N4 test.sh
         JOBID PARTITION     NAME     USER  ST      TIME  NODES NODELIST
         65641     batch  test.sh   grondo   R      0:01      4 dev[7-10]

       STEPID     PARTITION     USER      TIME NODELIST
       65641.0        batch   grondo      0:01 dev[7-8]
       65641.1        batch   grondo      0:01 dev[9-10]

       This example demonstrates how one executes a simple MPICH job.  We  use
       srun  to  build  a list of machines (nodes) to be used by mpirun in its
       required format. A sample command line and the script  to  be  executed
       follow.

       > cat test.sh
       #!/bin/sh
       MACHINEFILE="nodes.$SLURM_JOB_ID"

       # Generate Machinefile for mpich such that hosts are in the same
       #  order as if run via srun
       #
       srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE

       # Run using generated Machine file:
       mpirun -np $SLURM_NPROCS -machinefile $MACHINEFILE mpi-app

       rm $MACHINEFILE

       > salloc -N2 -n4 test.sh

       This simple example demonstrates the execution  of  different  jobs  on
       different nodes in the same srun.  You can do this for any  number  of
       nodes or any number of jobs.  The executable run  on  each  node  is
       selected by the SLURM_NODEID environment variable, which starts  at  0
       and runs up to the number of nodes specified on the srun command line.

       > cat test.sh
       case $SLURM_NODEID in
           0) echo "I am running on "
              hostname ;;
           1) hostname
              echo "is where I am running" ;;
       esac

       > srun -N2 test.sh
       dev0
       is where I am running
       I am running on
       dev1

       This  example  demonstrates use of multi-core options to control layout
       of tasks.  We request that four sockets per  node  and  two  cores  per
       socket be dedicated to the job.

       > srun -N2 -B 4-4:2-2 a.out

       This  example shows a script in which Slurm is used to provide resource
       management for a job by executing the various job steps  as  processors
       become available for their dedicated use.

       > cat my.script
       #!/bin/bash
       srun --exclusive -n4 prog1 &
       srun --exclusive -n3 prog2 &
       srun --exclusive -n1 prog3 &
       srun --exclusive -n1 prog4 &
       wait
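
       As noted in the descriptions of SLURM_EXIT_ERROR  and
       SLURM_EXIT_IMMEDIATE above, a script can separate SLURM error
       conditions from application exit codes.  The following sketch is
       purely illustrative: the numeric fallbacks and the pretend exit
       status are assumptions, and a real script would test the actual
       status of an srun invocation:

```shell
#!/bin/sh
# Sketch only: classify an exit status from "srun --immediate ...".
# The fallback values 254/253 are illustrative assumptions; real values
# come from the SLURM_EXIT_ERROR and SLURM_EXIT_IMMEDIATE variables.
exit_error=${SLURM_EXIT_ERROR:-254}
exit_immediate=${SLURM_EXIT_IMMEDIATE:-253}
rc=253   # pretend this was the exit status of an srun invocation
if [ "$rc" -eq "$exit_immediate" ]; then
    echo "resources not immediately available"
elif [ "$rc" -eq "$exit_error" ]; then
    echo "srun failed with a SLURM error"
else
    echo "application exited with status $rc"
fi
```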

COPYING

       Copyright  (C)  2006-2007  The Regents of the University of California.
       Copyright (C) 2008-2009 Lawrence Livermore National Security.  Produced
       at   Lawrence   Livermore   National   Laboratory   (cf,   DISCLAIMER).
       CODE-OCEC-09-009. All rights reserved.

       This file is  part  of  SLURM,  a  resource  management  program.   For
       details, see <https://computing.llnl.gov/linux/slurm/>.

       SLURM  is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published  by  the  Free
       Software  Foundation;  either  version  2  of  the License, or (at your
       option) any later version.

       SLURM is distributed in the hope that it will be  useful,  but  WITHOUT
       ANY  WARRANTY;  without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public  License
       for more details.

SEE ALSO

       salloc(1),  sattach(1),  sbatch(1), sbcast(1), scancel(1), scontrol(1),
       squeue(1), slurm.conf(5), sched_setaffinity(2), numa(3), getrlimit(2)