NAME
scontrol - Used to view and modify Slurm configuration and state.
SYNOPSIS
scontrol [OPTIONS...] [COMMAND...]
DESCRIPTION
scontrol is used to view or modify Slurm configuration including: job,
job step, node, partition, reservation, and overall system
configuration. Most of the commands can only be executed by user root.
If an attempt to view or modify configuration information is made by an
unauthorized user, an error message will be printed and the requested
action will not occur. If no command is entered on the command
line, scontrol will operate in an interactive mode and prompt for
input. It will continue prompting for input and executing
commands until explicitly terminated. If a command is entered on
the command line, scontrol will execute that command and
terminate. All commands and
options are case-insensitive, although node names, partition names, and
reservation names are case-sensitive (node names "LX" and "lx" are
distinct). All commands and options can be abbreviated to the extent
that the specification is unique.
OPTIONS
-a, --all
When the show command is used, display all partitions,
their jobs and job steps. This causes information to be
displayed about partitions that are configured as hidden and
partitions that are unavailable to the user’s group.
-d, --detail
Causes the show command to provide additional details where
available.
-h, --help
Print a help message describing the usage of scontrol.
--hide Do not display information about hidden partitions, their
jobs and job steps. Neither partitions that are configured
as hidden nor partitions unavailable to the user’s group
will be displayed. This is the default behavior.
-o, --oneliner
Print information one line per record.
-Q, --quiet
Print no warning or informational messages, only fatal error
messages.
-v, --verbose
Print detailed event logging. Multiple -v’s will further
increase the verbosity of logging. By default only errors will
be displayed.
-V, --version
Print version information and exit.
COMMANDS
all Show all partitions, their jobs and job steps. This causes
information to be displayed about partitions that are configured
as hidden and partitions that are unavailable to the user’s
group.
abort Instruct the Slurm controller to terminate immediately and
generate a core file. See "man slurmctld" for information about
where the core file will be written.
checkpoint CKPT_OP ID
Perform a checkpoint activity on the job step(s) with the
specified identification. ID can be used to identify a specific
job (e.g. "<job_id>", which applies to all of its existing
steps) or a specific job step (e.g. "<job_id>.<step_id>").
Acceptable values for CKPT_OP include:
disable (disable future checkpoints)
enable (enable future checkpoints)
able (test if presently not disabled, report start time if
checkpoint in progress)
create (create a checkpoint and continue the job step)
vacate (create a checkpoint and terminate the job step)
error (report the result for the last checkpoint request, error
code and message)
restart (restart execution of the previously checkpointed job
steps)
The following additional options may also be specified:
MaxWait=<seconds> maximum time for checkpoint to be written.
Default value is 10 seconds. Valid with create and
vacate options only.
ImageDir=<directory_name> Location of checkpoint file.
Valid with create, vacate and restart options only. This
value takes precedence over any --checkpoint-dir value
specified at job submission time.
StickToNodes If set, resume the job on the same nodes as
previously used.
Valid with the restart option only.
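For example, to checkpoint all steps of a job and let it continue
running (the job id and checkpoint directory are illustrative):
scontrol checkpoint create 1234 MaxWait=60 ImageDir=/tmp/ckpt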
create SPECIFICATION
Create a new partition or reservation. See the full list of
parameters below. Include the tag "res" to create a reservation
without specifying a reservation name.
completing
Display all jobs in a COMPLETING state along with associated
nodes in either a COMPLETING or DOWN state.
delete SPECIFICATION
Delete the entry with the specified SPECIFICATION. The two
SPECIFICATION choices are PartitionName=<name> and
Reservation=<name>. On dynamically laid-out BlueGene systems,
BlockName=<name> also works.
detail Causes the show command to provide additional details where
available, namely the specific CPUs and NUMA memory allocated on
each node. Note that on computers with hyperthreading enabled
and SLURM configured to allocate cores, each listed CPU
represents one physical core. Each hyperthread on that core can
be allocated a separate task, so a job’s CPU count and task
count may differ. See the --cpu_bind and --mem_bind option
descriptions in srun man pages for more information. The detail
option is currently only supported for the show job command.
exit Terminate the execution of scontrol. This command has no
options and is meant for use in interactive mode.
help Display a description of scontrol options and commands.
hide Do not display partition, job or job step information for
partitions that are configured as hidden or partitions that are
unavailable to the user’s group. This is the default behavior.
notify job_id message
Send a message to standard error of the srun command associated
with the specified job_id.
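For example (the job id and message text are illustrative):
scontrol notify 1234 "Disk quota nearly exceeded"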
oneliner
Print information one line per record.
pidinfo proc_id
Print the Slurm job id and scheduled termination time
corresponding to the supplied process id, proc_id, on the
current node. This will work only with processes on the node on
which scontrol is run, and only for those processes spawned by
SLURM and their descendants.
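For example (the process id is illustrative):
scontrol pidinfo 5678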
listpids [job_id[.step_id]] [NodeName]
Print a listing of the process IDs in a job step (if
JOBID.STEPID is provided), or all of the job steps in a job (if
job_id is provided), or all of the job steps in all of the jobs
on the local node (if job_id is not provided or job_id is "*").
This will work only with processes on the node on which scontrol
is run, and only for those processes spawned by SLURM and their
descendants. Note that some SLURM configurations (ProctrackType
value of pgid or aix) are unable to identify all processes
associated with a job or job step.
Note that the NodeName option is only really useful when you
have multiple slurmd daemons running on the same host machine.
Multiple slurmd daemons on one host are, in general, only used
by SLURM developers.
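For example, to list the processes of step 0 of job 1234 (an
illustrative job step id):
scontrol listpids 1234.0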
ping Ping the primary and secondary slurmctld daemon and report if
they are responding.
quiet Print no warning or informational messages, only fatal error
messages.
quit Terminate the execution of scontrol.
reconfigure
Instruct all Slurm daemons to re-read the configuration file.
This command does not restart the daemons. This mechanism can
be used to modify configuration parameters (Epilog, Prolog,
SlurmctldLogFile, SlurmdLogFile, etc.), register the physical
addition or removal of nodes from the cluster, or recognize a
change in a node’s configuration, such as the addition of memory
or processors. The Slurm controller (slurmctld) forwards the
request to all other daemons (the slurmd daemon on each compute
node). Running jobs continue execution. Most configuration
parameters can be changed by just running this command; however,
SLURM daemons should be shut down and restarted if any of these
parameters are to be changed: AuthType, BackupAddr,
BackupController, ControlAddr, ControlMach, PluginDir,
StateSaveLocation, SlurmctldPort or SlurmdPort.
resume job_id
Resume a previously suspended job.
requeue job_id
Requeue a running or pending SLURM batch job.
setdebug LEVEL
Change the debug level of the slurmctld daemon. LEVEL may be an
integer value between zero and nine (using the same values as
SlurmctldDebug in the slurm.conf file) or the name of the most
detailed message type to be printed: "quiet", "fatal", "error",
"info", "verbose", "debug", "debug2", "debug3", "debug4", or
"debug5". This value is temporary and will be overwritten
whenever the slurmctld daemon reads the slurm.conf configuration
file (e.g. when the daemon is restarted or scontrol reconfigure
is executed).
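For example, to raise the logging level until the next
reconfiguration:
scontrol setdebug debug2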
show ENTITY ID
Display the state of the specified entity with the specified
identification. ENTITY may be config, daemons, job, node,
partition, reservation, slurmd, step, topology, hostlist or
hostnames (also block or subbp on BlueGene systems). ID can be
used to identify a specific element of the identified entity:
the configuration parameter name, job ID, node name, partition
name, reservation name, or job step ID for config, job, node,
partition, or step respectively. For an ENTITY of topology, the
ID may be a node or switch name. If one node name is specified,
all switches connected to that node (and their parent switches)
will be shown. If more than one node name is specified, only
switches that connect to all named nodes will be shown.
hostnames takes an optional hostlist expression as input and
writes a list of individual host names to standard output (one
per line). If no hostlist expression is supplied, the contents
of the SLURM_NODELIST environment variable are used. For example,
"tux[1-3]" is mapped to "tux1", "tux2" and "tux3" (one hostname
per line). hostlist takes a list of host names and prints the
hostlist expression for them (the inverse of hostnames).
hostlist can also take the absolute pathname of a file
(beginning with the character ’/’) containing a list of
hostnames. Multiple node names may be specified using simple
node range expressions (e.g. "lx[10-20]"). All other ID values
must identify a single element. The job step ID is of the form
"job_id.step_id", (e.g. "1234.1"). slurmd reports the current
status of the slurmd daemon executing on the same node from
which the scontrol command is executed (the local host). It can
be useful to diagnose problems. By default, all elements of the
entity type specified are printed.
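For example, hostlist performs the inverse mapping of the
hostnames example above (the host names are illustrative):
scontrol show hostlist tux1,tux2,tux3
tux[1-3]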
shutdown OPTION
Instruct Slurm daemons to save current state and terminate. By
default, the Slurm controller (slurmctld) forwards the request
to all other daemons (the slurmd daemon on each compute node).
An OPTION of slurmctld or controller results in only the
slurmctld daemon being shut down and the slurmd daemons
remaining active.
suspend job_id
Suspend a running job. Use the resume command to resume its
execution. User processes must stop upon receipt of the SIGSTOP
signal and resume upon receipt of SIGCONT for this operation to
be effective. Not all architectures and configurations support
job suspension.
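For example (the job id is illustrative):
scontrol suspend 1234
scontrol resume 1234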
takeover
Instruct SLURM’s backup controller (slurmctld) to take over
system control. SLURM’s backup controller requests control from
the primary and waits for its termination. After that, it
switches from backup mode to controller mode. If the primary
controller cannot be contacted, it directly switches to
controller mode. This can be used to speed up the SLURM
controller fail-over mechanism when the primary node is down and
to minimize disruption if the computer executing the primary
SLURM controller is scheduled down. (Note: SLURM’s primary
controller will take control back at startup.)
update SPECIFICATION
Update job, node, partition, or reservation configuration per
the supplied specification. SPECIFICATION is in the same format
as the Slurm configuration file and the output of the show
command described above. It may be desirable to execute the show
command on the specific entity you wish to update, then use
cut-and-paste tools to supply updated configuration values to
the update command. Note that while most
configuration values can be changed using this command, not all
can be changed using this mechanism. In particular, the hardware
configuration of a node or the physical addition or removal of
nodes from the cluster may only be accomplished through editing
the Slurm configuration file and executing the reconfigure
command (described above).
verbose
Print detailed event logging. This includes time-stamps on data
structures, record counts, etc.
version
Display the version number of scontrol being executed.
!! Repeat the last command executed.
SPECIFICATIONS FOR UPDATE COMMAND, JOBS
Account=<account>
Account name to be changed for this job’s resource use. Value
may be cleared with blank data value, "Account=".
Conn-Type=<type>
Reset the node connection type. Possible values on Blue Gene
are "MESH", "TORUS" and "NAV" (mesh else torus).
Contiguous=<yes|no>
Set the job’s requirement for contiguous (consecutive) nodes to
be allocated. Possible values are "YES" and "NO".
Dependency=<dependency_list>
Defer job’s initiation until specified job dependency
specification is satisfied. Cancel dependency with an empty
dependency_list (e.g. "Dependency="). <dependency_list> is of
the form <type:job_id[:job_id][,type:job_id[:job_id]]>. Many
jobs can share the same dependency and these jobs may even
belong to different users.
after:job_id[:jobid...]
This job can begin execution after the specified jobs
have begun execution.
afterany:job_id[:jobid...]
This job can begin execution after the specified jobs
have terminated.
afternotok:job_id[:jobid...]
This job can begin execution after the specified jobs
have terminated in some failed state (non-zero exit code,
node failure, timed out, etc).
afterok:job_id[:jobid...]
This job can begin execution after the specified jobs
have successfully executed (ran to completion with a zero
exit code).
singleton
This job can begin execution after any previously
launched jobs sharing the same job name and user have
terminated.
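For example, to defer a pending job until two other jobs complete
successfully (all job ids are illustrative):
scontrol update JobId=1236 Dependency=afterok:1234:1235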
EligibleTime=<time_spec>
See StartTime.
ExcNodeList=<nodes>
Set the job’s list of excluded nodes. Multiple node names may be
specified using simple node range expressions (e.g.
"lx[10-20]"). Value may be cleared with blank data value,
"ExcNodeList=".
Features=<features>
Set the job’s required node features. The list of features may
include multiple feature names separated by ampersand (AND)
and/or vertical bar (OR) operators. For example:
Features="opteron&video" or Features="fast|faster". In the
first example, only nodes having both the feature "opteron" AND
the feature "video" will be used. There is no mechanism to
specify that you want one node with feature "opteron" and
another node with feature "video" in case no node has both
features. If only one of a set of possible options should be
used for all allocated nodes, then use the OR operator and
enclose the options within square brackets. For example:
"Features=[rack1|rack2|rack3|rack4]" might be used to specify
that all nodes must be allocated on a single rack of the
cluster, but any of those four racks can be used. A request can
also specify the number of nodes needed with some feature by
appending an asterisk and count after the feature name. For
example "Features=graphics*4" indicates that at least four
allocated nodes must have the feature "graphics." Constraints
with node counts may only be combined with AND operators. Value
may be cleared with blank data value, for example "Features=".
Geometry=<geo>
Reset the required job geometry. On Blue Gene the value should
be three digits separated by "x" or ",". The digits represent
the allocation size in X, Y and Z dimensions (e.g. "2x3x4").
JobId=<id>
Identify the job to be updated. This specification is required.
Licenses=<name>
Specification of licenses (or other resources available on all
nodes of the cluster) as described in salloc/sbatch/srun man
pages.
MinCPUsNode=<count>
Set the job’s minimum number of CPUs per node to the specified
value.
MinMemoryCPU=<megabytes>
Set the job’s minimum real memory required per allocated CPU to
the specified value. Either MinMemoryCPU or MinMemoryNode may
be set, but not both.
MinMemoryNode=<megabytes>
Set the job’s minimum real memory required per node to the
specified value. Either MinMemoryCPU or MinMemoryNode may be
set, but not both.
MinTmpDiskNode=<megabytes>
Set the job’s minimum temporary disk space required per node to
the specified value.
Name=<name>
Set the job’s name to the specified value.
Nice[=delta]
Adjust job’s priority by the specified value. Default value is
100. The adjustment range is from -10000 (highest priority) to
10000 (lowest priority). Nice value changes are not additive,
but overwrite any prior nice value and are applied to the job’s
base priority. Only privileged users can specify a negative
adjustment.
NumNodes=<min_count>[-<max_count>]
Set the job’s minimum and optionally maximum count of nodes to
be allocated.
NumTasks=<count>
Set the job’s count of required tasks to the specified value.
Partition=<name>
Set the job’s partition to the specified value.
Priority=<number>
Set the job’s priority to the specified value. Note that a job
priority of zero prevents the job from ever being scheduled. By
setting a job’s priority to zero it is held. Set the priority
to a non-zero value to permit it to run. Explicitly setting a
job’s priority clears any previously set nice value.
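For example, to hold a job and later release it (the job id and
priority value are illustrative):
scontrol update JobId=1234 Priority=0
scontrol update JobId=1234 Priority=500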
ReqCores=<count>
Set the job’s count of minimum cores per socket to the specified
value.
ReqNodeList=<nodes>
Set the job’s list of required nodes. Multiple node names may be
specified using simple node range expressions (e.g.
"lx[10-20]"). Value may be cleared with blank data value,
"ReqNodeList=".
ReqSockets=<count>
Set the job’s count of minimum sockets per node to the specified
value.
ReqThreads=<count>
Set the job’s count of minimum threads per core to the specified
value.
Requeue=<0|1>
Stipulates whether a job should be requeued after a node
failure: 0 for no, 1 for yes.
ReservationName=<name>
Set the job’s reservation to the specified value.
Rotate=<yes|no>
Permit the job’s geometry to be rotated. Possible values are
"YES" and "NO".
Shared=<yes|no>
Set the job’s ability to share nodes with other jobs. Possible
values are "YES" and "NO".
StartTime=<time_spec>
Set the job’s earliest initiation time. It accepts times of the
form HH:MM:SS to run a job at a specific time of day (seconds
are optional). (If that time is already past, the next day is
assumed.) You may also specify midnight, noon, or teatime (4pm)
and you can have a time-of-day suffixed with AM or PM for
running in the morning or the evening. You can also say what
day the job will be run, by specifying a date of the form MMDDYY
or MM/DD/YY or MM.DD.YY, or a date and time as
YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
count time-units, where the time-units can be minutes, hours,
days, or weeks and you can tell SLURM to run the job today with
the keyword today and to run the job tomorrow with the keyword
tomorrow.
Notes on date/time specifications:
- although the ’seconds’ field of the HH:MM:SS time
specification is allowed by the code, note that the poll time of
the SLURM scheduler is not precise enough to guarantee dispatch
of the job on the exact second. The job will be eligible to
start on the next poll following the specified time. The exact
poll interval depends on the SLURM scheduler (e.g., 60 seconds
with the default sched/builtin).
- if no time (HH:MM:SS) is specified, the default is
(00:00:00).
- if a date is specified without a year (e.g., MM/DD) then the
current year is assumed, unless the combination of MM/DD and
HH:MM:SS has already passed for that year, in which case the
next year is used.
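For example (the job id and times are illustrative):
scontrol update JobId=1234 StartTime=now+1hour
scontrol update JobId=1234 StartTime=2010-02-01T12:00:00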
TimeLimit=<time>
The job’s time limit. Output format is
[days-]hours:minutes:seconds or "UNLIMITED". Input format (for
the update command) is minutes, minutes:seconds,
hours:minutes:seconds, days-hours, days-hours:minutes or
days-hours:minutes:seconds. Time resolution is one minute and
second values are rounded up to the next minute.
WCKey=<key>
Set the job’s workload characterization key to the specified
value.
NOTE: The "show" command, when used with the "job" or "job <jobid>"
entity displays detailed information about a job or jobs. Much
of this information may be modified using the "update job"
command as described above. However, the following fields
displayed by the show job command are read-only and cannot be
modified:
AllocNode:Sid
Local node and system id making the resource allocation.
EndTime
The time the job is expected to terminate based on the job’s
time limit. When the job ends sooner, this field will be
updated with the actual end time.
ExitCode=<exit>:<sig>
Exit status reported for the job by the wait() function. The
first number is the exit code, typically as set by the exit()
function. The second number is the signal that caused the
process to terminate, if it was terminated by a signal.
JobState
The current state of the job.
NodeList
The list of nodes allocated to the job.
NodeListIndices
The NodeIndices expose the internal indices into the node table
associated with the node(s) allocated to the job.
PreSusTime
Time the job ran prior to last suspend.
Reason The reason the job is not running, e.g. waiting for "Resources".
SuspendTime
Time the job was last suspended or resumed.
UserId GroupId
The user and group under which the job was submitted.
NOTE on information displayed for various job states:
When you submit a request for the "show job" function, the
scontrol process makes an RPC call to slurmctld with a
REQUEST_JOB_INFO message type. If the state of the job is
PENDING, then it returns some detail information such as:
min_nodes, min_procs, cpus_per_task, etc. If the state is other
than PENDING the code assumes that it is in a further state such
as RUNNING, COMPLETE, etc. In these cases the code explicitly
returns zero for these values. These values are meaningless once
the job resources have been allocated and the job has started.
SPECIFICATIONS FOR UPDATE COMMAND, NODES
NodeName=<name>
Identify the node(s) to be updated. Multiple node names may be
specified using simple node range expressions (e.g.
"lx[10-20]"). This specification is required.
Features=<features>
Identify feature(s) to be associated with the specified node.
Any previously defined feature(s) will be overwritten with the
new value. NOTE: Features assigned via scontrol do not survive
the restart of the slurmctld nor will they survive scontrol
reconfigure if Features are defined in slurm.conf. Update
slurm.conf with any changes meant to be persistent.
Reason=<reason>
Identify the reason the node is in a "DOWN", "DRAINED",
"DRAINING", "FAILING" or "FAIL" state. Use quotes to enclose a
reason having more than one word.
State=<state>
Identify the state to be assigned to the node. Possible values
are "NoResp", "ALLOC", "ALLOCATED", "DOWN", "DRAIN", "FAIL",
"FAILING", "IDLE", "MIXED", "MAINT", "POWER_DOWN", "POWER_UP",
or "RESUME". If a node is in a "MIXED" state it usually means
the node is in multiple states. For instance if only part of
the node is "ALLOCATED" and the rest of the node is "IDLE" the
state will be "MIXED". If you want to remove a node from
service, you typically want to set its state to "DRAIN".
"FAILING" is similar to "DRAIN" except that some applications
will seek to relinquish those nodes before the job completes.
"RESUME" is not an actual node state, but will return a
"DRAINED", "DRAINING", or "DOWN" node to service, either "IDLE"
or "ALLOCATED" state as appropriate. Setting a node "DOWN" will
cause all running and suspended jobs on that node to be
terminated. "POWER_DOWN" and "POWER_UP" will use the configured
SuspendProg and ResumeProg programs to explicitly place a node
in or out of a power saving mode. The "NoResp" state will only
set the "NoResp" flag for a node without changing its underlying
state. While all of the above states are valid, some of them
are not valid new node states given their prior state.
Generally only "DRAIN", "FAIL" and "RESUME" should be used.
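For example, to remove a node from service (the node name and
reason are illustrative):
scontrol update NodeName=lx10 State=DRAIN Reason="memory errors"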
Weight=<weight>
Identify weight to be associated with specified nodes. This
allows dynamic changes to weight associated with nodes, which
will be used for subsequent node allocation decisions. Any
previously identified weight will be overwritten with the new
value. NOTE: The Weight associated with nodes will be reset to
the values specified in slurm.conf (if any) upon slurmctld
restart or reconfiguration. Update slurm.conf with any changes
meant to be persistent.
SPECIFICATIONS FOR CREATE, UPDATE, AND DELETE COMMANDS, PARTITIONS
AllowGroups=<name>
Identify the user groups which may use this partition. Multiple
groups may be specified in a comma separated list. To permit
all groups to use the partition specify "AllowGroups=ALL".
AllocNodes=<name>
Comma separated list of nodes from which users can execute jobs
in the partition. Node names may be specified using the node
range expression syntax described above. The default value is
"ALL".
Default=<yes|no>
Specify if this partition is to be used by jobs which do not
explicitly identify a partition to use. Possible output values
are "YES" and "NO". In order to change the default partition of
a running system, use the scontrol update command and set
Default=yes for the partition that you want to become the new
default.
DefaultTime=<time>
Run time limit used for jobs that don’t specify a value. If not
set then MaxTime will be used. Format is the same as for
MaxTime.
Hidden=<yes|no>
Specify if the partition and its jobs should be hidden from
view. Hidden partitions will by default not be reported by
SLURM APIs or commands. Possible values are "YES" and "NO".
MaxNodes=<count>
Set the maximum number of nodes which will be allocated to any
single job in the partition. Specify a number, "INFINITE" or
"UNLIMITED". (On a Bluegene type system this represents a
c-node count.)
MaxTime=<time>
The maximum run time for jobs. Output format is
[days-]hours:minutes:seconds or "UNLIMITED". Input format (for
update command) is minutes, minutes:seconds,
hours:minutes:seconds, days-hours, days-hours:minutes or
days-hours:minutes:seconds. Time resolution is one minute and
second values are rounded up to the next minute.
MinNodes=<count>
Set the minimum number of nodes which will be allocated to any
single job in the partition. (On a Bluegene type system this
represents a c-node count.)
Nodes=<name>
Identify the node(s) to be associated with this partition.
Multiple node names may be specified using simple node range
expressions (e.g. "lx[10-20]"). Note that jobs may only be
associated with one partition at any time. Specify a blank data
value to remove all nodes from a partition: "Nodes=".
PartitionName=<name>
Identify the partition to be updated. This specification is
required.
Priority=<count>
Jobs submitted to a higher priority partition will be dispatched
before pending jobs in lower priority partitions and if possible
they will preempt running jobs from lower priority partitions.
Note that a partition’s priority takes precedence over a job’s
priority. The value may not exceed 65533.
RootOnly=<yes|no>
Specify if only allocation requests initiated by user root will
be satisfied. This can be used to restrict control of the
partition to some meta-scheduler. Possible values are "YES" and
"NO".
Shared=<yes|no|exclusive|force>[:<job_count>]
Specify if nodes in this partition can be shared by multiple
jobs. Possible values are "YES", "NO", "EXCLUSIVE" and "FORCE".
An optional job count specifies how many jobs can be allocated
to use each resource.
State=<up|down>
Specify if jobs can be allocated nodes in this partition.
Possible values are "UP" and "DOWN". If a partition has nodes
allocated to running jobs, those jobs will continue execution
even after the partition’s state is set to "DOWN". The jobs must
be explicitly canceled to force their termination.
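For example, to create a partition and later adjust its limits
(the partition name and node names are illustrative):
scontrol create PartitionName=batch Nodes=lx[10-20] MaxTime=60 State=UP
scontrol update PartitionName=batch MaxNodes=4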
SPECIFICATIONS FOR CREATE, UPDATE, AND DELETE COMMANDS, RESERVATIONS
Reservation=<name>
Identify the name of the reservation to be created,
updated, or deleted. This parameter is required for
update and is the only parameter for delete. For create,
if you do not want to give a reservation name, use
"scontrol create res ..." and a name will be created
automatically.
Licenses=<license>
Specification of licenses (or other resources available
on all nodes of the cluster) which are to be reserved.
License names can be followed by an asterisk and count
(the default count is one). Multiple license names
should be comma separated (e.g. "Licenses=foo*4,bar").
NodeCnt=<num>
Identify number of nodes to be reserved. A new
reservation must specify either NodeCnt or Nodes.
Nodes=<name>
Identify the node(s) to be reserved. Multiple node names
may be specified using simple node range expressions
(e.g. "Nodes=lx[10-20]"). Specify a blank data value to
remove all nodes from a reservation: "Nodes=". A new
reservation must specify either NodeCnt or Nodes.
StartTime=<time_spec>
The start time for the reservation. A new reservation
must specify a start time. It accepts times of the form
HH:MM:SS for a specific time of day (seconds are
optional). (If that time is already past, the next day
is assumed.) You may also specify midnight, noon, or
teatime (4pm) and you can have a time-of-day suffixed
with AM or PM for running in the morning or the evening.
You can also say what day the reservation will start, by
specifying a date of the form MMDDYY or MM/DD/YY or
MM.DD.YY, or a date and time as YYYY-MM-DD[THH:MM[:SS]].
You can also give times like now + count time-units,
where the time-units can be minutes, hours, days, or
weeks, and you can use the keyword today or tomorrow to
have the reservation start today or tomorrow.
EndTime=<time_spec>
The end time for the reservation. A new reservation must
specify an end time or a duration. Valid formats are the
same as for StartTime.
Duration=<time>
The length of a reservation. A new reservation must
specify an end time or a duration. Valid formats are
minutes, minutes:seconds, hours:minutes:seconds,
days-hours, days-hours:minutes,
days-hours:minutes:seconds, or UNLIMITED. Time
resolution is one minute and second values are rounded up
to the next minute.
PartitionName=<name>
Identify the partition to be reserved.
Flags=<flags>
Flags associated with the reservation. In order to
remove a flag with the update option, precede the name
with a minus sign. For example: Flags=-DAILY (NOTE: this
option is not supported for all flags). Currently
supported flags include:
MAINT Maintenance mode, receives special accounting
treatment. This reservation is permitted to
use resources that are already in another
reservation.
OVERLAP This reservation can be allocated resources
that are already in another reservation.
IGNORE_JOBS Ignore currently running jobs when creating
the reservation. This can be especially
useful when reserving all nodes in the system
for maintenance.
DAILY Repeat the reservation at the same time every
day
WEEKLY Repeat the reservation at the same time every
week
SPEC_NODES Reservation is for specific nodes (output
only)
Features=<features>
Set the reservation’s required node features. Multiple
values may be "&" separated if all features are required
(AND operation) or separated by "|" if any of the
specified features are required (OR operation). Value
may be cleared with blank data value, "Features=".
Users=<user list>
List of users permitted to use the reserved nodes. E.g.
Users=jones1,smith2. A new reservation must specify
Users and/or Accounts.
Accounts=<account list>
List of accounts permitted to use the reserved nodes.
E.g. Accounts=physcode1,physcode2. A user in any of the
accounts may use the reserved nodes. A new reservation
must specify Users and/or Accounts.
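For example, to reserve specific nodes for users of one account
(the names and times are illustrative):
scontrol create res Reservation=acct_res Nodes=lx[10-20] StartTime=now Duration=120 Accounts=physcode1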
SPECIFICATIONS FOR UPDATE, BLOCK
Bluegene systems only!
BlockName=<name>
Identify the bluegene block to be updated. This
specification is required.
State=<free|error|remove>
This will update the state of a bluegene block to either
FREE or ERROR (e.g. "update BlockName=RMP0 State=ERROR").
A block in the ERROR state will not allow jobs to run on it.
WARNING: This will cancel any running job on the
block! On dynamically laid out systems, REMOVE will free
and remove the block from the system. If the block is
smaller than a midplane, every block on that midplane will
be removed.
SubBPName=<name>
Identify the bluegene ionodes to be updated (e.g.
bg000[0-3]). This specification is required.
ENVIRONMENT VARIABLES
Some scontrol options may be set via environment variables.
These environment variables, along with their corresponding
options, are listed below. (Note: Command-line options will
always override these settings.)
SCONTROL_ALL -a, --all
SLURM_CONF The location of the SLURM configuration
file.
EXAMPLES
# scontrol
scontrol: show part debug
PartitionName=debug
AllocNodes=ALL AllowGroups=ALL Default=YES
DefaultTime=NONE DisableRootJobs=NO Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
Nodes=snowflake[0-48]
Priority=1 RootOnly=NO Shared=YES:4
State=UP TotalCPUs=694 TotalNodes=49
scontrol: update PartitionName=debug MaxTime=60:00 MaxNodes=4
scontrol: show job 71701
JobId=71701 Name=hostname
UserId=da(1000) GroupId=da(1000)
Priority=66264 Account=none QOS=normal WCKey=*123
JobState=COMPLETED Reason=None Dependency=(null)
TimeLimit=UNLIMITED Requeue=1 Restarts=0 BatchFlag=0
ExitCode=0:0
SubmitTime=2010-01-05T10:58:40
EligibleTime=2010-01-05T10:58:40
StartTime=2010-01-05T10:58:40 EndTime=2010-01-05T10:58:40
SuspendTime=None SecsPreSuspend=0
Partition=debug AllocNode:Sid=snowflake:4702
ReqNodeList=(null) ExcNodeList=(null)
NodeList=snowflake0
NumNodes=1 NumCPUs=10 CPUs/Task=2 ReqS:C:T=1:1:1
MinCPUsNode=2 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
scontrol: update JobId=71701 TimeLimit=30:00 Priority=500
scontrol: show hostnames tux[1-3]
tux1
tux2
tux3
scontrol: create res StartTime=2009-04-01T08:00:00
Duration=5:00:00 Users=dbremer NodeCnt=10
Reservation created: dbremer_1
scontrol: update Reservation=dbremer_1 Flags=Maint NodeCnt=20
scontrol: delete Reservation=dbremer_1
scontrol: quit
COPYING
Copyright (C) 2002-2007 The Regents of the University of
California. Produced at Lawrence Livermore National Laboratory
(cf, DISCLAIMER). CODE-OCEC-09-009. All rights reserved.
This file is part of SLURM, a resource management program. For
details, see <https://computing.llnl.gov/linux/slurm/>.
SLURM is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published
by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
SLURM is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
FILES
/etc/slurm.conf
SEE ALSO
scancel(1), sinfo(1), squeue(1), slurm_checkpoint(3),
slurm_create_partition(3), slurm_delete_partition(3),
slurm_load_ctl_conf(3), slurm_load_jobs(3), slurm_load_node(3),
slurm_load_partitions(3), slurm_reconfigure(3),
slurm_requeue(3), slurm_resume(3), slurm_shutdown(3),
slurm_suspend(3), slurm_takeover(3), slurm_update_job(3),
slurm_update_node(3), slurm_update_partition(3), slurm.conf(5),
slurmctld(8)