NAME
smap - graphically view information about SLURM jobs, partitions, and
set configurations parameters.
SYNOPSIS
smap [OPTIONS...]
DESCRIPTION
smap is used to graphically view job, partition and node information
for a system running SLURM. Note that information about nodes and
partitions to which a user lacks access will always be displayed to
avoid obvious gaps in the output. This is equivalent to the --all
option of the sinfo and squeue commands.
OPTIONS
-c, --commandline
Print output to the commandline, no curses.
-D <option>, --display=<option>
sets the display mode for smap. Showing revelant information
about specific views and displaying a corresponding node chart.
While in any display a user can switch by typing a different
view letter. This is true in all modes except for ’configure
mode’ user can type ’quit’ to exit just configure mode. Typing
’exit’ will end the configuration mode and exit smap. Note that
unallocated nodes are indicated by a ’.’ and nodes in the DOWN,
DRAINED or FAIL state by a ’#’.
b Displays information about BlueGene partitions on
the system
c Displays current BlueGene node states and allows
users to configure the system.
j Displays information about jobs running on
system.
r Display information about advanced reservations.
While all current and future reservations will be
listed, only currently active reservations will
appear on the node map.
s Displays information about slurm partitions on
the system
-h, --noheader
Do not print a header on the output.
--help,
Print a message describing all smap options.
-i <seconds> , --iterate=<seconds>
Print the state on a periodic basis. Sleep for the indicated
number of seconds between reports. User can exit at anytime by
typing ’q’ or hitting the return key. If user is in configure
mode type ’exit’ to exit program, ’quit’ to exit configure mode.
-I, --ionodes
Only show objects with these ionodes this support is only for
bluegene systems. This should be used inconjuction with the ’-n’
option. Only specify the ionode number range here. Specify the
node name with the ’-n’ option.
-n, --nodes
Only show objects with these nodes. If querying to the ionode
level use the option ’-I’ in conjunction with this option.
-Q, --quiet
Avoid printing error messages.
-R <RACK_MIDPLANE_ID/XYZ>, --resolve=<RACK_MIDPLANE_ID/XYZ>
Returns the XYZ coords for a Rack/Midplane id or vice-versa.
To get the XYZ coord for a Rack/Midplane id input -R R101 where
10 is the rack and 1 is the midplane.
To get the Rack/Midplane id from a XYZ coord input -R 101 where
X=1 Y=1 Z=1 with no leading ’R’.
--usage
Print a brief message listing the smap options.
-V , --version
Print version information and exit.
INTERACTIVE OPTIONS
When using smap in curses mode you can scroll through the different
windows using the arrow keys. The up and down arrow keys scroll the
window containing the grid, and the left and right arrow keys scroll
the window containing the text information.
OUTPUT FIELD DESCRIPTIONS
ACCESS_CONTROL
Identifies the users or bank accounts which can use this
advanced reservation. A prefix of "A:" indicates that the
following account names may use this reservation. A prefix of
"U:" indicates that the following user names may use this
reservation.
AVAIL Partition state: up or down.
BG_BLOCK
BlueGene Block Name.
CONN Connection Type: TORUS or MESH or SMALL (for small blocks).
END_TIME
The time when an advanced reservation ended.
ID Key to identify the nodes associated with this entity in the
node chart.
MODE Mode Type: COPROCESS or VIRTUAL.
NAME Name of the job or advanced reservation.
NODELIST or BP_LIST
Names of nodes or base partitions associated with this
configuration, partition or reservation.
NODES Count of nodes or base partitions with this particular
configuration.
PARTITION
Name of a partition. Note that the suffix "*" identifies the
default partition.
ST State of a job in compact form. Possible states include: PD
(pending), R (running), S (suspended), CD (completed), CF
(configuring), CG (completing), F (failed), TO (timeout), and NF
(node failure). See JOB STATE CODES section below for more
information.
START_TIME
The time when an advanced reservation started.
STATE State of the nodes. Possible states include: allocated,
completing, down, drained, draining, fail, failing, idle, and
unknown plus their abbreviated forms: alloc, comp, donw, drain,
drng, fail, failg, idle, and unk respectively. Note that the
suffix "*" identifies nodes that are presently not responding.
See NODE STATE CODES section below for more information.
TIMELIMIT
Maximum time limit for any user job in
days-hours:minutes:seconds. infinite is used to identify jobs
or partitions without a job time limit.
TOPOGRAPHY INFORMATION
The node chart is designed to indicate relative locations of the nodes.
On most Linux clusters this will represent a one-dimensional array of
nodes. Larger clusters will utilize multiple as needed with right side
of one line being logically followed by the left side of the next line.
On BlueGene systems, the node chart will indicate the three
dimensional topography of the system.
The X dimension will increase from left to right on a given line.
The Y dimension will increase in planes from bottom to top.
The Z dimension will increase within a plane from the back
line to the front line of a plane.
Note the example below:
a a a a b b d d
a a a a b b d d
a a a a b b c c
a a a a b b c c
a a a a b b d d
a a a a b b d d
a a a a b b c c
a a a a b b c c
a a a a . . d d
a a a a . . d d
a a a a . . e e Y
a a a a . . e e |
|
a a a a . . d d 0----X
a a a a . . d d /
a a a a . . . . /
a a a a . . . # Z
ID JOBID PARTITION BG_BLOCK USER NAME ST TIME NODES BP_LIST
a 12345 batch RMP0 joseph tst1 R 43:12 32k bgl[000x333]
b 12346 debug RMP1 chris sim3 R 12:34 8k bgl[420x533]
c 12350 debug RMP2 danny job3 R 0:12 4k bgl[622x733]
d 12356 debug RMP3 dan colu R 18:05 8k bgl[600x731]
e 12378 debug RMP4 joseph asx4 R 0:34 2k bgl[612x713]
CONFIGURATION INSTRUCTIONS
For Admin use. From this screen one can create a configuration file
that is used to partition and wire the system into usable blocks.
OUTPUT
BG_BLOCK
BlueGene Block Name.
CONN Connection Type: TORUS or MESH or SMALL (for small
blocks).
ID Key to identify the nodes associated with this entity in
the node chart.
MODE Mode Type: COPROCESS or VIRTUAL.
INPUT COMMANDS
resolve <RACK_MIDPLANE_ID/XYZ>
Returns the XYZ coords for a Rack/Midplane id or
vice-versa.
To get the XYZ coord for a Rack/Midplane id input -R R101
where 10 is the rack and 1 is the midplane.
To get the Rack/Midplane id from a XYZ coord input -R 101
where X=1 Y=1 Z=1 with no leading ’R’.
load <bluegene.conf file>
Load an already exsistant bluegene.conf file. This will
varify and mapout a bluegene.conf file. After loaded the
configuration may be edited and saved as a new file.
create <size> <options>
Submit request for partition creation. The size may be
specified either as a count of base partitions or
specific dimensions in the X, Y and Z directions
separated by "x", for example "2x3x4". A variety of
options may be specified. Valid options are listed below.
Note that the option and their values are case
insensitive (e.g. "MESH" and "mesh" are equivalent).
Start = XxYxZ
Identify where to start the partition. This is primarily
for testing purposes. For convenience one can only put
the X coord or XxY will also work. The default value is
0x0x0.
Connection = MESH | TORUS | SMALL
Identify how the nodes should be connected in network.
The default value is TORUS.
Small Equivalent to "Connection=Small". If a small
connection is specified the base partition chosen
will create smaller partitions based on options
32CNBlocks and 128CNBlocks respectively for a
Bluegene L system. 16CNBlocks, 64CNBlocks, and
256CNBlocks are also available for Bluegene P
systems. Keep in mind you must have enough
ionodes to make all these configurations
possible.
These number will be altered to take up the
entire base partition. Size does not need to be
specified with a small request, we will always
default to 1 base partition for allocation.
Mesh Equivalent to "Connection=Mesh".
Torus Equivalent to "Connection=Torus".
Rotation = TRUE | FALSE
Specifies that the geometry specified in the size
parameter may be rotated in space (e.g. the Y and Z
dimensions may be switched). The default value is FALSE.
Rotate Equivalent to "Rotation=true".
Elongation = TRUE | FALSE
If TRUE, permit the geometry specified in the size
parameter to be altered as needed to fit available
resources. For example, an allocation of "4x2x1" might
be used to satisfy a size specification of "2x2x2". The
default value is FALSE.
Elongate
Equivalent to "Elongation=true".
copy <id> <count>
Submit request for partition to be copied. You may copy
a specific partition by specifying its id, by default the
last configured partition is copied. You may also
specify a number of copies to be made. By default, one
copy is made.
delete <id>
Delete the specified block.
down <node_range>
Down a specific node or range of nodes. i.e. 000,
000-111 [000x111]
up <node_range>
Bring a specific node or range of nodes up. i.e. 000,
000-111 [000x111]
alldown
Set all nodes to down state.
allup Set all nodes to up state.
save <file_name>
Save the current configuration to a file. If no
file_name is specified, the configuration is written to a
file named "bluegene.conf" in the current working
directory.
clear Clear all partitions created.
NODE STATE CODES
Node state codes are shortened as required for the field size. If the
node state code is followed by "*", this indicates the node is
presently not responding and will not be allocated any new work. If
the node remains non-responsive, it will be placed in the DOWN state
(except in the case of COMPLETING, DRAINED, DRAINING, FAIL, FAILING
nodes).
If the node state code is followed by "~", this indicates the node is
presently in a power saving mode (typically running at reduced
frequency). If the node state code is followed by "#", this indicates
the node is presently being powered up or configured.
ALLOCATED The node has been allocated to one or more jobs.
ALLOCATED+ The node is allocated to one or more active jobs plus one
or more jobs are in the process of COMPLETING.
COMPLETING All jobs associated with this node are in the process of
COMPLETING. This node state will be removed when all of
the job’s processes have terminated and the SLURM epilog
program (if any) has terminated. See the Epilog parameter
description in the slurm.conf man page for more
information.
DOWN The node is unavailable for use. SLURM can automatically
place nodes in this state if some failure occurs. System
administrators may also explicitly place nodes in this
state. If a node resumes normal operation, SLURM can
automatically return it to service. See the ReturnToService
and SlurmdTimeout parameter descriptions in the
slurm.conf(5) man page for more information.
DRAINED The node is unavailable for use per system administrator
request. See the update node command in the scontrol(1)
man page or the slurm.conf(5) man page for more
information.
DRAINING The node is currently executing a job, but will not be
allocated to additional jobs. The node state will be
changed to state DRAINED when the last job on it completes.
Nodes enter this state per system administrator request.
See the update node command in the scontrol(1) man page or
the slurm.conf(5) man page for more information.
FAIL The node is expected to fail soon and is unavailable for
use per system administrator request. See the update node
command in the scontrol(1) man page or the slurm.conf(5)
man page for more information.
FAILING The node is currently executing a job, but is expected to
fail soon and is unavailable for use per system
administrator request. See the update node command in the
scontrol(1) man page or the slurm.conf(5) man page for more
information.
IDLE The node is not allocated to any jobs and is available for
use.
MAINT The node is currently in a reservation with a flag value of
"maintainence".
UNKNOWN The SLURM controller has just started and the node’s state
has not yet been determined.
JOB STATE CODES
Jobs typically pass through several states in the course of their
execution. The typical states are PENDING, RUNNING, SUSPENDED,
COMPLETING, and COMPLETED. An explanation of each state follows.
CA CANCELLED Job was explicitly cancelled by the user or system
administrator. The job may or may not have been
initiated.
CD COMPLETED Job has terminated all processes on all nodes.
CG COMPLETING Job is in the process of completing. Some processes
on some nodes may still be active.
CF CONFIGURING Job has been allocated resources, but are waiting
for them to become ready for use (e.g. booting).
F FAILED Job terminated with non-zero exit code or other
failure condition.
NF NODE_FAIL Job terminated due to failure of one or more
allocated nodes.
PD PENDING Job is awaiting resource allocation.
R RUNNING Job currently has an allocation.
S SUSPENDED Job has an allocation, but execution has been
suspended.
TO TIMEOUT Job terminated upon reaching its time limit.
ENVIRONMENT VARIABLES
The following environment variables can be used to override settings
compiled into smap.
SLURM_CONF The location of the SLURM configuration file.
COPYING
Copyright (C) 2004-2007 The Regents of the University of California.
Copyright (C) 2008-2009 Lawrence Livermore National Security. Produced
at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
CODE-OCEC-09-009. All rights reserved.
This file is part of SLURM, a resource management program. For
details, see <https://computing.llnl.gov/linux/slurm/>.
SLURM is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.
SLURM is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
SEE ALSO
scontrol(1), sinfo(1), squeue(1), slurm_load_ctl_conf(3),
slurm_load_jobs(3), slurm_load_node(3), slurm_load_partitions(3),
slurm_reconfigure(3), slurm_shutdown(3), slurm_update_job(3),
slurm_update_node(3), slurm_update_partition(3), slurm.conf(5)