NAME

       smap - graphically view information about SLURM jobs, partitions, and
       set configuration parameters.

SYNOPSIS

       smap [OPTIONS...]

DESCRIPTION

       smap is used to graphically view job, partition  and  node  information
       for  a  system  running  SLURM.   Note that information about nodes and
       partitions to which a user lacks access will  always  be  displayed  to
       avoid  obvious  gaps  in  the  output.  This is equivalent to the --all
       option of the sinfo and squeue commands.

OPTIONS

       -c, --commandline
              Print output to the command line instead of using curses.

       -D <option>, --display=<option>
              Set the display mode for smap, showing relevant information
              about the specified view and a corresponding node chart.
              While in any display, a user can switch views by typing a
              different view letter.  This is true in all modes except
              ’configure mode’, in which the user types ’quit’ to leave
              configure mode only, or ’exit’ to end configure mode and exit
              smap.  Note that unallocated nodes are indicated by a ’.’ and
              nodes in the DOWN, DRAINED, or FAIL state by a ’#’.

              b              Displays information about BlueGene partitions on
                             the system.

              c              Displays current BlueGene node states and  allows
                             users to configure the system.

              j              Displays information about jobs running on the
                             system.

              r              Displays information about advanced reservations.
                             While all current and future reservations will be
                             listed, only currently active reservations will
                             appear on the node map.

              s              Displays information about SLURM partitions on
                             the system.

       -h, --noheader
              Do not print a header on the output.

       --help
              Print a message describing all smap options.

       -i <seconds>, --iterate=<seconds>
              Print the state on a periodic basis, sleeping for the indicated
              number of seconds between reports.  The user can exit at any
              time by typing ’q’ or pressing the Return key.  In configure
              mode, type ’exit’ to exit the program or ’quit’ to leave
              configure mode.

       -I, --ionodes
              Only show objects with these ionodes.  This option is supported
              only on BlueGene systems and should be used in conjunction with
              the ’-n’ option: specify only the ionode number range here, and
              specify the node name with ’-n’.

       -n, --nodes
              Only show objects with these nodes.  If querying down to the
              ionode level, use the ’-I’ option in conjunction with this one.

       -Q, --quiet
              Avoid printing error messages.

       -R <RACK_MIDPLANE_ID/XYZ>, --resolve=<RACK_MIDPLANE_ID/XYZ>
              Returns the XYZ coordinates for a Rack/Midplane ID, or vice
              versa.

              To get the XYZ coordinates for a Rack/Midplane ID, input
              -R R101, where 10 is the rack and 1 is the midplane.

              To get the Rack/Midplane ID from an XYZ coordinate, input
              -R 101, where X=1, Y=0, and Z=1, with no leading ’R’.

       --usage
              Print a brief message listing the smap options.

       -V, --version
              Print version information and exit.
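
       The two argument forms accepted by -R can be told apart mechanically,
       as in the hypothetical sketch below.  It only classifies and splits
       the argument; the actual Rack/Midplane to XYZ conversion is done by
       smap and depends on the machine, so it is not reproduced here.  The
       names parse_resolve_arg, RackMidplane, and XYZ are invented for
       illustration, and the exact shapes (two rack characters plus one
       midplane digit; one digit per dimension) are assumptions read off
       the examples above.

```python
import re
from typing import NamedTuple, Union


class RackMidplane(NamedTuple):
    rack: str      # e.g. "10" in "R101"
    midplane: str  # e.g. "1" in "R101"


class XYZ(NamedTuple):
    x: int
    y: int
    z: int


def parse_resolve_arg(arg: str) -> Union[RackMidplane, XYZ]:
    """Classify a -R/--resolve argument without converting it.

    Hypothetical helper, not part of smap.  Assumes a Rack/Midplane ID is
    'R' + two rack characters + one midplane digit, and an XYZ coordinate
    is exactly three digits, one per dimension.
    """
    text = arg.strip()
    m = re.fullmatch(r"[Rr]([0-9A-Za-z]{2})(\d)", text)
    if m:
        return RackMidplane(rack=m.group(1), midplane=m.group(2))
    if re.fullmatch(r"\d{3}", text):
        x, y, z = (int(c) for c in text)
        return XYZ(x, y, z)
    raise ValueError(f"not a Rack/Midplane ID or XYZ coordinate: {arg!r}")


print(parse_resolve_arg("R101"))  # RackMidplane(rack='10', midplane='1')
print(parse_resolve_arg("101"))   # XYZ(x=1, y=0, z=1)
```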

INTERACTIVE OPTIONS

       When using smap in curses mode you can  scroll  through  the  different
       windows  using  the  arrow keys.  The up and down arrow keys scroll the
       window containing the grid, and the left and right  arrow  keys  scroll
       the window containing the text information.

OUTPUT FIELD DESCRIPTIONS

       ACCESS_CONTROL
              Identifies  the  users  or  bank  accounts  which  can  use this
              advanced reservation.  A  prefix  of  "A:"  indicates  that  the
              following  account  names may use this reservation.  A prefix of
              "U:" indicates that  the  following  user  names  may  use  this
              reservation.

       AVAIL  Partition state: up or down.

       BG_BLOCK
              BlueGene Block Name.

       CONN   Connection Type: TORUS or MESH or SMALL (for small blocks).

       END_TIME
              The time at which an advanced reservation ends.

       ID     Key  to  identify  the  nodes associated with this entity in the
              node chart.

       MODE   Mode Type: COPROCESS or VIRTUAL.

       NAME   Name of the job or advanced reservation.

       NODELIST or BP_LIST
              Names  of  nodes  or  base  partitions  associated   with   this
              configuration, partition or reservation.

       NODES  Count   of   nodes  or  base  partitions  with  this  particular
              configuration.

       PARTITION
              Name of a partition.  Note that the suffix  "*"  identifies  the
              default partition.

       ST     State  of  a  job  in  compact form. Possible states include: PD
              (pending), R  (running),  S  (suspended),  CD   (completed),  CF
              (configuring), CG (completing), F (failed), TO (timeout), and NF
              (node failure). See JOB  STATE  CODES  section  below  for  more
              information.

       START_TIME
              The time at which an advanced reservation starts.

       STATE  State of the nodes.  Possible states include: allocated,
              completing, down, drained, draining, fail, failing, idle, and
              unknown, plus their abbreviated forms: alloc, comp, down, drain,
              drng, fail, failg, idle, and unk, respectively.  Note that the
              suffix "*" identifies nodes that are presently not responding.
              See the NODE STATE CODES section below for more information.

       TIMELIMIT
              Maximum    time    limit     for     any     user     job     in
              days-hours:minutes:seconds.   infinite  is used to identify jobs
              or partitions without a job time limit.
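
       A TIMELIMIT value can be turned into a number of seconds for
       comparison or sorting.  The hypothetical helper below handles the
       documented days-hours:minutes:seconds form and "infinite".  As an
       assumption, it also accepts the shorter hours:minutes:seconds and
       minutes:seconds forms, such as the 43:12 that appears in the TIME
       column of the example output further down.

```python
import math
from typing import Union


def timelimit_to_seconds(value: str) -> Union[int, float]:
    """Convert a TIMELIMIT (or TIME) string to seconds.

    Hypothetical helper.  Accepts days-hours:minutes:seconds as documented;
    as an assumption, it also accepts hours:minutes:seconds and
    minutes:seconds.  "infinite" maps to math.inf.
    """
    text = value.strip()
    if text.lower() == "infinite":
        return math.inf
    has_days = "-" in text
    days = 0
    if has_days:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    fields = [int(f) for f in text.split(":")]
    # With a days- prefix all three clock fields are expected;
    # otherwise two or three fields are accepted.
    if len(fields) not in ((3,) if has_days else (2, 3)):
        raise ValueError(f"unrecognized time value: {value!r}")
    # Left-pad so the fields line up as hours, minutes, seconds.
    hours, minutes, seconds = [0] * (3 - len(fields)) + fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

       For example, "1-02:03:04" yields 93784 and "43:12" yields 2592.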

TOPOGRAPHY INFORMATION

       The node chart is designed to indicate relative locations of the nodes.
       On most Linux clusters this will represent a one-dimensional array of
       nodes.  Larger clusters will use multiple lines as needed, with the
       right side of one line logically followed by the left side of the next
       line.
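
       The one-dimensional wrapping can be modeled as splitting a string of
       chart symbols into fixed-width lines, so that node i lands on line
       i // width at column i % width.  This is only a sketch: width stands
       in for whatever line length smap picks for the terminal, and the
       function names are hypothetical.

```python
def wrap_node_chart(symbols: str, width: int) -> list[str]:
    """Split a one-dimensional node chart into lines of at most `width`
    symbols; the last symbol of one line is logically followed by the
    first symbol of the next line."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [symbols[i:i + width] for i in range(0, len(symbols), width)]


def chart_position(index: int, width: int) -> tuple[int, int]:
    """Return (line, column) of the node at linear position `index`."""
    return divmod(index, width)
```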

       On BlueGene systems, the node chart will indicate the three-dimensional
       topography of the system.  The X dimension will increase from left to
       right on a given line.  The Y dimension will increase in planes from
       bottom to top.  The Z dimension will increase within a plane from the
       back line to the front line of that plane.  Note the example below:

          a a a a b b d d
         a a a a b b d d
        a a a a b b c c
       a a a a b b c c

          a a a a b b d d
         a a a a b b d d
        a a a a b b c c
       a a a a b b c c

          a a a a . . d d
         a a a a . . d d
        a a a a . . e e              Y
       a a a a . . e e               |
                                     |
          a a a a . . d d            0----X
         a a a a . . d d            /
        a a a a . . . .            /
       a a a a . . . #            Z

       ID JOBID PARTITION BG_BLOCK USER   NAME ST  TIME NODES BP_LIST
       a  12345 batch     RMP0     joseph tst1 R  43:12   32k bgl[000x333]
       b  12346 debug     RMP1     chris  sim3 R  12:34    8k bgl[420x533]
       c  12350 debug     RMP2     danny  job3 R   0:12    4k bgl[622x733]
       d  12356 debug     RMP3     dan    colu R  18:05    8k bgl[600x731]
       e  12378 debug     RMP4     joseph asx4 R   0:34    2k bgl[612x713]
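
       The chart above follows directly from the BP_LIST column: each entry
       such as bgl[420x533] names the two opposite XYZ corners of a box of
       base partitions.  The hypothetical sketch below rebuilds the chart
       from those boxes, printing the highest Y plane first and, within a
       plane, the back line (Z=0) first with the most indentation.  It
       assumes one digit per dimension; parse_box and render_chart are
       invented names, not smap internals, and the 8x4x4 machine size and
       the position of the ’#’ node are read off the example.

```python
import re

Coord = tuple[int, int, int]


def parse_box(bp_list: str) -> tuple[Coord, Coord]:
    """Parse a BP_LIST entry such as 'bgl[420x533]' into its two corners.

    Assumes the form prefix[XYZxXYZ] with one digit per dimension."""
    m = re.fullmatch(r"\w+\[(\d)(\d)(\d)x(\d)(\d)(\d)\]", bp_list)
    if not m:
        raise ValueError(f"unsupported BP_LIST: {bp_list!r}")
    d = [int(g) for g in m.groups()]
    return (d[0], d[1], d[2]), (d[3], d[4], d[5])


def render_chart(dims: Coord, boxes: dict[str, str], down=()) -> str:
    """Draw the node chart: '.' for unallocated nodes, an ID letter for
    each box, and '#' for the coordinates listed in `down`."""
    nx, ny, nz = dims
    grid = {(x, y, z): "."
            for x in range(nx) for y in range(ny) for z in range(nz)}
    for ident, bp_list in boxes.items():
        (x0, y0, z0), (x1, y1, z1) = parse_box(bp_list)
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                for z in range(z0, z1 + 1):
                    grid[(x, y, z)] = ident
    for coord in down:
        grid[coord] = "#"
    lines = []
    for y in reversed(range(ny)):        # highest Y plane is printed first
        for z in range(nz):              # back line (z=0) first, front last
            indent = " " * (nz - 1 - z)  # back lines are shifted further right
            lines.append(indent + " ".join(grid[(x, y, z)] for x in range(nx)))
        lines.append("")                 # blank line between planes
    return "\n".join(lines).rstrip("\n")


boxes = {"a": "bgl[000x333]", "b": "bgl[420x533]", "c": "bgl[622x733]",
         "d": "bgl[600x731]", "e": "bgl[612x713]"}
print(render_chart((8, 4, 4), boxes, down=[(7, 0, 3)]))
```

       Run as shown, this prints the four planes of the example chart,
       without the axis legend and the page indentation.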

CONFIGURATION INSTRUCTIONS

       For administrator use.  From this screen one can create a configuration
       file that is used to partition and wire the system into usable blocks.

       OUTPUT

              BG_BLOCK
                     BlueGene Block Name.

              CONN   Connection  Type:  TORUS  or  MESH  or  SMALL  (for small
                     blocks).

              ID     Key to identify the nodes associated with this entity  in
                     the node chart.

              MODE   Mode Type: COPROCESS or VIRTUAL.

       INPUT COMMANDS

              resolve <RACK_MIDPLANE_ID/XYZ>
                     Returns the XYZ coordinates for a Rack/Midplane ID, or
                     vice versa.

                     To get the XYZ coordinates for a Rack/Midplane ID, input
                     resolve R101, where 10 is the rack and 1 is the midplane.

                     To get the Rack/Midplane ID from an XYZ coordinate, input
                     resolve 101, where X=1, Y=0, and Z=1, with no leading ’R’.

              load <bluegene.conf file>
                     Load an existing bluegene.conf file.  This will verify
                     the file and map it out.  Once loaded, the configuration
                     may be edited and saved as a new file.

              create <size> <options>
                     Submit a request for partition creation.  The size may be
                     specified either as a count of base partitions or as
                     specific dimensions in the X, Y, and Z directions
                     separated by "x", for example "2x3x4".  A variety of
                     options may be specified; valid options are listed below.
                     Note that the options and their values are case
                     insensitive (e.g. "MESH" and "mesh" are equivalent).

              Start = XxYxZ
                     Identify where to start the partition.  This is primarily
                     for testing purposes.  For convenience, one may specify
                     only the X coordinate, or XxY.  The default value is
                     0x0x0.

              Connection = MESH | TORUS | SMALL
                     Identify how the nodes should be connected in the
                     network.  The default value is TORUS.

                      Small  Equivalent to "Connection=Small".  If a small
                             connection is specified, the chosen base
                             partition will be divided into smaller partitions
                             based on the options 32CNBlocks and 128CNBlocks on
                             a BlueGene/L system; 16CNBlocks, 64CNBlocks, and
                             256CNBlocks are also available on BlueGene/P
                             systems.  Keep in mind that you must have enough
                             ionodes to make all of these configurations
                             possible.  These numbers will be altered to take
                             up the entire base partition.  Size does not need
                             to be specified with a small request; it always
                             defaults to one base partition for allocation.

                      Mesh   Equivalent to "Connection=Mesh".

                      Torus  Equivalent to "Connection=Torus".

              Rotation = TRUE | FALSE
                     Specifies   that  the  geometry  specified  in  the  size
                     parameter may be rotated in  space  (e.g.  the  Y  and  Z
                     dimensions may be switched).  The default value is FALSE.

              Rotate Equivalent to "Rotation=true".

              Elongation = TRUE | FALSE
                     If TRUE,  permit  the  geometry  specified  in  the  size
                     parameter  to  be  altered  as  needed  to  fit available
                     resources.  For example, an allocation of  "4x2x1"  might
                     be  used to satisfy a size specification of "2x2x2".  The
                     default value is FALSE.

              Elongate
                     Equivalent to "Elongation=true".

              copy <id> <count>
                     Submit a request for a partition to be copied.  You may
                     copy a specific partition by specifying its id; by
                     default, the last configured partition is copied.  You
                     may also specify a number of copies to be made.  By
                     default, one copy is made.

              delete <id>
                     Delete the specified block.

              down <node_range>
                     Set a specific node or range of nodes to the down state,
                     e.g. 000, 000-111, or [000x111].

              up <node_range>
                     Bring a specific node or range of nodes up, e.g. 000,
                     000-111, or [000x111].

              alldown
                     Set all nodes to down state.

              allup  Set all nodes to up state.

              save <file_name>
                     Save  the  current  configuration  to  a  file.   If   no
                     file_name is specified, the configuration is written to a
                     file  named  "bluegene.conf"  in  the   current   working
                     directory.

              clear  Clear all partitions created.
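
       The create syntax and its documented defaults (Start 0x0x0,
       Connection TORUS, Rotation and Elongation FALSE) can be sketched as
       a small parser.  This is a hypothetical illustration, not smap’s
       implementation: CreateRequest, parse_create, and the shorthand table
       are invented names, treating omitted Start coordinates as 0 is an
       assumption, and the small-block counts (32CNBlocks and so on) are not
       handled.

```python
import re
from dataclasses import dataclass


@dataclass
class CreateRequest:
    """Hypothetical representation of a configure-mode create request."""
    size: str = "1"             # base-partition count or "XxYxZ"
    start: tuple = (0, 0, 0)    # Start default 0x0x0
    connection: str = "TORUS"   # Connection default TORUS
    rotate: bool = False        # Rotation default FALSE
    elongate: bool = False      # Elongation default FALSE


# Bare keywords documented as equivalents of key=value options.
SHORTHANDS = {
    "small": ("connection", "SMALL"),
    "mesh": ("connection", "MESH"),
    "torus": ("connection", "TORUS"),
    "rotate": ("rotate", True),
    "elongate": ("elongate", True),
}


def _parse_bool(value: str) -> bool:
    if value.upper() not in ("TRUE", "FALSE"):
        raise ValueError(f"expected TRUE or FALSE, got {value!r}")
    return value.upper() == "TRUE"


def _parse_start(value: str) -> tuple:
    # Assumption: when only X or XxY is given, the missing coordinates are 0.
    coords = [int(c) for c in value.lower().split("x")]
    if not 1 <= len(coords) <= 3:
        raise ValueError(f"bad Start value: {value!r}")
    return tuple(coords + [0] * (3 - len(coords)))


def parse_create(command: str) -> CreateRequest:
    # Accept both "Start = 1x2" and "Start=1x2"; matching is case-insensitive.
    words = re.sub(r"\s*=\s*", "=", command.strip()).split()
    if not words or words[0].lower() != "create":
        raise ValueError("not a create command")
    req, size_seen = CreateRequest(), False
    for word in words[1:]:
        key, sep, value = word.partition("=")
        key = key.lower()
        if not sep and key in SHORTHANDS:
            field, setting = SHORTHANDS[key]
            setattr(req, field, setting)
        elif not sep and not size_seen and re.fullmatch(r"\d+(x\d+x\d+)?", key):
            req.size, size_seen = key, True
        elif key == "start":
            req.start = _parse_start(value)
        elif key == "connection" and value.upper() in ("MESH", "TORUS", "SMALL"):
            req.connection = value.upper()
        elif key == "rotation":
            req.rotate = _parse_bool(value)
        elif key == "elongation":
            req.elongate = _parse_bool(value)
        else:
            raise ValueError(f"unrecognized word: {word!r}")
    # Size may be omitted only for small requests (one base partition).
    if not size_seen and req.connection != "SMALL":
        raise ValueError("a size is required unless the connection is SMALL")
    return req
```

       With this sketch, "create 2x3x4 Start = 1x2 mesh Rotation=TRUE"
       yields size "2x3x4", start (1, 2, 0), connection MESH, and rotation
       enabled, while "create small" needs no size.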

NODE STATE CODES

       Node  state codes are shortened as required for the field size.  If the
       node state code  is  followed  by  "*",  this  indicates  the  node  is
       presently  not  responding  and will not be allocated any new work.  If
       the node remains non-responsive, it will be placed in  the  DOWN  state
       (except in the case of COMPLETING, DRAINED, DRAINING, FAIL, or FAILING
       nodes).

       If the node state code is followed by "~", this indicates the  node  is
       presently  in  a  power  saving  mode  (typically  running  at  reduced
       frequency).  If the node state code is followed by "#", this  indicates
       the node is presently being powered up or configured.
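
       When post-processing the STATE field, the trailing "*", "~", and "#"
       flags can be separated from the base state and the abbreviated forms
       expanded.  The sketch below is hypothetical (parse_node_state and
       its tables are invented names) and assumes the flags appear only at
       the end of the value.  Note that these suffixes apply to STATE text;
       in the node chart, ’#’ instead marks DOWN, DRAINED, or FAIL nodes.

```python
from typing import NamedTuple

# Abbreviations listed under the STATE output field.
ABBREVIATIONS = {
    "alloc": "allocated", "comp": "completing", "down": "down",
    "drain": "drained", "drng": "draining", "fail": "fail",
    "failg": "failing", "idle": "idle", "unk": "unknown",
}

# Suffix flags described in this section.
SUFFIXES = {
    "*": "not responding",
    "~": "power saving",
    "#": "powering up or being configured",
}


class NodeState(NamedTuple):
    state: str
    flags: tuple[str, ...]


def parse_node_state(text: str) -> NodeState:
    """Split a STATE value such as 'idle~' or 'drng*' into its base state
    and suffix flags.  Unrecognized states pass through lowercased."""
    base = text.strip()
    flags: list[str] = []
    while base and base[-1] in SUFFIXES:
        flags.insert(0, SUFFIXES[base[-1]])  # keep flags in written order
        base = base[:-1]
    base = base.lower()
    return NodeState(ABBREVIATIONS.get(base, base), tuple(flags))
```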

       ALLOCATED   The node has been allocated to one or more jobs.

       ALLOCATED+  The  node  is allocated to one or more active jobs plus one
                   or more jobs are in the process of COMPLETING.

       COMPLETING  All jobs associated with this node are in  the  process  of
                   COMPLETING.   This  node  state will be removed when all of
                   the job’s processes have terminated and  the  SLURM  epilog
                   program  (if  any) has terminated. See the Epilog parameter
                   description  in  the   slurm.conf   man   page   for   more
                   information.

       DOWN        The  node  is  unavailable for use. SLURM can automatically
                   place nodes in this state if some  failure  occurs.  System
                   administrators  may  also  explicitly  place  nodes in this
                   state. If  a  node  resumes  normal  operation,  SLURM  can
                   automatically return it to service. See the ReturnToService
                   and   SlurmdTimeout   parameter   descriptions    in    the
                   slurm.conf(5) man page for more information.

       DRAINED     The  node  is  unavailable for use per system administrator
                   request.  See the update node command  in  the  scontrol(1)
                   man   page   or   the   slurm.conf(5)  man  page  for  more
                   information.

       DRAINING    The node is currently executing a  job,  but  will  not  be
                   allocated  to  additional  jobs.  The  node  state  will be
                   changed to state DRAINED when the last job on it completes.
                   Nodes  enter  this  state per system administrator request.
                   See the update node command in the scontrol(1) man page  or
                   the slurm.conf(5) man page for more information.

       FAIL        The  node  is  expected to fail soon and is unavailable for
                   use per system administrator request.  See the update  node
                   command  in  the  scontrol(1) man page or the slurm.conf(5)
                   man page for more information.

       FAILING     The node is currently executing a job, but is  expected  to
                   fail   soon   and   is   unavailable  for  use  per  system
                   administrator request.  See the update node command in  the
                   scontrol(1) man page or the slurm.conf(5) man page for more
                   information.

       IDLE        The node is not allocated to any jobs and is available  for
                   use.

       MAINT       The node is currently in a reservation with a flag value of
                   "maintenance".

       UNKNOWN     The SLURM controller has just started and the node’s  state
                   has not yet been determined.

JOB STATE CODES

       Jobs  typically  pass  through  several  states  in the course of their
       execution.   The  typical  states  are  PENDING,  RUNNING,   SUSPENDED,
       COMPLETING, and COMPLETED.  An explanation of each state follows.

       CA  CANCELLED       Job  was explicitly cancelled by the user or system
                           administrator.  The job may or may  not  have  been
                           initiated.

       CD  COMPLETED       Job has terminated all processes on all nodes.

       CG  COMPLETING      Job is in the process of completing. Some processes
                           on some nodes may still be active.

       CF  CONFIGURING     Job has been allocated resources, but is waiting
                           for them to become ready for use (e.g. booting).

       F   FAILED          Job  terminated  with  non-zero  exit code or other
                           failure condition.

       NF  NODE_FAIL       Job terminated  due  to  failure  of  one  or  more
                           allocated nodes.

       PD  PENDING         Job is awaiting resource allocation.

       R   RUNNING         Job currently has an allocation.

       S   SUSPENDED       Job  has  an  allocation,  but  execution  has been
                           suspended.

       TO  TIMEOUT         Job terminated upon reaching its time limit.
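
       The compact codes map one-to-one onto the full state names above,
       which makes it easy to summarize a list of ST values.  The sketch
       below is hypothetical: the table and function names are invented,
       it assumes the codes have already been extracted from smap output,
       and the "finished" set simply collects the states whose descriptions
       say the job terminated or was cancelled.

```python
from collections import Counter

# Compact ST codes mapped to full state names, as listed above.
JOB_STATES = {
    "CA": "CANCELLED", "CD": "COMPLETED", "CG": "COMPLETING",
    "CF": "CONFIGURING", "F": "FAILED", "NF": "NODE_FAIL",
    "PD": "PENDING", "R": "RUNNING", "S": "SUSPENDED", "TO": "TIMEOUT",
}

# Codes whose descriptions say the job has ended (terminated or cancelled).
FINISHED = {"CA", "CD", "F", "NF", "TO"}


def is_finished(code: str) -> bool:
    """True if the job in state `code` is no longer active."""
    if code.upper() not in JOB_STATES:
        raise ValueError(f"unknown job state code: {code!r}")
    return code.upper() in FINISHED


def summarize(codes) -> Counter:
    """Count jobs per full state name from an iterable of ST codes."""
    return Counter(JOB_STATES[c.upper()] for c in codes)
```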

ENVIRONMENT VARIABLES

       The following environment variables can be used  to  override  settings
       compiled into smap.

       SLURM_CONF          The location of the SLURM configuration file.

COPYING

       Copyright  (C)  2004-2007  The Regents of the University of California.
       Copyright (C) 2008-2009 Lawrence Livermore National Security.  Produced
       at   Lawrence   Livermore   National   Laboratory   (cf,   DISCLAIMER).
       CODE-OCEC-09-009. All rights reserved.

       This file is  part  of  SLURM,  a  resource  management  program.   For
       details, see <https://computing.llnl.gov/linux/slurm/>.

       SLURM  is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published  by  the  Free
       Software  Foundation;  either  version  2  of  the License, or (at your
       option) any later version.

       SLURM is distributed in the hope that it will be  useful,  but  WITHOUT
       ANY  WARRANTY;  without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public  License
       for more details.

SEE ALSO

       scontrol(1),      sinfo(1),      squeue(1),     slurm_load_ctl_conf(3),
       slurm_load_jobs(3),    slurm_load_node(3),    slurm_load_partitions(3),
       slurm_reconfigure(3),      slurm_shutdown(3),      slurm_update_job(3),
       slurm_update_node(3), slurm_update_partition(3), slurm.conf(5)