NAME

       strigger - Used to set, get or clear Slurm trigger information.

SYNOPSIS

       strigger --set   [OPTIONS...]
       strigger --get   [OPTIONS...]
       strigger --clear [OPTIONS...]

DESCRIPTION

       strigger  is  used  to  set,  get  or  clear Slurm trigger information.
       Triggers include events such as a node failing, a job reaching its time
       limit or a job terminating.  These events can cause actions such as the
       execution of an  arbitrary  script.   Typical  uses  include  notifying
       system administrators of node failures and gracefully terminating a job
       when its time limit is approaching.   A  hostlist  expression  for  the
       nodelist or job ID is passed as an argument to the program.

       Trigger  events  are  not processed instantly, but a check is performed
       for trigger events on a periodic basis (currently  every  15  seconds).
       Any  trigger  events  which occur within that interval will be compared
       against the trigger programs set at the end of the time interval.   The
       trigger  program  will be executed once for any event occurring in that
       interval.  The record of those events (e.g. nodes which  went  DOWN  in
       the  previous  15  seconds)  will then be cleared.  The trigger program
       must set a new trigger before the end of the next  interval  to  ensure
       that  no  trigger  events  are  missed.   If  desired, multiple trigger
       programs can be set for the same event.

       IMPORTANT NOTE: This command can only set triggers if run by  the  user
       SlurmUser  unless  SlurmUser  is  configured  as  user  root.   This is
       required for the slurmctld daemon to set the appropriate user and group
       IDs  for  the executed program.  Also note that the program is executed
       on the same node that  the  slurmctld  daemon  uses  rather  than  some
       allocated  compute  node.   To  check  the  value of SlurmUser, run the
       command:

       scontrol show config | grep SlurmUser
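
       The check above could be scripted along these lines.  This is only a
       sketch: the helper names parse_slurm_user and may_set_triggers are
       hypothetical, and the "SlurmUser = name(uid)" line format is an
       assumption about the scontrol output that may differ between versions.

```shell
# Extract the SlurmUser name from "scontrol show config" output on stdin.
# Assumption: the relevant line looks like "SlurmUser   = name(uid)".
parse_slurm_user() {
    awk -F= '$1 ~ /^[ \t]*SlurmUser[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # drop padding around the value
        sub(/\(.*/, "", $2)     # drop the "(uid)" suffix
        print $2
        exit
    }'
}

# Succeed if user $1 may set triggers given SlurmUser $2:
# either SlurmUser is root, or the caller is SlurmUser.
may_set_triggers() {
    [ "$2" = "root" ] || [ "$1" = "$2" ]
}
```

       On a cluster, the parser would typically be fed from the command shown
       above, for example "scontrol show config | parse_slurm_user", and the
       result compared against the output of "id -un".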

ARGUMENTS

       --block_err
              Trigger an event when a BlueGene block enters an ERROR state.

       --clear
              Clear or delete a previously defined event trigger.   The  --id,
              --jobid  or  --user  option  must  be  specified to identify the
              trigger(s) to be cleared.

       -d, --down
              Trigger an event if the specified node goes into a DOWN state.

       -D, --drained
              Trigger an event if the  specified  node  goes  into  a  DRAINED
              state.

       -F, --fail
              Trigger  an  event  if  the  specified  node goes into a FAILING
              state.

       -f, --fini
              Trigger an event when the specified job completes execution.

       --get  Show  registered  event  triggers.   Options  can  be  used  for
              filtering purposes.

       -i, --id=id
              Trigger ID number.

       -I, --idle
              Trigger  an event if the specified node remains in an IDLE state
              for at least the time period specified by the  --offset  option.
              This  can  be useful to hibernate a node that remains idle, thus
              reducing power consumption.

       -j, --jobid=id
              Job ID of interest.  NOTE: The --jobid option can not be used in
              conjunction  with  the --node option. When the --jobid option is
              used in conjunction with the --up or --down  option,  all  nodes
              allocated to that job will  be  considered  the  nodes  used  as  a
              trigger event.

       -n, --node[=host]
              Host name(s) of interest.  By default, all nodes associated with
              the  job  (if  --jobid  is  specified)  or  on  the  system  are
              considered for event triggers.  NOTE: The --node option can  not
              be used in conjunction with the --jobid option. When the --jobid
              option is used in conjunction with the --up, --down or --drained
              option,  all  nodes  allocated to that job will be considered the
              nodes used as a trigger event.

       -o, --offset=seconds
              The specified action  should  follow  the  event  by  this  time
              interval.  Specify a negative value if the action should precede
              the event.  The default value is zero if no --offset  option  is
              specified.   The resolution of this time is about 20 seconds, so
              to execute a script not less than five minutes prior  to  a  job
              reaching  its  time limit, specify --offset=-320 (5 minutes plus
              20 seconds, negated because the action precedes the event).

       -p, --program=path
              Execute the program at the specified  fully  qualified  pathname
              when the event occurs.  The program will be executed as the user
              who sets the trigger.  If the program fails to terminate  within
              5 minutes, it will be killed along with any spawned processes.

       -Q, --quiet
              Do  not  report  non-fatal  errors.  This can be useful to clear
              triggers which may have already been purged.

       -r, --reconfig
              Trigger an event when the system configuration changes.

       --set  Register an event  trigger  based  upon  the  supplied  options.
              NOTE:  An event is only triggered once. A new event trigger must
              be established for future events  of  the  same  type  to  be
              processed.

       -t, --time
              Trigger an event when the specified job’s time limit is reached.
              This must be used in conjunction with the --jobid option.

       -u, --up
              Trigger an event if the specified node is  returned  to  service
              from a DOWN state.

       --user=user_name_or_id
              Clear  or  get  triggers  associated  with  the  specified user.
              Specify either a user name or user ID.

       -v, --verbose
              Print detailed event logging. This includes time-stamps on  data
              structures, record counts, etc.

       -V, --version
              Print version information and exit.
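
       Because --program expects a fully qualified pathname, it may be worth
       validating the path before registering a trigger.  The following is a
       sketch only; is_valid_trigger_program is a hypothetical helper, and the
       check is meaningful only when run on the node where slurmctld runs,
       since that is where the program is executed.

```shell
# Succeed only if $1 is an absolute path to an executable regular file.
is_valid_trigger_program() {
    case "$1" in
        /*) [ -f "$1" ] && [ -x "$1" ] ;;   # absolute: must exist and be executable
        *)  return 1 ;;                      # relative paths are rejected
    esac
}
```

       A caller might guard the registration with it, for example
       "is_valid_trigger_program /home/joe/clean_up && strigger --set ...".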

OUTPUT FIELD DESCRIPTIONS

       TRIG_ID
              Trigger ID number.

       RES_TYPE
              Resource type: job or node

       RES_ID Resource ID: job ID or host names or "*" for any host

       TYPE   Trigger type: time or fini (for jobs only), down or up (for jobs
              or nodes), or drained, idle or reconfig (for nodes only)

       OFFSET Time offset in seconds. Negative numbers  indicate  the  action
              should occur before the event (if possible)

       USER   Name of the user requesting the action

       PROGRAM
              Pathname of the program to execute when the event occurs
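
       The columns described above lend themselves to simple filtering.  As a
       sketch, the hypothetical helper below prints the TRIG_ID of every
       trigger of a given TYPE.  It assumes a single header line followed by
       the seven whitespace-separated columns listed in this section.

```shell
# Print TRIG_ID for each trigger whose TYPE column equals $1.
# Input: "strigger --get" output on stdin (header line is skipped).
trigger_ids_by_type() {
    awk -v type="$1" 'NR > 1 && $4 == type { print $1 }'
}
```

       For example, "strigger --get --jobid=1235 | trigger_ids_by_type down"
       would list the IDs that could then be passed to "strigger --clear
       --id=...".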

EXAMPLES

       Execute the program "/usr/sbin/slurm_admin_notify" whenever any node in
       the cluster goes down. The subject line will  include  the  node  names
       which  have entered the down state (passed as an argument to the script
       by SLURM).

            > cat /usr/sbin/slurm_admin_notify
            #!/bin/bash
            # Submit trigger for next event
            strigger --set --node --down \
                     --program=/usr/sbin/slurm_admin_notify
            # Notify administrator by e-mail
            /bin/mail -s "NodesDown:$*" slurm_admin@site.com

            > strigger --set --node --down \
                       --program=/usr/sbin/slurm_admin_notify

       Execute the program "/usr/sbin/slurm_suspend_node" whenever any node in
       the cluster remains in the idle state for at least 600 seconds.

            > strigger --set --node --idle --offset=600 \
                       --program=/usr/sbin/slurm_suspend_node

       Execute  the  program  "/home/joe/clean_up"  when job 1234 is within 10
       minutes of reaching its time limit.

            > strigger --set --jobid=1234 --time --offset=-600 \
                       --program=/home/joe/clean_up
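
       Trigger timing has a resolution of about 20 seconds (see --offset), so
       a guaranteed lead time needs some padding.  The sketch below computes
       a padded, negative offset; the function name offset_before and the
       RESOLUTION variable are hypothetical, and the 20-second figure is
       taken from the --offset description.

```shell
# Approximate timing resolution in seconds, per the --offset description.
RESOLUTION=20

# Print the --offset value that runs the action at least $1 seconds
# before the event: the lead time plus padding, negated.
offset_before() {
    echo "$(( -($1 + $RESOLUTION) ))"
}
```

       With this helper, "offset_before 600" prints -620, which could be
       passed as --offset=-620 to run clean_up no later than 10 minutes
       before the time limit.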

       Execute the program "/home/joe/node_died" when any  node  allocated  to
       job 1234 enters the DOWN state.

            > strigger --set --jobid=1234 --down \
                       --program=/home/joe/node_died

       Show all triggers associated with job 1235.

            > strigger --get --jobid=1235
            TRIG_ID RES_TYPE RES_ID TYPE OFFSET USER PROGRAM
                123      job   1235 time   -600  joe /home/joe/clean_up
                125      job   1235 down      0  joe /home/joe/node_died

       Delete event trigger 125.

            > strigger --clear --id=125

       Execute /home/joe/job_fini upon completion of job 1237.

            > strigger --set --jobid=1237 --fini --program=/home/joe/job_fini

COPYING

       Copyright  (C)  2007  The  Regents  of  the  University  of California.
       Produced at Lawrence Livermore National  Laboratory  (cf,  DISCLAIMER).
       CODE-OCEC-09-009. All rights reserved.

       This  file  is  part  of  SLURM,  a  resource  management program.  For
       details, see <https://computing.llnl.gov/linux/slurm/>.

       SLURM is free software; you can redistribute it and/or modify it  under
       the  terms  of  the GNU General Public License as published by the Free
       Software Foundation; either version 2  of  the  License,  or  (at  your
       option) any later version.

       SLURM  is  distributed  in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
       FITNESS  FOR  A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.

SEE ALSO

       scontrol(1), sinfo(1), squeue(1)