NAME
SPANK - SLURM Plug-in Architecture for Node and job (K)control
DESCRIPTION
This manual briefly describes the capabilities of the SLURM Plug-in
architecture for Node and job Kontrol (SPANK) as well as the SPANK
configuration file (by default, plugstack.conf).
SPANK provides a very generic interface for stackable plug-ins which
may be used to dynamically modify the job launch code in SLURM. SPANK
plugins may be built without access to SLURM source code. They need
only be compiled against SLURM’s spank.h header file, added to the
SPANK config file plugstack.conf, and they will be loaded at runtime
during the next job launch. Thus, the SPANK infrastructure gives
administrators and other developers a low-cost, low-effort way to
dynamically modify the runtime behavior of SLURM job launch.
SPANK PLUGINS
SPANK plugins are loaded in up to three separate contexts during a
SLURM job. Briefly, the three contexts are:
local In local context, the plugin is loaded by srun (i.e. the
"local" part of a parallel job).
remote In remote context, the plugin is loaded by slurmd (i.e. the
"remote" part of a parallel job).
allocator
In allocator context, the plugin is loaded in one of the job
allocation utilities sbatch or salloc.
In local context, only the init, exit, init_post_opt, and
local_user_init functions are called. In allocator context, only the
init, exit, and init_post_opt functions are called. Plugins may query
the context in which they are running with the spank_context and
spank_remote functions defined in <slurm/spank.h>.
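The context rules can be summarized as a small lookup table. The sketch below is standalone C, not SLURM code: the enum demo_ctx and the function callback_runs_in are hypothetical names used only for illustration, and the tables simply restate which callbacks this page says are invoked in each context (callback names are given without the slurm_spank_ prefix).

```c
#include <string.h>

/* Hypothetical stand-ins for the three SPANK contexts (not from spank.h). */
enum demo_ctx { DEMO_LOCAL, DEMO_REMOTE, DEMO_ALLOCATOR };

/* Callbacks invoked in each context, as described in this manual. */
static const char *const local_cbs[] = {
    "init", "init_post_opt", "local_user_init", "exit", NULL
};
static const char *const remote_cbs[] = {
    "init", "init_post_opt", "user_init", "task_init_privileged",
    "task_init", "task_post_fork", "task_exit", "exit", NULL
};
static const char *const allocator_cbs[] = {
    "init", "init_post_opt", "exit", NULL
};

/* Return 1 if callback `name` is invoked in context `ctx`, else 0. */
int callback_runs_in(enum demo_ctx ctx, const char *name)
{
    const char *const *list =
        ctx == DEMO_LOCAL  ? local_cbs :
        ctx == DEMO_REMOTE ? remote_cbs : allocator_cbs;

    for (; *list != NULL; list++)
        if (strcmp(*list, name) == 0)
            return 1;
    return 0;
}
```

For instance, callback_runs_in(DEMO_ALLOCATOR, "task_init") returns 0, which is why a plugin that only needs per-task hooks can return early from its init function in sbatch and salloc.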
SPANK plugins may be called from multiple points during the SLURM job
launch. A plugin may define the following functions:
slurm_spank_init
Called just after plugins are loaded. In remote context, this is just
after the job step is initialized. This function is called before any
plugin option processing.
slurm_spank_init_post_opt
Called at the same point as slurm_spank_init, but after all user
options to the plugin have been processed. The reason that the init
and init_post_opt callbacks are separated is so that plugins can
process system-wide options specified in plugstack.conf in the init
callback, then process user options, and finally take some action in
slurm_spank_init_post_opt if necessary.
slurm_spank_local_user_init
Called in local (srun) context only, after all options have been
processed. This is called after the job ID and step IDs are
available. This happens in srun after the allocation is made, but
before tasks are launched.
slurm_spank_user_init
Called after privileges are temporarily dropped. (remote context
only)
slurm_spank_task_init_privileged
Called for each task just after fork, but before all elevated
privileges are dropped. (remote context only)
slurm_spank_task_init
Called for each task just before execve(2). (remote context only)
slurm_spank_task_post_fork
Called for each task from the parent process after fork(2) is complete.
Because slurmd does not exec any tasks until all tasks have completed
fork(2), this call is guaranteed to run before the user task is
executed. (remote context only)
slurm_spank_task_exit
Called for each task as its exit status is collected by SLURM.
(remote context only)
slurm_spank_exit
Called once just before slurmstepd exits in remote context. In local
context, called before srun exits.
All of these functions have the same prototype, for example:
int slurm_spank_init (spank_t spank, int ac, char *argv[])
Where spank is the SPANK handle which must be passed back to SLURM when
the plugin calls functions like spank_get_item and spank_getenv.
Configured arguments (See CONFIGURATION below) are passed in the
argument vector argv with argument count ac.
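Because configured arguments arrive as a plain argument vector, plugins typically scan it for key=value pairs. The following is a minimal standalone sketch of such a scan; find_plugin_arg is a hypothetical helper name and is not part of spank.h.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look for an argument of the form "key=value" in the vector passed to
 * a SPANK callback and return a pointer to the value, or NULL if no
 * such argument is present.
 * find_plugin_arg is a hypothetical helper, not part of spank.h.
 */
const char *find_plugin_arg(int ac, char *argv[], const char *key)
{
    size_t klen = strlen(key);

    for (int i = 0; i < ac; i++) {
        /* The key must match exactly and be followed by '='. */
        if (strncmp(argv[i], key, klen) == 0 && argv[i][klen] == '=')
            return argv[i] + klen + 1;
    }
    return NULL;
}
```

Inside a callback this could be used as find_plugin_arg(ac, argv, "min_prio"), which returns the text after the equals sign when the plugin was configured with min_prio=-10.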
SPANK plugins can query the current list of supported slurm_spank
symbols to determine if the current version supports a given plugin
hook. This may be useful because the list of plugin symbols may grow
in the future. The query is done using the spank_symbol_supported
function, which has the following prototype:
int spank_symbol_supported (const char *sym);
The return value is 1 if the symbol is supported, 0 if not.
SPANK plugins do not have direct access to internally defined SLURM
data structures. Instead, information about the currently executing job
is obtained via the spank_get_item function call.
spank_err_t spank_get_item (spank_t spank, spank_item_t item, ...);
The spank_get_item call must be passed the current SPANK handle as well
as the item requested, which is defined by the passed spank_item_t. A
variable number of pointer arguments are also passed, depending on
which item was requested by the plugin. A list of the valid values for
item is kept in the spank.h header file. Some examples are:
S_JOB_UID
User id for running job. (uid_t *) is third arg of spank_get_item.
S_JOB_STEPID
Job step id for running job. (uint32_t *) is third arg of
spank_get_item.
S_TASK_EXIT_STATUS
Exit status for exited task. Only valid from slurm_spank_task_exit.
(int *) is third arg of spank_get_item.
S_JOB_ARGV
Complete job command line. Third and fourth args to spank_get_item
are (int *, char ***).
See spank.h for more details, and EXAMPLES below for an example of
spank_get_item usage.
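The variable-argument convention used by spank_get_item, where the item code determines how many pointer arguments follow and of what type, can be illustrated without the SLURM headers. The sketch below is not SLURM's implementation: demo_get_item, the DEMO_ item codes, and the fixed values it stores are all invented for illustration.

```c
#include <stdarg.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical item codes for this sketch; the real list is in spank.h. */
enum { DEMO_JOB_UID, DEMO_JOB_ARGV };

/* Fixed demo data standing in for job state held by SLURM. */
static char *demo_argv[] = { "hostname", "-s", NULL };

/*
 * Variadic getter: the item code decides how many pointer arguments
 * follow and what type each one has. Returns 0 on success, -1 for an
 * unknown item.
 */
int demo_get_item(int item, ...)
{
    va_list ap;
    int rc = 0;

    va_start(ap, item);
    switch (item) {
    case DEMO_JOB_UID: {            /* one argument: uid_t * */
        uid_t *uid = va_arg(ap, uid_t *);
        *uid = 1000;
        break;
    }
    case DEMO_JOB_ARGV: {           /* two arguments: int *, char *** */
        int *argc = va_arg(ap, int *);
        char ***argv = va_arg(ap, char ***);
        *argc = 2;
        *argv = demo_argv;
        break;
    }
    default:
        rc = -1;
    }
    va_end(ap);
    return rc;
}
```

The caller is responsible for passing pointers of exactly the documented types, since the compiler cannot check variadic arguments. This is why the type of each argument is listed next to every item.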
SPANK plugins may also use the spank_getenv, spank_setenv, and
spank_unsetenv functions to view and modify the job’s environment.
spank_getenv searches the job’s environment for the environment
variable var and copies the current value into a buffer buf of length
len. spank_setenv allows a SPANK plugin to set or overwrite a variable
in the job’s environment, and spank_unsetenv unsets an environment
variable in the job’s environment. The prototypes are:
spank_err_t spank_getenv (spank_t spank, const char *var,
char *buf, int len);
spank_err_t spank_setenv (spank_t spank, const char *var,
const char *val, int overwrite);
spank_err_t spank_unsetenv (spank_t spank, const char *var);
These functions are only necessary in remote context. In local context,
the plugin may use setenv(3), getenv(3), and unsetenv(3) to access the
standard process environment directly.
Functions are also available from within the SPANK plugins to establish
environment variables to be exported to the SLURM PrologSlurmctld,
Prolog, Epilog and EpilogSlurmctld programs (the so-called job control
environment). The names of environment variables established by these
calls are prefixed with the string SPANK_ in order to avoid any
security implications of arbitrary environment variable control. (After
all, the job control scripts do run as root or the SLURM user.)
These functions are available from local context only.
spank_err_t spank_job_control_getenv(spank_t spank, const char *var,
char *buf, int len);
spank_err_t spank_job_control_setenv(spank_t spank, const char *var,
const char *val, int overwrite);
spank_err_t spank_job_control_unsetenv(spank_t spank, const char *var);
See spank.h for more information, and EXAMPLES below for an example of
spank_getenv usage.
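The effect of the SPANK_ prefix on job control variable names can be sketched as follows. This is standalone C for illustration only; job_control_name is a hypothetical helper and does not exist in spank.h.

```c
#include <stdio.h>

/*
 * Build the name under which a job control variable would be visible
 * to the prolog and epilog programs: "SPANK_" followed by the name the
 * plugin supplied. Returns 0 on success, -1 if buf is too small.
 * job_control_name is a hypothetical helper, not part of spank.h.
 */
int job_control_name(const char *var, char *buf, size_t len)
{
    int n = snprintf(buf, len, "SPANK_%s", var);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

Under this scheme, a variable that a plugin sets as RENICE would be read by an epilog script as SPANK_RENICE, so a plugin cannot overwrite variables such as PATH in the environment of a script that runs as root.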
Many of the described SPANK functions available to plugins return
errors via the spank_err_t error type. On success, the return value
will be set to ESPANK_SUCCESS, while on failure, the return value will
be set to one of many error values defined in slurm/spank.h. The SPANK
interface provides a simple function
const char * spank_strerror(spank_err_t err);
which may be used to translate a spank_err_t value into its string
representation.
SPANK OPTIONS
SPANK plugins also have an interface through which they may define and
implement extra job options. These options are made available to the
user through SLURM commands such as srun(1), salloc(1), and sbatch(1).
If the option is specified by the user, its value is forwarded and
registered with the plugin in slurmd when the job is run. In this way,
SPANK plugins may dynamically provide new options and functionality to
SLURM.
Each option registered by a plugin to SLURM takes the form of a struct
spank_option which is declared in <slurm/spank.h> as
struct spank_option {
    char * name;
    char * arginfo;
    char * usage;
    int has_arg;
    int val;
    spank_opt_cb_f cb;
};
Where
name is the name of the option. Its length is limited to
SPANK_OPTION_MAXLEN defined in <slurm/spank.h>.
arginfo
is a description of the argument to the option, if the option
does take an argument.
usage is a short description of the option suitable for --help output.
has_arg
0 if option takes no argument, 1 if option takes an argument,
and 2 if the option takes an optional argument. (See
getopt_long(3)).
val A plugin-local value to return to the option callback function.
cb A callback function that is invoked when the plugin option is
registered with SLURM. spank_opt_cb_f is typedef’d in
<slurm/spank.h> as
typedef int (*spank_opt_cb_f) (int val, const char *optarg,
int remote);
Where val is the value of the val field in the spank_option
struct, optarg is the supplied argument if applicable, and
remote is 0 if the function is being called from the "local"
host (e.g. srun) or 1 from the "remote" host (slurmd).
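The three has_arg values correspond to the no_argument, required_argument, and optional_argument constants of getopt_long(3). The following standalone sketch shows how an option with has_arg set to 2 behaves when parsed with getopt_long directly. The function parse_renice is a hypothetical name, and SLURM's own option processing may differ in detail.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Parse a command line that may contain --renice[=prio].
 * Sets *seen to 1 if the option was given and *arg to its argument
 * (NULL when the optional argument was omitted). Returns 0 on success,
 * -1 on an unrecognized option.
 */
int parse_renice(int argc, char *argv[], int *seen, const char **arg)
{
    static const struct option opts[] = {
        { "renice", optional_argument, NULL, 'r' },  /* has_arg == 2 */
        { NULL, 0, NULL, 0 }
    };
    int c;

    *seen = 0;
    *arg = NULL;
    optind = 1;                 /* restart scanning on every call */
    opterr = 0;                 /* report errors via the return value */
    while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
        if (c != 'r')
            return -1;
        *seen = 1;
        *arg = optarg;          /* NULL if no "=value" was supplied */
    }
    return 0;
}
```

Note that an optional argument must be attached with an equals sign (--renice=5). With --renice 5 the option is seen without an argument. A callback for such an option therefore has to handle a NULL optarg.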
Plugin options may be registered with SLURM using the
spank_option_register function. This function is only valid when called
from the plugin’s slurm_spank_init handler, and registers one option at
a time. The prototype is
spank_err_t spank_option_register (spank_t sp,
struct spank_option *opt);
This function will return ESPANK_SUCCESS on successful registration of
an option, or ESPANK_BAD_ARG for errors including invalid spank_t
handle, or when the function is not called from the slurm_spank_init
function. All options need to be registered from all contexts in which
they will be used. For instance, if an option is only used in local
(srun) and remote (slurmd) contexts, then spank_option_register should
only be called from within those contexts. For example:
if (spank_context() != S_CTX_ALLOCATOR)
spank_option_register (sp, opt);
If, however, the option is used in all contexts, then
spank_option_register needs to be called in every context.
In addition to spank_option_register, plugins may also export options
to SLURM by defining a table of struct spank_option with the symbol
name spank_options. This method, however, is not supported for use with
sbatch and salloc (allocator context), thus the use of
spank_option_register is preferred. When using the spank_options table,
the final element in the array must be filled with zeros. A
SPANK_OPTIONS_TABLE_END macro is provided in <slurm/spank.h> for this
purpose.
When an option is provided by the user on the local side, SLURM will
immediately invoke the option’s callback with remote=0. This is meant
for the plugin to do local sanity checking of the option before the
value is sent to the remote side during job launch. If the argument the
user specified is invalid, the plugin should report an error and return
a non-zero value from the callback.
On the remote side, options and their arguments are registered just
after SPANK plugins are loaded and before the slurm_spank_init handler
is called. This allows plugins to modify behavior of all plugin
functionality based on the value of user-provided options. (See
EXAMPLES below for a plugin that registers an option with SLURM).
CONFIGURATION
The default SPANK plug-in stack configuration file is plugstack.conf in
the same directory as slurm.conf(5), though this may be changed via the
SLURM config parameter PlugStackConfig. Normally the plugstack.conf
file should be identical on all nodes of the cluster. The config file
lists SPANK plugins, one per line, along with whether the plugin is
required or optional, and any global arguments that are to be passed to
the plugin for runtime configuration. Comments are preceded with ’#’
and extend to the end of the line. If the configuration file is
missing or empty, it will simply be ignored.
The format of each non-comment line in the configuration file is:
required/optional plugin arguments
For example:
optional /usr/lib/slurm/test.so
This tells slurmd to load the plugin test.so, passing no arguments. If a
SPANK plugin is required, then failure of any of the plugin’s functions
will cause slurmd to terminate the job, while optional plugins only
cause a warning.
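The line format can be made concrete with a small parser. The sketch below is a simplified standalone illustration, not the parser SLURM uses: struct stack_entry and parse_stack_line are hypothetical names, and quoting and the include keyword are deliberately not handled.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one plugstack.conf entry (hypothetical struct). */
struct stack_entry {
    int  required;        /* 1 for "required", 0 for "optional" */
    char plugin[256];     /* plugin path or file name */
    char args[256];       /* remaining text, passed on as arguments */
};

/*
 * Parse one line. Returns 1 if an entry was found, 0 for a blank or
 * comment-only line, and -1 for a malformed line.
 */
int parse_stack_line(const char *line, struct stack_entry *e)
{
    char buf[600], flag[16];
    int used = 0;

    snprintf(buf, sizeof buf, "%s", line);
    buf[strcspn(buf, "#\n")] = '\0';      /* drop comment and newline */

    /* %n records where the argument text starts. */
    if (sscanf(buf, " %15s %255s %n", flag, e->plugin, &used) < 2) {
        /* Fewer than two fields: acceptable only for an empty line. */
        return buf[strspn(buf, " \t")] == '\0' ? 0 : -1;
    }
    if (strcmp(flag, "required") == 0)
        e->required = 1;
    else if (strcmp(flag, "optional") == 0)
        e->required = 0;
    else
        return -1;
    snprintf(e->args, sizeof e->args, "%s", buf + used);
    return 1;
}
```

Given the line "optional renice.so min_prio=-10", the sketch yields required = 0, plugin = "renice.so", and args = "min_prio=-10".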
If a fully-qualified path is not specified for a plugin, then the
currently configured PluginDir in slurm.conf(5) is searched.
SPANK plugins are stackable, meaning that more than one plugin may be
placed into the config file. The plugins will simply be called in
order, one after the other, and appropriate action taken on failure
given the state of the plugin’s optional flag.
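The stacking behavior can be modeled in a few lines of standalone C. The types and functions below (demo_plugin, run_stack, hook_ok, hook_fail) are hypothetical and only model the required/optional semantics described here. In particular, stopping at the first failing required plugin is an assumption of this sketch, not a statement about SLURM's loader.

```c
#include <stdio.h>

/* One stacked plugin: a hook plus its required/optional flag. */
typedef int (*demo_hook_f)(void);

struct demo_plugin {
    const char *name;
    int required;          /* 1 = required, 0 = optional */
    demo_hook_f hook;      /* returns 0 on success, negative on failure */
};

/*
 * Call every hook in stack order. A failing optional plugin only
 * produces a warning; a failing required plugin makes the run fail.
 * Returns 0 if the job may proceed, -1 if it should be terminated.
 */
int run_stack(const struct demo_plugin *stack, int n)
{
    for (int i = 0; i < n; i++) {
        if (stack[i].hook() >= 0)
            continue;
        if (stack[i].required) {
            fprintf(stderr, "error: required plugin %s failed\n",
                    stack[i].name);
            return -1;
        }
        fprintf(stderr, "warning: optional plugin %s failed\n",
                stack[i].name);
    }
    return 0;
}

/* Two trivial hooks for demonstration. */
int hook_ok(void)   { return 0; }
int hook_fail(void) { return -1; }
```

With a stack of { optional, hook_fail } followed by { required, hook_ok }, run_stack prints a warning and returns 0. Swapping the two flags makes it return -1.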
Additional config files or directories of config files may be included
in plugstack.conf with the include keyword. The include keyword must
appear on its own line, and takes a glob as its parameter, so multiple
files may be included from one include line. For example, the following
syntax will load all config files in the /etc/slurm/plugstack.conf.d
directory, in local collation order:
include /etc/slurm/plugstack.conf.d/*
which might be considered a more flexible method for building up a
spank plugin stack.
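To preview which files an include pattern would pick up, and in what order, the pattern can be expanded with glob(3), which sorts its results according to the current locale unless GLOB_NOSORT is given. This is a sketch under the assumption that SLURM's expansion behaves like glob(3); list_includes is a hypothetical helper name.

```c
#include <glob.h>
#include <stdio.h>

/*
 * Expand an include pattern and print each matching path in the order
 * glob(3) returns them. Returns the number of matches, or -1 on error.
 */
int list_includes(const char *pattern)
{
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);  /* flags 0: sorted results */
    int count;

    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;
    for (size_t i = 0; i < g.gl_pathc; i++)
        printf("include %s\n", g.gl_pathv[i]);
    count = (int) g.gl_pathc;
    globfree(&g);
    return count;
}
```

Calling list_includes("/etc/slurm/plugstack.conf.d/*") prints one line per file that would be included. Numeric prefixes in the file names are a common way to make the resulting order explicit.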
The SPANK config file is re-read on each job launch, so editing the
config file will not affect running jobs. However, care should be taken
so that a partially edited config file is not read by a launching job.
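One common way to avoid exposing a partially written file is to write the new contents to a temporary file in the same directory and then rename(2) it over the original, since a rename within one filesystem replaces the target atomically. The sketch below illustrates that general technique. It is not something SLURM provides, and replace_config is a hypothetical helper name.

```c
#include <stdio.h>

/*
 * Write new_text to "<path>.tmp" and then rename it over path, so that
 * a reader sees either the old file or the complete new one.
 * Returns 0 on success, -1 on failure.
 */
int replace_config(const char *path, const char *new_text)
{
    char tmp[4096];
    FILE *fp;

    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int) sizeof tmp)
        return -1;
    if ((fp = fopen(tmp, "w")) == NULL)
        return -1;
    if (fputs(new_text, fp) == EOF) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    /* Only replace the original once the new file is fully written. */
    if (fclose(fp) != 0 || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

The same effect can be had from a shell by editing a copy and moving it into place with mv(1), provided the copy lives on the same filesystem as plugstack.conf.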
EXAMPLES
Simple SPANK config file:
#
# SPANK config file
#
# required? plugin args
#
optional renice.so min_prio=-10
required /usr/lib/slurm/test.so
The following is a simple SPANK plugin to modify the nice value of job
tasks. This plugin adds a --renice=[prio] option to srun which users
can use to set the priority of all remote tasks. Priority may also be
specified via a SLURM_RENICE environment variable. A minimum priority
may be established via a "min_prio" parameter in plugstack.conf (See
above for example).
/*
 * To compile:
 *   gcc -shared -fPIC -o renice.so renice.c
 *
 */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/resource.h>
#include <slurm/spank.h>
/*
* All spank plugins must define this macro for the SLURM plugin loader.
*/
SPANK_PLUGIN(renice, 1);
#define PRIO_ENV_VAR "SLURM_RENICE"
#define PRIO_NOT_SET 42
/*
* Minimum allowable value for priority. May be set globally
* via plugin option min_prio=<prio>
*/
static int min_prio = -20;
static int prio = PRIO_NOT_SET;
static int _renice_opt_process (int val, const char *optarg, int remote);
static int _str2prio (const char *str, int *p2int);
/*
* Provide a --renice=[prio] option to srun:
*/
struct spank_option spank_options[] =
{
    { "renice", "[prio]", "Re-nice job tasks to priority [prio].", 2, 0,
      (spank_opt_cb_f) _renice_opt_process
    },
    SPANK_OPTIONS_TABLE_END
};
/*
* Called from both srun and slurmd.
*/
int slurm_spank_init (spank_t sp, int ac, char **av)
{
    int i;
    /* Don't do anything in sbatch/salloc
     */
    if (spank_context () == S_CTX_ALLOCATOR)
        return (0);
    for (i = 0; i < ac; i++) {
        if (strncmp ("min_prio=", av[i], 9) == 0) {
            const char *optarg = av[i] + 9;
            if (_str2prio (optarg, &min_prio) < 0)
                slurm_error ("Ignoring invalid min_prio value: %s", av[i]);
        }
        else {
            slurm_error ("renice: Invalid option: %s", av[i]);
        }
    }
    if (!spank_remote (sp))
        slurm_verbose ("renice: min_prio = %d", min_prio);
    return (0);
}
int slurm_spank_task_post_fork (spank_t sp, int ac, char **av)
{
    pid_t pid;
    int taskid;
    if (prio == PRIO_NOT_SET) {
        /*
         * See if SLURM_RENICE env var is set by user
         */
        char val [1024];
        if (spank_getenv (sp, PRIO_ENV_VAR, val, sizeof (val)) != ESPANK_SUCCESS)
            return (0);
        if (_str2prio (val, &prio) < 0) {
            slurm_error ("Bad value for %s: %s", PRIO_ENV_VAR, val);
            return (-1);
        }
        if (prio < min_prio)
            slurm_error ("%s=%d not allowed, using min=%d",
                         PRIO_ENV_VAR, prio, min_prio);
    }
    if (prio < min_prio)
        prio = min_prio;
    spank_get_item (sp, S_TASK_GLOBAL_ID, &taskid);
    spank_get_item (sp, S_TASK_PID, &pid);
    /* pid_t is cast to long to match %ld; prio is a plain int */
    slurm_info ("re-nicing task%d pid %ld to %d", taskid, (long) pid, prio);
    if (setpriority (PRIO_PROCESS, (int) pid, (int) prio) < 0) {
        slurm_error ("setpriority: %m");
        return (-1);
    }
    return (0);
}
static int _str2prio (const char *str, int *p2int)
{
    long int l;
    char *p;
    l = strtol (str, &p, 10);
    /* Reject empty input, trailing characters, and out-of-range values */
    if ((p == str) || (*p != '\0') || (l < -20) || (l > 20))
        return (-1);
    *p2int = (int) l;
    return (0);
}
static int _renice_opt_process (int val, const char *optarg, int remote)
{
    if (optarg == NULL) {
        slurm_error ("renice: invalid argument!");
        return (-1);
    }
    if (_str2prio (optarg, &prio) < 0) {
        slurm_error ("Bad value for --renice: %s", optarg);
        return (-1);
    }
    if (prio < min_prio)
        slurm_error ("--renice=%d not allowed, will use min=%d",
                     prio, min_prio);
    return (0);
}
COPYING
Copyright (C) 2006 The Regents of the University of California.
Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
CODE-OCEC-09-009. All rights reserved.
This file is part of SLURM, a resource management program. For
details, see <https://computing.llnl.gov/linux/slurm/>.
SLURM is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.
SLURM is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
FILES
/etc/slurm/slurm.conf - SLURM configuration file.
/etc/slurm/plugstack.conf - SPANK configuration file.
/usr/include/slurm/spank.h - SPANK header file.
SEE ALSO
srun(1), slurm.conf(5)