NAME

       LAM SSI boot - overview of LAM's boot SSI modules

DESCRIPTION

       The  "kind"  for  boot SSI modules is "boot".  Specifically, the string
       "boot" (without the quotes) is the prefix used for argument names  when
       passing values to boot modules at run time.  For example:

       lamboot -ssi boot rsh hostfile
           Selects the "rsh" boot module and boots LAM across  all  the  nodes
           listed in the file hostfile.

       LAM  currently  has  several  boot  modules:  bproc, globus, rsh (which
       includes ssh), slurm, and tm.

ADDITIONAL INFORMATION

       The LAM/MPI User's Guide contains much detail about  all  of  the  boot
       modules.   All users are strongly encouraged to read it.  This man page
       is a summary of the available information.

SELECTING A BOOT MODULE

       Only one boot module may be selected per command execution.  Hence, the
       module is selected once, when a given command  initializes.   Once  the
       module is chosen, it is used for the duration of the program run.

       In  most cases, LAM will automatically select the "best" module at run-
       time.  LAM will query all available modules at run  time  to  obtain  a
       list of priorities.  The module with the highest priority will be used.
       If multiple modules return the same priority, LAM will  select  one  at
       random.   Priorities  are  in  the  range of 0 to 100, with 0 being the
       lowest priority and 100 being the highest.  At run  time,  each  module
       will  examine the run-time environment and return a priority value that
       is appropriate.
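
       The selection rule described above can be sketched in a few lines.  This
       is an illustration only, not LAM's implementation: the function name and
       the convention that an unavailable module reports None are assumptions
       made for the example.

```python
import random


def select_boot_module(priorities, requested=None, rng=random):
    """Pick a boot module name from a {name: priority} mapping.

    Hypothetical convention: a module maps to an int in 0..100, or to
    None when it reports that it cannot run in this environment.
    """
    if requested is not None:
        # Explicit selection: only the named module is considered.
        if priorities.get(requested) is None:
            raise RuntimeError(f"boot module {requested!r} is unable to run")
        return requested

    available = {name: p for name, p in priorities.items() if p is not None}
    if not available:
        raise RuntimeError("no boot module is available")

    # Highest priority wins; ties are broken at random.
    best = max(available.values())
    tied = sorted(name for name, p in available.items() if p == best)
    return rng.choice(tied)
```

       With the priorities this page gives for a PBS job (rsh at 0, tm at 50),
       select_boot_module({"rsh": 0, "tm": 50}) returns "tm".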

       For example, when running a PBS  job,  the  tm  module  will  return  a
       sufficiently  high priority value such that it will be selected and the
       other available modules will not.

       Most  modules  accept  run-time parameters that override the priorities
       they return, which changes the order (and therefore ultimate selection)
       of the available boot modules.  See below.

       Alternatively, a specific  module  may  be  selected  by  the  user  by
       specifying  a  value  for  the  boot  parameter  (either by environment
       variable or by the -ssi command line  parameter).   In  this  case,  no
       other  modules  will  be queried by LAM.  If the named module returns a
       valid priority, it will be used.  For example:

       lamboot -ssi boot rsh hostfile
           Tells LAM to only query the rsh  boot  module  and  see  if  it  is
           available to run.

       If  the boot module that is selected is unable to run (e.g., attempting
       to use the  tm  boot  module  when  not  running  in  a  PBS  job),  an
       appropriate error message will be printed and execution will abort.

AVAILABLE MODULES

       As with all SSI modules, it is possible to pass parameters at run time.
       This section discusses the built-in LAM boot modules, as  well  as  the
       run-time parameters that they accept.

       In  the  discussion  below, parameters to boot modules are discussed in
       terms of name and value.  The  name  and  value  may  be  specified  as
       command  line  arguments  to  the  lamboot, lamgrow, recon, and lamwipe
       commands with the -ssi switch,  or  they  may  be  set  in  environment
       variables of the form LAM_MPI_SSI_name=value.  Note that using the -ssi
       command line switch  will  take  precedence  over  any   previously-set
       environment variables.
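
       The precedence rule can be sketched as follows.  The helper names are
       hypothetical and the argument parsing is deliberately simplified; only
       the -ssi switch and the LAM_MPI_SSI_ prefix come from this page.

```python
import os

ENV_PREFIX = "LAM_MPI_SSI_"


def ssi_params_from_argv(argv):
    """Collect `-ssi name value` triples from a command line."""
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] == "-ssi" and i + 2 < len(argv):
            params[argv[i + 1]] = argv[i + 2]
            i += 3
        else:
            i += 1
    return params


def resolve_ssi_param(name, argv, environ=None):
    """Return the value for an SSI parameter, or None if it is unset.

    A value given with -ssi wins over LAM_MPI_SSI_<name> in the
    environment.
    """
    environ = os.environ if environ is None else environ
    cli = ssi_params_from_argv(argv)
    if name in cli:
        return cli[name]
    return environ.get(ENV_PREFIX + name)
```

       For the command line "lamboot -ssi boot rsh hostfile" with
       LAM_MPI_SSI_boot=tm in the environment, resolve_ssi_param("boot", ...)
       yields "rsh".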

   bproc Boot Module
       The  bproc  boot  module  uses  native  bproc  functionality (e.g., the
       bproc_execmove library call) to launch jobs on  slave  nodes  from  the
       head  node.   Checks are made before launching to ensure that the nodes
       are available and are "owned" by the  user  and/or  the  user's  group.
       Appropriate  error  messages will be displayed if the user is unable to
       execute on the target nodes.

       Hostnames should be specified using bproc notation:  -1  indicates  the
       head  node,  and integer numbers starting with 0 represent slave nodes.
       The string "localhost" will automatically be converted to "-1".
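
       As a minimal sketch of this notation (the function name is hypothetical
       and the validation is an assumption, not LAM's own logic):

```python
def normalize_bproc_host(name):
    """Map a boot schema host entry to a bproc node number.

    -1 is the head node, integers from 0 upward are slave nodes, and
    "localhost" is treated as the head node.
    """
    name = name.strip()
    if name == "localhost":
        return -1
    try:
        node = int(name)
    except ValueError:
        raise ValueError(f"not a bproc node number: {name!r}") from None
    if node < -1:
        raise ValueError(f"bproc node numbers start at -1: {name!r}")
    return node
```

       Here normalize_bproc_host("localhost") and normalize_bproc_host("-1")
       both give -1, while normalize_bproc_host("0") gives the first slave
       node.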

       The default behavior is to mark the bproc head node as
       "non-schedulable", meaning that the expansion of "N" and "C" when  used
       with mpirun and lamexec will exclude the bproc head node.  For example,
       "mpirun  C  my_mpi_program"  will  run  copies of my_mpi_program on all
       lambooted slave nodes, but not the bproc head node.

       Note that the bproc boot module is only  usable  from  the  bproc  head
       node.

       The bproc boot module only has one tunable parameter:

       boot_bproc_priority
           Using  the  priority argument can override LAM's automatic run-time
           boot module selection algorithms.  This parameter only  has  effect
           when  the  bproc module is eligible to be run (i.e., when running on
           a bproc cluster).

       See the bproc notes in the user documentation for more details.

   globus Boot Module
       The globus boot  module  uses  the  globus-job-run  command  to  launch
       executables  on remote nodes.  It is currently limited to only allowing
       jobs that can use the fork job manager on the Globus gatekeeper.  Other
       job managers are not yet supported.

       LAM  will  effectively  never  select the globus boot module by default
       because it has an extremely low default priority; it must  be  manually
       selected  with  the  boot  SSI  parameter  or have its priority raised.
       Additionally, LAM must be able to find the  globus-job-run  command  in
       your PATH.

       The  boot  schema  requires  hosts  to  be listed as the Globus contact
       string.  For example:

       "host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc"

       Note the use of quotes because the CN includes  spaces  --  the  entire
       contact  name  must be enclosed in quotes.  Additionally, since globus-
       job-run does not invoke the user's "dot" files on the remote nodes,  no
       PATH  or  environment  is set up.  Hence, the attribute lam_install_path
       must be specified for each contact string in the hostfile so  that  LAM
       knows where to find its executables on the remote nodes.  For example:

       "host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc" lam_install_path=/home/lam
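
       To see why the quotes matter, the sketch below splits such a line with
       shell-style quoting rules.  It is an illustration under that assumption
       and not LAM's actual boot schema parser; the function name is
       hypothetical.

```python
import shlex


def parse_boot_schema_line(line):
    """Split one hostfile line into (host, attributes).

    The first token is the host (here, a Globus contact string); the
    remaining tokens are expected to be key=value attributes.
    """
    tokens = shlex.split(line)
    if not tokens:
        return None  # blank line
    host, attrs = tokens[0], {}
    for tok in tokens[1:]:
        key, sep, value = tok.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {tok!r}")
        attrs[key] = value
    return host, attrs
```

       With the quotes in place, the whole contact string is one host token
       and lam_install_path is read as an attribute.  Without them, "bbb" and
       "ccc" become separate tokens and the line is rejected.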

       The globus boot module only has one tunable parameter:

       boot_globus_priority
           Using  the  priority argument can override LAM's automatic run-time
           boot module selection algorithms.

   rsh Boot Module
       The rsh boot module uses rsh or ssh (or any other  command  line  agent
       that  acts  like  rsh/ssh)  to  launch executables on remote nodes.  It
       requires that executables can be started on remote nodes without  being
       prompted for a password, and without outputting anything to stderr.

       The  rsh boot module is always available, and unless overridden, always
       assigns itself a priority of 0.

       The rsh module accepts a few run-time parameters:

       boot_rsh_agent
           Used to override the compiled-in default remote agent program  that
           was selected when LAM was compiled.  For example, this parameter can
           be set to use "ssh" if LAM was compiled to use  "rsh"  by  default.
           Previous  versions  of LAM/MPI used the LAMRSH environment variable
           for this purpose.  While  the  LAMRSH  environment  variable  still
           works,  its  use  is  deprecated in favor of the boot_rsh_agent SSI
           module argument.

       boot_rsh_priority
           Using the priority argument can override LAM's  automatic  run-time
           boot module selection algorithms.

       boot_rsh_username
           If  the  user  has a different username on the remote machine, this
           parameter can be used to pass the -l  argument  to  the  underlying
           remote  agent.   Note that this is a coarse-grained control -- this
           one username will be used for all  remote  nodes.   If  more  fine-
           grained  control  is  required, the username should be specified in
           the boot schema file on a per-host basis.
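
       The effect of boot_rsh_agent and boot_rsh_username on the remote command
       line can be pictured with the sketch below.  The function name, the
       argument order, and the splitting of the agent string into words are all
       assumptions for illustration; the exact command line LAM constructs may
       differ.

```python
import shlex


def build_remote_argv(host, command, agent="rsh", username=None):
    """Assemble an argv list for launching `command` on `host`.

    agent    -- value of boot_rsh_agent (assumed to be split on spaces)
    username -- value of boot_rsh_username; passed with -l when set
    """
    argv = shlex.split(agent)
    if username:
        argv += ["-l", username]
    argv.append(host)
    argv += list(command)
    return argv
```

       For example, with agent="ssh" and username="alice" (both hypothetical
       values), the resulting list starts with "ssh", "-l", "alice", followed
       by the host and the command.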

   slurm Boot Module
       The slurm boot module uses the srun command to launch the  LAM  daemons
       in  a  SLURM  execution  environment.   When  it detects that it is running
       under SLURM, it automatically sets its priority to 50.  It can be  used
       in two different modes: batch (where a script is submitted to SLURM and
       it is run on the first node in the node allocation) and allocate (where
       the  -A  option is passed to srun to obtain an interactive allocation).
       The slurm boot module does not support running  in  a  script  that  is
       launched by SLURM on all nodes in an allocation.

       No  boot  schema file is required when using the slurm boot module; LAM
       will automatically determine the host and CPU count from SLURM  itself.

       The slurm boot module only has one tunable parameter:

       boot_slurm_priority
           Using  the  priority argument can override LAM's automatic run-time
           boot module selection algorithms.  This parameter only  has  effect
           when  the slurm module is eligible to be run (i.e., when running in
           a SLURM allocation).

   tm Boot Module
       The tm boot module uses the Task Management (TM)  interface  to  launch
       executables  on  remote  nodes.   Currently, OpenPBS and PBSPro are the
       only two systems that implement the TM interface.   Hence,  when   LAM
       detects  that it is running in a PBS job, it will automatically set the
       tm priority to 50.  When not running in a PBS job, the tm  module  will
       not be available.

       The tm boot module only has one tunable parameter:

       boot_tm_priority
           Using  the  priority argument can override LAM's automatic run-time
           boot module selection algorithms.  This parameter only  has  effect
           when  the  tm module is eligible to be run (i.e., when running in a
           PBS job).

SEE ALSO

       lamssi(7), mpirun(1), LAM/MPI User's Guide