MPI_Comm_spawn - Spawn a dynamic MPI process

NAME

       MPI_Comm_spawn -  Spawn a dynamic MPI process

SYNOPSIS

       #include <mpi.h>
       int
       MPI_Comm_spawn(char* command, char** argv, int maxprocs, MPI_Info info,
                      int root, MPI_Comm comm, MPI_Comm *intercomm,
                      int *errcodes)

INPUT PARAMETERS

       command
              - Name of program to spawn (only significant at root)
       argv   - arguments to command (only significant at root)
       maxprocs
              - max number of processes to start (only significant at root)
       info   - startup hints
       root   - rank of process to perform the spawn
       comm   - parent intracommunicator

OUTPUT PARAMETERS

       intercomm
              - child intercommunicator containing spawned processes
       errcodes
              - one code per process

DESCRIPTION

       A group of processes can create another group of processes with
       MPI_Comm_spawn.  This function is a collective operation over the
       parent communicator.  The child group starts up like any MPI
       application.  The processes must begin by calling MPI_Init, after
       which the pre-defined communicator, MPI_COMM_WORLD, may be used.  This
       world communicator contains only the child processes.  It is distinct
       from the MPI_COMM_WORLD of the parent processes.

       MPI_Comm_spawn_multiple   is  used  to  manually  specify  a  group  of
       different executables and arguments to spawn.  MPI_Comm_spawn  is  used
       to  specify  one  executable  and  set of arguments (although a LAM/MPI
       appschema(5) can be provided to MPI_Comm_spawn via the "lam_spawn_file"
       info key).

       Communication With Spawned Processes

       The   natural   communication  mechanism  between  two  groups  is  the
       intercommunicator.  The second communicator argument to  MPI_Comm_spawn
       returns  an  intercommunicator  whose  local  group contains the parent
       processes (same as the first communicator argument)  and  whose  remote
       group contains child processes. The child processes can access the same
       intercommunicator by using the MPI_Comm_get_parent  call.   The  remote
       group  size  of  the  parent  communicator  is  zero if the process was
       created by mpirun (1) instead of one  of  the  spawn  functions.   Both
       groups   can   decide   to   merge   the   intercommunicator   into  an
       intracommunicator (with  the  MPI_Intercomm_merge  function)  and  take
       advantage  of  other  MPI collective operations.  They can then use the
       merged intracommunicator to create new communicators  and  reach  other
       processes in the MPI application.
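
       The parent/child handshake described above can be sketched as a
       single program that spawns copies of itself.  This is a minimal
       sketch, not a definitive implementation: it assumes that argv[0]
       resolves to the executable on the target nodes, and the choice of
       2 children is arbitrary.  The MPI-2 standard has MPI_Comm_get_parent
       return MPI_COMM_NULL in a process that was not spawned, while this
       page describes a remote group of size zero, so the sketch checks
       both.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, inter, merged;
    int is_child = 0, rank = 0, size = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    /* A spawned process sees an intercommunicator with a non-empty
       remote group (the parents). */
    if (parent != MPI_COMM_NULL) {
        int remote_size = 0;
        MPI_Comm_remote_size(parent, &remote_size);
        is_child = (remote_size > 0);
    }

    if (is_child) {
        inter = parent;
    } else {
        /* Parent side: collective over MPI_COMM_WORLD, rank 0 is root.
           Assumption: argv[0] can be found on the nodes used. */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &inter, MPI_ERRCODES_IGNORE);
    }

    /* Merge both groups; parents (high = 0) are ordered before
       children (high = 1) in the resulting intracommunicator. */
    MPI_Intercomm_merge(inter, is_child, &merged);
    MPI_Comm_rank(merged, &rank);
    MPI_Comm_size(merged, &size);
    printf("%s: rank %d of %d in merged communicator\n",
           is_child ? "child" : "parent", rank, size);

    /* Keep parents and children in step until everyone is done. */
    MPI_Barrier(merged);
    MPI_Comm_free(&merged);
    MPI_Finalize();
    return 0;
}
```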

       Resource Allocation

       LAM/MPI  offers  some  MPI_Info  keys  for  the  placement  of  spawned
       applications.  Keys are looked for in  the  order  listed  below.   The
       first key that is found is used; any remaining keys are ignored.

       lam_spawn_file

       The value of this key can be the filename of an appschema(5).  This
       allows the programmer to specify an arbitrary set of LAM CPUs or nodes
       to spawn MPI processes on.  In this case, only the appschema is used to
       spawn the application; command, argv, and maxprocs are all ignored
       (even at the root).  Note that even though maxprocs is ignored,
       errcodes must still be an array long enough to hold an integer error
       code for every process that tried to launch, or be the MPI constant
       MPI_ERRCODES_IGNORE.  Also note that MPI_Comm_spawn_multiple does not
       accept the "lam_spawn_file" info key.  As such, the "lam_spawn_file"
       info key to MPI_Comm_spawn is mainly intended to spawn MPMD
       applications and/or specify an arbitrary number of nodes to run on.

       Also  note  that this "lam_spawn_file" key is not portable to other MPI
       implementations; it is a  LAM/MPI-specific  info  key.   If  specifying
       exact  LAM  nodes  or  CPUs is not necessary, users should probably use
       MPI_Comm_spawn_multiple to make their program more portable.

       file

       This key is a synonym for "lam_spawn_file".  Since "file" is not a LAM-
       specific  name, yet this key carries a LAM-specific meaning, its use is
       deprecated in favor of "lam_spawn_file".

       lam_spawn_sched_round_robin

       The value of this key is a string representing a LAM CPU or node (using
       standard  LAM nomenclature -- see mpirun(1)) to begin spawning on.  The
       use of this key allows the programmer to indicate  which  node/CPU  for
       LAM  to  start  spawning on without having to write out a temporary app
       schema file.

       The CPU number is relative to the  boot  schema  given  to  lamboot(1).
       Only  a single LAM node/CPU may be specified, such as "n3" or "c1".  If
       a node is specified, LAM will spawn one MPI process per node.  If a CPU
       is  specified,  LAM  will schedule one MPI process per CPU.  An error is
       returned if "N" or "C" is used.

       Note that LAM is not involved with run-time scheduling of the MPI
       process -- LAM only spawns processes on indicated nodes.  The operating
       system schedules these processes for execution just like any other
       process.  No attempt is made by LAM to bind processes to CPUs.  Hence,
       the "cX" nomenclature is just a convenience mechanism to indicate how
       many MPI processes should be spawned on a given node; it is not
       indicative of operating system scheduling.

       For "nX" values, the first MPI process will be spawned on the indicated
       node.   The  remaining  (maxprocs - 1) MPI processes will be spawned on
       successive nodes.  Specifically, if X  is  the  starting  node  number,
       process  i will be launched on "nK", where K = ((X + i) % total_nodes).
       LAM takes the node number modulo the total number of nodes in the
       current LAM universe to prevent errors, thereby creating a "wraparound"
       effect.  Hence, this mechanism can be used for round-robin scheduling,
       regardless of how many nodes are in the LAM universe.

       For "cX" values, the algorithm is essentially the same, except that LAM
       will resolve "cX" to a specific node before  spawning,  and  successive
       processes  are  spawned on the node where "cK" resides, where K = ((X +
       i) % total_cpus).

       For example, if there are 8 nodes  and  16  CPUs  in  the  current  LAM
       universe  (2  CPUs  per  node),  a "lam_spawn_sched_round_robin" key is
       given with the value of "c14", and maxprocs is 4, LAM will spawn MPI
       processes on:

       CPU  Node  MPI_COMM_WORLD rank
       ---  ----  -------------------
       c14  n7    0
       c15  n7    1
       c0   n0    2
       c1   n0    3

       lam_no_root_node_schedule

       This key is used to designate that the spawned processes  must  not  be
       spawned  or  scheduled  on  the "root node" (the node doing the spawn).
       There is no specific value associated with this key, but it  should  be
       given some non-null/non-empty dummy value.

       It is a node-specific key and not a CPU-specific one. Hence if the root
       node has multiple CPUs, none of the CPUs on this root  node  will  take
       part in the scheduling of the spawned processes.

       No keys given

       If  none  of  the  info  keys  listed  above  are  used,  the  value of
       MPI_INFO_NULL should be given for info (all  other  keys  are  ignored,
       anyway  - there is no harm in providing other keys).  In this case, LAM
       schedules the given number of processes onto LAM nodes by starting with
       CPU  0  (or the lowest numbered CPU), and continuing through higher CPU
       numbers, placing one process on each CPU.   If  the  process  count  is
       greater than the CPU count, the procedure repeats.
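
       Passing one of the placement keys described above might look like
       the following sketch.  The program name "worker", the starting node
       "n3", and the count of 4 are placeholders chosen for illustration;
       only the key name is taken from this page.

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Info info;
    MPI_Comm children;

    MPI_Init(&argc, &argv);

    /* Ask LAM to start round-robin placement at node n3.  The key is
       LAM/MPI-specific and not portable to other implementations. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "lam_spawn_sched_round_robin", "n3");

    /* "worker" is a hypothetical executable found via PATH. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, info, 0,
                   MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```

       Replacing info with MPI_INFO_NULL in the call selects the default
       placement described under "No keys given".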

       Predefined Attributes

       The pre-defined attribute on MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, can
       be useful in determining how many CPUs are currently unused.  The
       value in MPI_UNIVERSE_SIZE is the number of CPUs that LAM was booted
       with (see MPI_Init(3)).  Subtracting the size of MPI_COMM_WORLD from
       this value gives the number of CPUs in the current LAM universe that
       the current application is not using (and that are therefore likely
       idle).

       Process Termination

       Note that the process[es] spawned by MPI_COMM_SPAWN (and
       MPI_COMM_SPAWN_MULTIPLE) effectively become orphans.  That is, the
       spawning MPI application does not wait for the spawned application to
       finish.  Hence, there is no guarantee the spawned application has
       finished when the spawning application completes.  Similarly, killing
       the spawning application will have no effect on the spawned
       application.

       User applications that need the spawning application to wait can
       arrange this with an MPI_BARRIER between the spawning and spawned
       processes before MPI_FINALIZE.

       Note that lamclean will kill *all* MPI processes.

       Process Count

       The  maxprocs parameter to MPI_Comm_spawn specifies the exact number of
       processes to be started.  If it is not possible to  start  the  desired
       number  of  processes,  MPI_Comm_spawn will return an error code.  Note
       that even though maxprocs is only relevant on the root, all ranks  must
       have  an errcodes array long enough to handle an integer error code for
       every  process  that  tries   to   launch,   or   give   MPI   constant
       MPI_ERRCODES_IGNORE  for  the errcodes argument.  While this appears to
       be a contradiction, it is per the MPI-2 standard.  :-\

       Frequently, an application wishes to choose a process count so as to
       fill all processors available to a job.  MPI indicates the maximum
       number of processes recommended for a job in the pre-defined attribute,
       MPI_UNIVERSE_SIZE, which is cached on MPI_COMM_WORLD.

       The typical usage is to subtract the number of processes currently in
       the job from the value of MPI_UNIVERSE_SIZE and spawn the difference.
       LAM sets MPI_UNIVERSE_SIZE to the number of CPUs in the user's LAM
       session (as defined in the boot schema [bhost(5)] via lamboot(1)).

       See MPI_Init(3) for other pre-defined attributes that are helpful  when
       spawning.
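
       The process-count logic in this section can be sketched as follows.
       The program name "worker" is a placeholder.  The sketch assumes
       that every rank sees the same MPI_UNIVERSE_SIZE, so all ranks
       compute the same count and allocate an errcodes array of the same
       length.  With the default error handler a failed spawn aborts the
       job, so the loop over errcodes only matters once a returning error
       handler has been installed.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Comm children;
    int *universe_size = NULL; /* attribute value comes back as a pointer */
    int flag = 0, world_size = 0, nspawn = 0, i;
    int *errcodes;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* MPI_UNIVERSE_SIZE is cached on MPI_COMM_WORLD; flag is 0 if unset. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                      &universe_size, &flag);
    if (flag) {
        nspawn = *universe_size - world_size; /* CPUs not yet in use */
    }
    if (nspawn <= 0) {
        printf("no spare CPUs to spawn on\n");
        MPI_Finalize();
        return 0;
    }

    /* Every rank needs room for one error code per spawned process. */
    errcodes = malloc((size_t)nspawn * sizeof *errcodes);
    if (errcodes == NULL) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* "worker" is a hypothetical executable found via PATH. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, nspawn, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &children, errcodes);

    for (i = 0; i < nspawn; i++) {
        if (errcodes[i] != MPI_SUCCESS) {
            fprintf(stderr, "process %d failed to start\n", i);
        }
    }

    free(errcodes);
    MPI_Finalize();
    return 0;
}
```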

       Locating an Executable Program

       The  executable  program  file must be located on the node(s) where the
       process(es) will run.  On any node, the directories  specified  by  the
       user's PATH environment variable are searched to find the program.

       All  MPI  runtime  options  selected  by  mpirun  (1)  in  the  initial
       application launch remain in effect for all child processes created  by
       the spawn functions.

       Command-line Arguments

       The  argv  parameter  to  MPI_Comm_spawn should not contain the program
       name since it is given in the first parameter.  The command  line  that
       is  passed  to  the  newly  launched  program  will be the program name
       followed by the strings in argv .
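
       The relationship between command, argv, and the command line seen
       by the child can be illustrated with a small helper in plain C.
       The helper is hypothetical and not part of MPI or LAM; it only
       mimics the rule stated above.  A NULL array stands in for the case
       where no arguments are passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Compose "command arg0 arg1 ..." into out.  spawn_argv is a
   NULL-terminated array that does NOT include the program name.
   Returns the length needed so far; a value >= outsize means the
   result was truncated. */
size_t build_command_line(const char *command, char *const spawn_argv[],
                          char *out, size_t outsize)
{
    size_t used;
    int i;

    assert(command != NULL && out != NULL && outsize > 0);

    /* The program name always comes first. */
    used = (size_t)snprintf(out, outsize, "%s", command);

    /* Append each argument, stopping at the NULL terminator or when
       the buffer is full. */
    for (i = 0; spawn_argv != NULL && spawn_argv[i] != NULL
                && used < outsize; i++) {
        used += (size_t)snprintf(out + used, outsize - used, " %s",
                                 spawn_argv[i]);
    }
    return used;
}
```

       For example, a command of "worker" with the argument array
       { "-n", "4", NULL } yields the command line "worker -n 4"; putting
       "worker" into the array as well would pass it to the child twice.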

USAGE WITH IMPI EXTENSIONS

       The IMPI standard only supports MPI-1 functions.  Hence, this  function
       is currently not designed to operate within an IMPI job.

ERRORS

       If an error occurs in an MPI function, the current MPI error handler is
       called to handle it.  By default, this error handler aborts the MPI
       job.  The error handler may be changed with MPI_Errhandler_set; the
       predefined error handler MPI_ERRORS_RETURN may be used to cause error
       values to be returned (in C and Fortran; this error handler is less
       useful with the C++ MPI bindings.  The predefined error handler
       MPI::ERRORS_THROW_EXCEPTIONS should be used in C++ if the error value
       needs to be recovered).  Note that MPI does not guarantee that an MPI
       program can continue past an error.

       All  MPI  routines  (except  MPI_Wtime  and MPI_Wtick ) return an error
       value; C routines as the value of the function and Fortran routines  in
       the  last  argument.   The  C++  bindings  for  MPI do not return error
       values; instead, error values are communicated by  throwing  exceptions
       of  type  MPI::Exception  (but  not  by  default).  Exceptions are only
       thrown if the error value is not MPI::SUCCESS .

       Note that if the MPI::ERRORS_RETURN handler is set in  C++,  while  MPI
       functions  will  return  upon an error, there will be no way to recover
       what the actual error value was.
       MPI_SUCCESS
              - No error; MPI routine completed successfully.
       MPI_ERR_COMM
              - Invalid communicator.   A  common  error  is  to  use  a  null
              communicator in a call (not even allowed in MPI_Comm_rank ).
       MPI_ERR_SPAWN
              -  Spawn error; one or more of the applications attempting to be
              launched failed.  Check the returned error code array.
       MPI_ERR_ARG
              - Invalid  argument.   Some  argument  is  invalid  and  is  not
              identified  by a specific error class.  This is typically a NULL
              pointer or other such error.
       MPI_ERR_ROOT
              - Invalid root.  The root must be specified as  a  rank  in  the
              communicator.   Ranks  must  be between zero and the size of the
              communicator minus one.
       MPI_ERR_OTHER
              - Other error; use  MPI_Error_string  to  get  more  information
              about this error code.
       MPI_ERR_INTERN
              -  An  internal error has been detected.  This is fatal.  Please
              send a bug report to the LAM mailing list (see
              http://www.lam-mpi.org/contact.php).
       MPI_ERR_NO_MEM
              -  This  error  class  is  associated  with  an  error code that
              indicates that free space is exhausted.

MORE INFORMATION

       For more information, please see the official MPI Forum web site, which
       contains the text  of  both  the  MPI-1  and  MPI-2  standards.   These
       documents contain detailed information about each MPI function (most of
       which is not duplicated in these man pages).

       http://www.mpi-forum.org/

ACKNOWLEDGEMENTS

       The LAM Team would like to thank the MPICH Team for the handy program
       to generate man pages ("doctext" from
       ftp://ftp.mcs.anl.gov/pub/sowing/sowing.tar.gz), the initial
       formatting, and some initial text for most of the MPI-1 man pages.

LOCATION

       spawn.c
