
NAME

       sge_pe - Sun Grid Engine parallel environment configuration file format

DESCRIPTION

       Parallel environments are parallel programming and runtime environments
       allowing  for  the  execution  of  shared  memory or distributed memory
       parallelized applications. Parallel environments usually  require  some
       kind  of setup to be operational before starting parallel applications.
       Examples of common parallel environments are shared memory parallel
       operating systems and the distributed memory environments Parallel
       Virtual Machine (PVM) and Message Passing Interface (MPI).

       sge_pe allows for the definition of interfaces to arbitrary parallel
       environments. Once a parallel environment is defined or modified with
       the -ap or -mp options to qconf(1) and linked with one or more queues
       via pe_list in queue_conf(5), the environment can be requested for a
       job via the -pe switch to qsub(1), together with a range for the
       number of parallel processes to be allocated to the job. Additional
       -l options may be used to specify the job's requirements in further
       detail.

       Note that Sun Grid Engine allows backslashes (\) to be used to escape
       newline (\newline) characters. The backslash and the newline are
       replaced with a space (" ") character before any interpretation.
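       The escaping rule can be sketched in Python as a plain string
       substitution that runs before any other parsing. This is an
       illustration only, not Sun Grid Engine code; the function name and
       the script path in the sample line are hypothetical.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


if __name__ == "__main__":
    # Hypothetical sge_pe line split over two physical lines.
    raw = "start_proc_args /path/to/startpe.sh \\\n$pe_hostfile\n"
    print(join_continuations(raw), end="")
```

       A newline that is not preceded by a backslash is left alone, so
       separate parameters stay on separate lines.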

FORMAT

       The format of a sge_pe file is defined as follows:

   pe_name
       The name of the parallel environment, as defined for pe_name in
       sge_types(1). This name is used with the qsub(1) -pe switch.

   slots
       The total number of parallel processes allowed to run concurrently
       under the parallel environment. The type is number; valid values are
       0 to 9999999.

   user_lists
       A comma-separated list of user access list names (see access_list(5)).
       Each user contained in at least one of the listed access lists has
       access to the parallel environment. If the user_lists parameter is set
       to NONE (the default), any user who is not explicitly excluded via the
       xuser_lists parameter described below has access. If a user is
       contained both in an access list listed in xuser_lists and in one
       listed in user_lists, the user is denied access to the parallel
       environment.

   xuser_lists
       The xuser_lists parameter contains a comma-separated list of so-called
       user access lists as described in access_list(5). Each user contained
       in at least one of the listed access lists is not allowed to access
       the parallel environment. If the xuser_lists parameter is set to NONE
       (the default), any user has access. If a user is contained both in an
       access list listed in xuser_lists and in one listed in user_lists, the
       user is denied access to the parallel environment.
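       The interplay of user_lists and xuser_lists can be modelled as a small
       decision function. This is a hypothetical Python sketch, not Sun Grid
       Engine code: NONE is represented as None, and acl_members is an
       assumed in-memory mapping from access list name to the set of users
       it contains.

```python
from typing import Dict, List, Optional, Set


def has_pe_access(user: str,
                  user_lists: Optional[List[str]],
                  xuser_lists: Optional[List[str]],
                  acl_members: Dict[str, Set[str]]) -> bool:
    """Return True if `user` may use the parallel environment."""

    def in_any(lists: List[str]) -> bool:
        return any(user in acl_members.get(name, set()) for name in lists)

    # Exclusion wins: membership in any xuser_lists entry denies access,
    # even if the user also appears in a user_lists entry.
    if xuser_lists is not None and in_any(xuser_lists):
        return False
    # user_lists set to NONE admits everyone who was not excluded above.
    if user_lists is None:
        return True
    return in_any(user_lists)
```

       Checking the exclusion first encodes the rule that a user listed on
       both sides is denied.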

   start_proc_args
       The invocation command line of a start-up procedure for the parallel
       environment. The start-up procedure is invoked by sge_shepherd(8) prior
       to executing the job script. Its purpose is to set up the parallel
       environment according to its needs. An optional prefix "user@"
       specifies the user under which this procedure is to be started. The
       standard output of the start-up procedure is redirected to the file
       REQNAME.poJID in the job's working directory (see qsub(1)), with
       REQNAME being the name of the job as displayed by qstat(1) and JID
       being the job's identification number. Likewise, the standard error
       output is redirected to REQNAME.peJID.

       The following special variables, which are expanded at runtime, can be
       used (besides any other strings which have to be interpreted by the
       start and stop procedures) to constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of the
              layout of the parallel environment to be set up by the start-up
              procedure. Each line of the file refers to a host on which
              parallel processes are to be run. The first entry of each line
              denotes the hostname, the second entry the number of parallel
              processes to be run on the host, the third entry the name of the
              queue, and the fourth entry a processor range to be used in case
              of a multiprocessor machine.

       $host  The name of the host on which the start-up  or  stop  procedures
              are started.

       $job_owner
              The user name of the job owner.

       $job_id
              Sun Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The  processors  string  as contained in the queue configuration
              (see queue_conf(5)) of the master queue (the queue in which  the
              start-up and stop procedures are started).

       $queue The cluster queue of the master queue instance.
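       A start-up procedure typically reads $pe_hostfile to learn where its
       processes go. The following Python sketch parses the four
       whitespace-separated fields described above; the class and function
       names are hypothetical, and the sample contents (including the
       "UNDEFINED" processor field) are made up for illustration.

```python
from typing import List, NamedTuple


class PeHost(NamedTuple):
    hostname: str
    slots: int
    queue: str
    processors: str


def parse_pe_hostfile(text: str) -> List[PeHost]:
    """Parse $pe_hostfile contents into one PeHost per non-blank line."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # Fields: hostname, slot count, queue name, processor range.
        processors = fields[3] if len(fields) > 3 else ""
        hosts.append(PeHost(fields[0], int(fields[1]), fields[2], processors))
    return hosts


if __name__ == "__main__":
    # Made-up sample contents.
    sample = "node01 4 all.q@node01 UNDEFINED\nnode02 2 all.q@node02 UNDEFINED\n"
    hosts = parse_pe_hostfile(sample)
    print(sum(h.slots for h in hosts))  # total slots across all hosts
```

       Summing the second column gives the total number of slots, which
       should match $pe_slots.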

   stop_proc_args
       The invocation command line of a shutdown procedure for the parallel
       environment. The shutdown procedure is invoked by sge_shepherd(8) after
       the job script has finished. Its purpose is to stop the parallel
       environment and to remove it from all participating systems. An
       optional prefix "user@" specifies the user under which this procedure
       is to be started. The standard output of the stop procedure is also
       redirected to the file REQNAME.poJID in the job's working directory
       (see qsub(1)), with REQNAME being the name of the job as displayed by
       qstat(1) and JID being the job's identification number. Likewise, the
       standard error output is redirected to REQNAME.peJID.

       The same special variables as for start_proc_args can be used to
       constitute a command line.
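       The variable expansion and output file naming shared by
       start_proc_args and stop_proc_args can be modelled in Python as
       below. This is a sketch of the documented behavior, not Sun Grid
       Engine's implementation; the function names, the script path and the
       sample values are hypothetical. Longer variable names are matched
       first so that $pe_hostfile is not consumed as $pe.

```python
import re
from typing import Mapping, Tuple

# Alternation is tried left to right, so longer names must come first.
_VAR_RE = re.compile(
    r"\$(pe_hostfile|pe_slots|job_owner|job_name|job_id"
    r"|processors|queue|host|pe)\b"
)


def expand_proc_args(template: str, values: Mapping[str, object]) -> str:
    """Replace the special variables; all other text is kept as-is.

    Raises KeyError if a variable used in the template has no value.
    """
    return _VAR_RE.sub(lambda m: str(values[m.group(1)]), template)


def pe_output_files(job_name: str, job_id: int) -> Tuple[str, str]:
    """Names of the stdout and stderr files: REQNAME.poJID, REQNAME.peJID."""
    return f"{job_name}.po{job_id}", f"{job_name}.pe{job_id}"


if __name__ == "__main__":
    cmd = expand_proc_args("/path/to/stoppe.sh $pe_hostfile $pe_slots",
                           {"pe_hostfile": "/tmp/hostfile", "pe_slots": 8})
    print(cmd)
    print(pe_output_files("myjob", 42))
```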

   allocation_rule
       The allocation rule is interpreted by the scheduler thread and helps
       the scheduler to decide how to distribute parallel processes among the
       available machines. If, for instance, a parallel environment is built
       for shared memory applications only, all parallel processes have to be
       assigned to a single machine, no matter how many suitable machines are
       available. If, however, the parallel environment follows the
       distributed memory paradigm, an even distribution of processes among
       machines may be favorable.

       The current version of the scheduler only understands the following
       allocation rules:

       <int>:    An integer number fixing the number of processes per host. If
                 the number is 1, all processes have to reside on different
                 hosts. If the special value $pe_slots is used, the full range
                 of processes as specified with the qsub(1) -pe switch has to
                 be allocated on a single host (no matter which value
                 belonging to the range is finally chosen for the job to be
                 allocated).

       $fill_up: Starting from the most suitable host/queue, all available
                 slots are allocated. Further hosts and queues are "filled up"
                 as long as a job still requires slots for parallel tasks.

       $round_robin:
                 From  all suitable hosts a single slot is allocated until all
                 tasks requested by the parallel job are dispatched.  If  more
                 tasks are requested than suitable hosts are found, allocation
                 starts again from the  first  host.   The  allocation  scheme
                 walks  through suitable hosts in a best-suitable-first order.
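       The allocation rules above can be illustrated with a simplified
       Python model. It is not the Sun Grid Engine scheduler: hosts are
       given as (hostname, free slots) pairs already sorted best-suitable
       first, queues and other resources are ignored, and for an integer
       rule it is assumed that the requested slot count must be a multiple
       of that integer. The function name allocate is hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Hosts = List[Tuple[str, int]]  # (hostname, free slots), best-suitable first


def allocate(hosts: Hosts, n: int,
             rule: Union[int, str]) -> Optional[Dict[str, int]]:
    """Return {hostname: slots} for n slots, or None if impossible."""
    if rule == "$pe_slots":
        # All n slots must come from a single host.
        for name, free in hosts:
            if free >= n:
                return {name: n}
        return None
    if isinstance(rule, int):
        # Exactly `rule` slots on every host used (assumes n % rule == 0).
        if rule <= 0 or n % rule:
            return None
        needed = n // rule
        chosen = [name for name, free in hosts if free >= rule][:needed]
        return {name: rule for name in chosen} if len(chosen) == needed else None
    alloc: Dict[str, int] = {}
    if rule == "$fill_up":
        # Take every free slot on a host before moving to the next one.
        remaining = n
        for name, free in hosts:
            take = min(free, remaining)
            if take:
                alloc[name] = take
                remaining -= take
        return alloc if remaining == 0 else None
    if rule == "$round_robin":
        # One slot per host per pass, wrapping around until done.
        free_left = dict(hosts)
        remaining = n
        while remaining:
            progressed = False
            for name, _ in hosts:
                if remaining and free_left[name] > 0:
                    alloc[name] = alloc.get(name, 0) + 1
                    free_left[name] -= 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                return None  # not enough free slots in total
        return alloc
    raise ValueError(f"unknown allocation rule: {rule!r}")


if __name__ == "__main__":
    hosts = [("a", 4), ("b", 4), ("c", 2)]
    for rule in ("$fill_up", "$round_robin", 2):
        print(rule, allocate(hosts, 4, rule))
```

       With the same three hosts, $fill_up concentrates slots on the first
       host while $round_robin spreads them, which is the trade-off between
       shared memory and distributed memory layouts described above.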

   control_slaves
       This parameter can be set to TRUE or FALSE (the default). It  indicates
       whether Sun Grid Engine is the creator of the slave tasks of a parallel
       application via sge_execd(8) and  sge_shepherd(8)  and  thus  has  full
       control  over  all  processes  in a parallel application, which enables
       capabilities  such  as  resource  limitation  and  correct  accounting.
       However,   to   gain  control  over  the  slave  tasks  of  a  parallel
       application, a sophisticated PE  interface  is  required,  which  works
       closely  together  with  Sun Grid Engine facilities. Such PE interfaces
       are available through your local Sun Grid Engine support office.

       Please set the control_slaves parameter to FALSE for all other PE
       interfaces.

   job_is_first_task
       The job_is_first_task parameter can be set to TRUE or FALSE. A value of
       TRUE indicates that the Sun Grid Engine job script already contains one
       of  the tasks of the parallel application (the number of slots reserved
       for the job is the number of slots  requested  with  the  -pe  switch),
       while  a  value  of  FALSE indicates that the job script (and its child
       processes) is not part of the parallel program  (the  number  of  slots
       reserved  for  the  job  is  the number of slots requested with the -pe
       switch + 1).

       If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE
       and/or SHARETREE_RESERVED_USAGE set to TRUE) and control_slaves is set
       to FALSE, the job_is_first_task parameter influences the accounting for
       the job: a value of TRUE means that accounting for CPU and requested
       memory is multiplied by the number of slots requested with the -pe
       switch; if job_is_first_task is set to FALSE, the accounting
       information is multiplied by the number of slots + 1.
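       The slot arithmetic of job_is_first_task, and its effect on reserved
       usage accounting when control_slaves is FALSE, reduces to a few lines
       of Python. The function names are hypothetical and usage is treated
       as a single number for simplicity.

```python
def reserved_slots(requested: int, job_is_first_task: bool) -> int:
    """Slots reserved for the job, given the slots requested with -pe."""
    # FALSE means the job script itself is an extra, non-parallel task.
    return requested if job_is_first_task else requested + 1


def reserved_usage(usage: float, requested: int,
                   job_is_first_task: bool) -> float:
    """Wallclock-based usage with control_slaves FALSE: the measured
    value is multiplied by the reserved slot count."""
    return usage * reserved_slots(requested, job_is_first_task)


if __name__ == "__main__":
    print(reserved_slots(8, True), reserved_slots(8, False))
    print(reserved_usage(10.0, 4, False))
```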

   urgency_slots
       For pending jobs with a slot range PE request, the number of slots is
       not yet determined. This setting specifies the method to be used by
       Sun Grid Engine to assess the number of slots such jobs might finally
       get.

       The assumed slot allocation is used when determining the
       resource-request-based priority contribution for numeric resources as
       described in sge_priority(5), and it is displayed when qstat(1) is run
       without the -g t option.

       The following methods are supported:

       <int>:    The specified integer number is directly used as  prospective
                 slot amount.

       min:      The slot range minimum is used as prospective slot amount. If
                 no lower bound is specified with the range, 1 is assumed.

       max:      The slot range maximum is used as prospective slot amount.
                 If no upper bound is specified with the range, the absolute
                 maximum possible due to the PE's slots setting is assumed.

       avg:      The  average  of  all  numbers  occurring within the job's PE
                 range request is assumed.
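       The four urgency_slots methods can be expressed as one Python
       function. This sketch assumes a single contiguous range low-high with
       None for an omitted bound; the function name is hypothetical, and the
       avg result is returned unrounded because the rounding Sun Grid Engine
       applies, if any, is not stated here.

```python
from typing import Optional, Union


def urgency_slots(method: Union[int, str], low: Optional[int],
                  high: Optional[int], pe_slots: int) -> float:
    """Prospective slot count for a pending job requesting low-high.

    pe_slots is the PE's `slots` setting, used for an open upper bound.
    """
    lo = 1 if low is None else low
    hi = pe_slots if high is None else high
    if isinstance(method, int):
        return method  # a fixed number is used directly
    if method == "min":
        return lo
    if method == "max":
        return hi
    if method == "avg":
        # The mean of the consecutive integers lo..hi is their midpoint.
        return (lo + hi) / 2
    raise ValueError(f"unknown urgency_slots method: {method!r}")


if __name__ == "__main__":
    for m in ("min", "max", "avg", 16):
        print(m, urgency_slots(m, 2, 8, pe_slots=64))
```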

   accounting_summary
       This parameter is only checked if control_slaves (see above) is set  to
       TRUE  and  thus  Sun Grid Engine is the creator of the slave tasks of a
       parallel application via sge_execd(8)  and  sge_shepherd(8).   In  this
       case,  accounting  information is available for every single slave task
       started by Sun Grid Engine.

       The accounting_summary parameter can be set to TRUE or FALSE.  A  value
       of  TRUE  indicates  that only a single accounting record is written to
       the accounting(5) file, containing the accounting summary of the  whole
       job  including  all  slave  tasks,  while a value of FALSE indicates an
       individual accounting(5) record is written for  every  slave  task,  as
       well as for the master task.

       Note: When running tightly integrated jobs with
       SHARETREE_RESERVED_USAGE set and with accounting_summary enabled in
       the parallel environment, reserved usage will only be reported by the
       master task of the parallel job. No per-parallel-task usage records
       will be sent from execd to qmaster, which can significantly reduce the
       load on qmaster when running large tightly integrated parallel jobs.
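       Pulling the parameters together, a PE configuration can be
       sanity-checked before it is loaded with qconf(1) -ap or -mp. The
       Python sketch below assumes each parameter sits on its own line as a
       name followed by whitespace and its value; it applies the
       backslash-newline rule, the slots range, the TRUE/FALSE parameters
       (compared case-insensitively, an assumption) and the allocation rule
       values listed above. The function name and sample values are
       hypothetical; qconf itself remains the authoritative validator.

```python
from typing import Dict, Union

BOOL_KEYS = ("control_slaves", "job_is_first_task", "accounting_summary")
NAMED_RULES = ("$pe_slots", "$fill_up", "$round_robin")


def parse_pe(text: str) -> Dict[str, Union[str, int, bool]]:
    """Parse and sanity-check a PE configuration in 'name value' form."""
    text = text.replace("\\\n", " ")  # backslash-newline becomes a space
    conf: Dict[str, Union[str, int, bool]] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""

    if "slots" in conf:
        slots = int(conf["slots"])
        if not 0 <= slots <= 9_999_999:
            raise ValueError(f"slots out of range: {slots}")
        conf["slots"] = slots
    for key in BOOL_KEYS:
        if key in conf:
            flag = str(conf[key]).upper()
            if flag not in ("TRUE", "FALSE"):
                raise ValueError(f"{key} must be TRUE or FALSE")
            conf[key] = flag == "TRUE"
    rule = conf.get("allocation_rule")
    if rule is not None and rule not in NAMED_RULES and not str(rule).isdigit():
        raise ValueError(f"unknown allocation_rule: {rule}")
    return conf


SAMPLE = """\
pe_name            mpi
slots              64
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
"""

if __name__ == "__main__":
    print(parse_pe(SAMPLE))
```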

RESTRICTIONS

       Note that the functionality of the start-up, shutdown and signaling
       procedures remains the full responsibility of the administrator
       configuring the parallel environment. Sun Grid Engine will just invoke
       these procedures and evaluate their exit status. If the procedures do
       not perform their tasks properly, or if the parallel environment or the
       parallel application behaves unexpectedly, Sun Grid Engine has no means
       to detect this.

SEE ALSO

       sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qsub(1),
       access_list(5), sge_qmaster(8), sge_shepherd(8).

COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.