
Name

       condor_glidein - add a remote grid resource to a local Condor pool

Synopsis

       condor_glidein [ -help ]

       condor_glidein  [  -admin  address  ]  [  -anybody ] [ -archdir dir ] [
       -basedir basedir ] [ -count CPU count ] [ <Execute Task  Options>  ]  [
       <Generate  File  Options>  ] [ -gsi_daemon_name cert_name ] [ -idletime
       minutes ] [ -install_gsi_trusted_ca_dir path ]  [  -install_gsi_gridmap
       file  ] [ -localdir dir ] [ -memory MBytes ] [ -project name ] [ -queue
       name ] [ -runtime minutes ] [ -runonly ] [ <Set Up Task  Options>  ]  [
       -suffix suffix ] [ -slots slot count ] <contact argument>

Description

       condor_glidein  allows  the  temporary addition of a grid resource to a
       local Condor pool. The  addition  is  accomplished  by  installing  and
       executing  some of the Condor daemons on the remote grid resource, such
       that it reports in as part of the local  Condor  pool.   condor_glidein
       accomplishes  two separate tasks: set up and execution. These separated
       tasks allow flexibility, in that the user may use condor_glidein to  do
       only one of the tasks or both, in addition to customizing the tasks.

       The set up task generates a script that may be used to start the Condor
       daemons during the execution task, places this  script  on  the  remote
       grid resource, composes and installs a configuration file, and installs
       the condor_master, condor_startd, and condor_starter daemons on the
       grid resource.

       The  execution  task  runs the script generated by the set up task. The
       goal of the script is to invoke the condor_master  daemon.  The  Condor
       job glidein_startup appears in the queue of the local Condor pool for
       each invocation of condor_glidein. To remove the grid resource from the
       local Condor pool, use condor_rm to remove the glidein_startup job.
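
       For example, assuming condor_q lists the glidein_startup job as cluster
       1234 (an illustrative id only), the glidein can be removed with:

        % condor_q
        % condor_rm 1234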

       The Condor jobs to do both the set up and execute tasks utilize Condor-
       G  and  Globus  protocols  (gt2  or gt4) to communicate with the remote
       resource. Therefore, an X.509 certificate (proxy) is required for the
       user running condor_glidein.

       Specify the remote grid machine with the command line argument <contact
       argument>. <contact argument> takes one of four forms (illustrative
       examples follow the list below):

          1.  hostname

          2.  Globus contact string

          3.  hostname/jobmanager-<schedulername>

           4.  -contactfile filename

           The argument -contactfile filename specifies the full path and file
           name of a file that contains Globus contact strings. Each of the
           resources given by a Globus contact string is added to the local
           Condor pool.
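
       For illustration only, each of the four forms might look like the
       following (the host name, port number, and file path are examples, not
       defaults):

           gatekeeper.site.edu
           gatekeeper.site.edu:2119/jobmanager-pbs
           gatekeeper.site.edu/jobmanager-pbs
           -contactfile /home/user/contact_strings.txt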

       The  set  up task of condor_glidein copies the binaries for the correct
       platform from a central server. To obtain access to the server,  or  to
       set up your own server, follow instructions on the Glidein Server Setup
       page, at http://www.cs.wisc.edu/condor/glidein. Set  up  need  only  be
       done once per site, as the installation is never removed.

       By default, all files installed on the remote grid resource are placed
       in the directory $(HOME)/Condor_glidein. $(HOME) is evaluated and
       defined on the remote machine using a grid map. This directory must be
       in a shared file system accessible by all machines that will run the
       Condor daemons. By default, the daemons' log files will also be written
       in this directory. Change this directory with the -localdir option to
       make Condor daemons write to local scratch space on the execution
       machine. For debugging initial problems, it may be convenient to have
       the log files in the more accessible default directory. If using the
       default directory, occasionally clean up old log and execute
       directories to avoid running out of space.

Examples

       To  have 10 grid resources running PBS at a grid site with a gatekeeper
       named gatekeeper.site.edu join the local Condor pool:

       % condor_glidein -count 10 gatekeeper.site.edu/jobmanager-pbs

       If you try something like the above and condor_glidein is not  able  to
       automatically  determine  everything  it needs to know about the remote
       site, it will ask you to provide more information. A typical result  of
       this process is something like the following command:

        % condor_glidein \
            -count 10 \
            -arch 6.6.7-i686-pc-Linux-2.4 \
            -setup_jobmanager jobmanager-fork \
            gatekeeper.site.edu/jobmanager-pbs

       The Condor jobs that do the set up and execute tasks will appear in the
        queue for the local Condor pool. After a successful glidein, use
        condor_status to see that the remote grid resources have become part
        of the local Condor pool.

       A list of common problems and solutions is  presented  in  this  manual
       page.

Generate File Options

       -genconfig

           Create a local copy of the configuration file that may be used on
           the remote resource. The file is named
           glidein_condor_config.<suffix>. The string defined by <suffix>
           defaults to the process id (PID) of the condor_glidein process or
           is defined with the -suffix command line option. The configuration
           file may be edited for later use with the -useconfig option.

       -genstartup

           Create a local copy of the script used on the remote resource to
           invoke the condor_master. The file is named
           glidein_startup.<suffix>. The string defined by <suffix> defaults
           to the process id (PID) of the condor_glidein process or is defined
           with the -suffix command line option. The file may be edited for
           later use with the -usestartup option.

       -gensubmit

           Generate submit description files, but do not submit. The submit
           description file for the set up task is named
           glidein_setup.submit.<suffix>. The submit description file for the
           execute task is named glidein_run.submit.<suffix>. The string
           defined by <suffix> defaults to the process id (PID) of the
           condor_glidein process or is defined with the -suffix command line
           option.
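
        As an illustration of these options, the following sketch generates
        the submit description files with a fixed suffix, leaves them for
        inspection or editing, and then submits them with condor_submit (the
        suffix and gatekeeper name are examples only):

        % condor_glidein -gensubmit -suffix test \
              gatekeeper.site.edu/jobmanager-pbs
        % condor_submit glidein_setup.submit.test
        % condor_submit glidein_run.submit.test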

Set Up Task Options

       -setuponly

           Do only the set up task of condor_glidein. This option cannot be
           used simultaneously with -runonly.

       -setup_here

          Do the set up task on the local machine, instead of at a remote grid
          resource. This may be used, for example, to do the set  up  task  of
          condor_glidein in an AFS area that is read-only from the remote grid
          resource.

       -forcesetup

          During the set up task, force the copying of  files,  even  if  this
          overwrites  existing  files.  Use  this  to  push out changes to the
          configuration.

       -useconfig config_file

          The set up task copies the specified configuration file, rather than
          generating one.

       -usestartup startup_file

          The  set  up  task  copies the specified startup script, rather than
          generating one.

       -setup_jobmanager jobmanagername

          Identifies the jobmanager on the remote grid resource to receive the
          files  during  the  set  up  task.  If  a  reasonable default can be
          discovered through MDS,  this  is  optional.   jobmanagername  is  a
          string  representing  any  gt2 name for the job manager. The correct
           string in most cases will be jobmanager-fork. Other common strings
           may be jobmanager, jobmanager-condor, jobmanager-pbs, and
           jobmanager-lsf.
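
        As an example of these options, the following sketch pushes an edited
        configuration file out to a site that has already been set up (the
        configuration file name and gatekeeper name are examples only):

        % condor_glidein -setuponly -forcesetup \
              -useconfig glidein_condor_config.mysite \
              -setup_jobmanager jobmanager-fork \
              gatekeeper.site.edu/jobmanager-pbs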

Execute Task Options

       -runonly

          Starts execution of the Condor daemons on the grid resource. If  any
          of  the  necessary  files or executables are missing, condor_glidein
           exits with an error code. This option cannot be used simultaneously
           with -setuponly.

       -run_here

          Runs condor_master directly rather than submitting a Condor job that
           causes the remote execution. To instead generate a script that does
           this, use -run_here in combination with -gensubmit. This may be
          useful for running Condor daemons on resources that are not directly
          accessible by Condor.
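
        As an example of these options, assuming the set up task has already
        been completed at the site, the following sketch starts daemons on 4
        CPUs (the gatekeeper name is an example only):

        % condor_glidein -runonly -count 4 gatekeeper.site.edu/jobmanager-pbs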

Options

       -help

          Display brief usage information and exit.

       -basedir basedir

           Specifies the base directory on the remote grid resource used for
           placing files. The default directory is $(HOME)/Condor_glidein on
           the grid resource.

       -archdir dir

           Specifies the directory on the remote grid resource for placement
           of the Condor executables. The default value for -archdir is based
           upon version information on the grid resource. It is of the form
           <basedir>/<condor-version>-<Globus canonical system name>. An
           example of the directory (without the base directory) for Condor
           version 6.1.13 running on a Sun Sparc machine with Solaris 2.6 is
           6.1.13-sparc-sun-solaris-2.6.

       -localdir dir

           Specifies the directory on the remote grid resource in which to
           create log and execution subdirectories needed by Condor. If
           limited disk quota in the home or base directory on the grid
           resource is a problem, set -localdir to a large temporary space,
           such as /tmp or /scratch. If the batch system requires invocation
           of Condor daemons in a temporary scratch directory, '.' may be used
           for the definition of the -localdir option.
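
           For example, the following sketch places the log and execute
           directories in /scratch (the gatekeeper name is an example only):

           % condor_glidein -count 10 -localdir /scratch \
                 gatekeeper.site.edu/jobmanager-pbs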

       -arch architecture

          Identifies  the  platform  of  the  required  tarball containing the
          correct Condor daemon executables to  download  and  install.  If  a
          reasonable  default can be discovered through MDS, this is optional.
          A    list    of    possible    values    may     be     found     at
          http://www.cs.wisc.edu/condor/glidein/binaries.   The   architecture
          name is the same as the tarball name without the suffix  tar.gz.  An
           example is 6.6.5-i686-pc-Linux-2.4.

       -queue name

          The  argument name is a string used at the grid resource to identify
          a job queue.

       -project name

          The argument name is a string used at the grid resource to  identify
          a project name.

       -memory MBytes

          The  maximum  memory  size  in  Megabytes  to  request from the grid
          resource.

       -count CPU count

          The number of CPUs requested to join the local pool. The default  is
          1.

       -slots slot count

           For machines with multiple CPUs, the CPUs may be divided up into
           slots. slot count is the number of slots that results. By default,
          Condor  divides multiple-CPU resources such that each CPU is a slot,
          each with an equal share of RAM, disk, and swap space.  This  option
          configures  the number of slots, so that multi-threaded jobs can run
          in a slot with multiple CPUs. For example, if 4 CPUs  are  requested
          and  -slots is not specified, Condor will divide the request up into
          4 slots with 1 CPU each. However, if -slots 2 is  specified,  Condor
          will  divide  the  request  up into 2 slots with 2 CPUs each, and if
          -slots 1 is specified, Condor will put all 4 CPUs into one slot.
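
           For example, the following sketch requests 4 CPUs arranged as 2
           slots of 2 CPUs each (the gatekeeper name is an example only):

           % condor_glidein -count 4 -slots 2 \
                 gatekeeper.site.edu/jobmanager-pbs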

       -idletime minutes

           The amount of time that a remote grid resource will remain in an
           idle state before the daemons shut down. A value of 0 (zero) means
           that the daemons never shut down due to remaining in the idle
           state. In this case, the -runtime option defines when the daemons
           shut down. The default value is 20 minutes.

       -runtime minutes

          The maximum amount of time the Condor daemons  on  the  remote  grid
          resource  will  run  before shutting themselves down. This option is
          useful for  resources  with  enforced  maximum  run  times.  Setting
          -runtime  to  be a few minutes shorter than the enforced limit gives
          the daemons time to perform a graceful shut down.
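
           For example, assuming the batch system enforces a 12 hour (720
           minute) run time limit, the following sketch shuts the daemons down
           a few minutes early and disables the idle timeout (the gatekeeper
           name is an example only):

           % condor_glidein -runtime 710 -idletime 0 \
                 gatekeeper.site.edu/jobmanager-pbs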

       -anybody

           Sets the Condor START expression for the added remote grid resource
           to True. This permits any user's job that is able to run on the
           added remote grid resource to run there. Without this option, only
           jobs owned by the user executing condor_glidein can execute on the
           remote grid resource. WARNING: Using this option may violate the
           usage policies of many institutions.

       -admin address

           Where to send e-mail about problems. The default is the login of
           the user running condor_glidein at the UID domain of the local
           Condor pool.

       -suffix X

           The suffix to use when generating files. The default is the process
           id (PID) of the condor_glidein process.

       -gsi_daemon_name cert_name

          Includes and enables GSI authentication in the configuration for the
          remote grid resource. The argument is the GSI certificate name  that
          the daemons will use to authenticate themselves.

       -install_gsi_trusted_ca_dir path

          The  argument  identifies  the  directory  containing the trusted CA
          certificates that the daemons are to use  (for  example,  /etc/grid-
          security/certificates).  The  contents  of  this  directory  will be
          installed at  the  remote  site  in  the  directory  <basedir>/grid-
          security.

       -install_gsi_gridmap file

          The  argument  is  the  file name of the GSI-specific X.509 map file
          that the daemons will use. The file will be installed at the  remote
          site  in  <basedir>/grid-security. The file contains entries mapping
          certificates to user names. At the very least, it  must  contain  an
           entry for the certificate given by the command-line option
           -gsi_daemon_name. If other Condor daemons use different
           certificates, then this file will also list any certificates that
           the daemons will encounter for the condor_schedd,
           condor_collector, and condor_negotiator. See the Condor manual for
           more information.
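
           As an illustration of the GSI-related options, the following sketch
           enables GSI authentication and installs a trusted CA directory and
           map file (the certificate name, file paths, and gatekeeper name are
           examples only):

           % condor_glidein \
                 -gsi_daemon_name "/DC=org/DC=example/CN=condor-daemon" \
                 -install_gsi_trusted_ca_dir /etc/grid-security/certificates \
                 -install_gsi_gridmap /etc/grid-security/grid-mapfile \
                 gatekeeper.site.edu/jobmanager-pbs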

Exit Status

       condor_glidein  will exit with a status value of 0 (zero) upon complete
       success, or with non-zero values upon failure. The status value will be
       1  (one) if condor_glidein encountered an error making a directory, was
       unable to copy a tar file, encountered an error in parsing the  command
       line,  or was not able to gather required information. The status value
       will be 2 (two) if there was an error in the remote set up. The  status
       value will be 3 (three) if there was an error in remote submission. The
       status value will be -1 (negative one) if no resource was specified  in
       the command line.

       Common problems are listed below. Many of these are best discovered by
       looking in the StartLog log file on the remote grid resource.

       WARNING: The file xxx is not writable by condor

           This error occurs when condor_glidein is run in a directory that
           does not have the proper permissions for Condor to access files.
           Note that an AFS directory does not give Condor the user's AFS
           ACLs, so Condor may be unable to write there.

       Glideins fail to run due to GLIBC errors

          Check     the     list     of     available     glidein     binaries
          (http://www.cs.wisc.edu/condor/glidein/binaries), and try specifying
          the architecture name that includes the correct  glibc  version  for
          the remote grid site.

       Glideins join pool but no jobs run on them

          One  common  cause of this problem is that the remote grid resources
          are in a different file system domain, and the submitted Condor jobs
          have  an  implicit  requirement  that they must run in the same file
          system domain. See  section  for  details  on  using  Condor’s  file
          transfer  capabilities  to solve this problem. Another cause of this
          problem is a communication failure. For example, a firewall  may  be
          preventing  the  condor_negotiator or the condor_schedd daemons from
          connecting  to  the  condor_startd  on  the  remote  grid  resource.
          Although  work  is  being  done  to  remove  this requirement in the
          future,  it  is  currently  necessary  to  have  full  bidirectional
           connectivity, at least over a restricted range of ports. See the
           Condor manual for more information on configuring a port range.
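
           As a sketch only, a restricted port range may be set in the
           configuration file used by the glidein daemons; the port numbers
           below are arbitrary examples, and the firewall must allow this
           range in both directions:

        LOWPORT = 9600
        HIGHPORT = 9700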

       Glideins run but fail to join the pool

          This may be caused by the local pool’s security  settings  or  by  a
          communication failure. Check that the security settings in the local
          pool’s configuration file allow write  access  to  the  remote  grid
          resource.  To  not  modify the security settings for the pool, run a
          separate pool specifically for the remote grid  resources,  and  use
          flocking  to  balance jobs across the two pools of resources. If the
          log files indicate a communication failure, then see the next  item.

       The startd cannot connect to the collector

           This may be caused by several things. One is a firewall. Another is
           when the compute nodes do not even have outgoing network access.
           Configuration to work without full network access to and from the
           compute nodes is still in the experimental stages, so for now, the
           short answer is that you must at least have a range of open
           (bidirectional) ports and set up the configuration file
           accordingly, as described in the Condor manual. (Use the option
           -genconfig, edit the generated configuration file, and then do the
           glidein execute task with the option -useconfig.)

           Another possible cause of connectivity problems may be the use of
           UDP by the condor_startd to register itself with the
           condor_collector. Force it to use TCP, as described in the Condor
           manual.
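
           A minimal sketch of the relevant setting, placed in the
           configuration used by the glidein daemons (older versions of Condor
           may also require the condor_collector to be configured to accept
           TCP updates):

        UPDATE_COLLECTOR_WITH_TCP = True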

          Yet  another  possible  cause  of  connectivity problems is when the
          remote grid resources have more than one network interface, and  the
          default  one chosen by Condor is not the correct one. One way to fix
          this is to modify the glidein startup script using  the  -genstartup
          and  -usestartup  options.  The  script  needs  to  determine the IP
          address associated with the correct network  interface,  and  assign
          this to the environment variable _condor_NETWORK_INTERFACE.
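
           A rough sketch of such an addition, assuming the startup script is
           a Bourne shell script and that eth1 is the correct interface (both
           the interface name and the parsing of the ifconfig output are
           assumptions that vary between systems):

        # assumption: eth1 is the interface with working connectivity
        IFACE=eth1
        _condor_NETWORK_INTERFACE=`/sbin/ifconfig $IFACE | \
            sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'`
        export _condor_NETWORK_INTERFACE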

       NFS file locking problems

          If  the  -localdir  option  uses  files on NFS (not recommended, but
          sometimes convenient for  testing),  the  Condor  daemons  may  have
          trouble  manipulating  file  locks. Try inserting the following into
          the configuration file:

       IGNORE_NFS_LOCK_ERRORS = True

Author

       Condor Team, University of Wisconsin-Madison

Copyright

       Copyright (C) 1990-2009  Condor  Team,  Computer  Sciences  Department,
       University  of  Wisconsin-Madison,  Madison,  WI.  All Rights Reserved.
       Licensed under the Apache License, Version 2.0.

        See the Condor Version 7.2.4 Manual or
        http://www.condorproject.org/license for additional notices.
        condor-admin@cs.wisc.edu
