NAME
bhost - LAM boot schema (host file) format
SYNTAX
#
# comments
#
<machine> [cpu=<cpucount>] [user=<userid>]
<machine> [cpu=<cpucount>] [user=<userid>]
...
DESCRIPTION
A boot schema describes the machines that will combine to form a
multicomputer running LAM. It is used by recon(1) to verify initial
conditions for running LAM, by lamboot(1) to start LAM, and by
lamhalt(1) to terminate LAM (note that lamwipe(1) has been deprecated
in favor of lamhalt(1)).
The particular syntax of a LAM boot schema is sometimes called the
"host file" syntax. It is line oriented. Each line gives the name of
a machine (typically its fully qualified Internet domain name), an
optional number of CPUs available on that machine, and an optional
userid with which to access it.
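As an illustration of the line format described above, the following
is a minimal Python sketch of how one host file line might be split
into a machine name and its key=value options. It is not LAM's actual
parser: the function name parse_bhost_line is hypothetical, and the
handling of blank lines, trailing comments, and malformed fields is an
assumption made for this example.

```python
def parse_bhost_line(line):
    """Split one boot schema line into (machine, options).

    Hypothetical helper for illustration; not part of LAM.
    Returns None for blank lines and comment-only lines.
    """
    # Assumption: everything from '#' to the end of the line is a comment.
    text = line.split("#", 1)[0].strip()
    if not text:
        return None

    fields = text.split()
    machine = fields[0]

    # Remaining fields are expected to look like key=value (e.g. cpu=2).
    options = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % field)
        options[key] = value
    return machine, options


print(parse_bhost_line("beowulf1.cluster.example.com cpu=2"))
```

Option values are kept as strings here; a caller would convert cpu to
an integer where needed.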
Boot schemas shared across a particular site may be created by the
system administrator and placed in the installation directory under
etc/. Their file names typically start with the prefix bhost.
Individual users usually create their own boot schemas, especially if
the configurations are simple.
NAME RESOLUTION
Note that lamboot resolves all names listed in the boot schema on the
node on which lamboot was invoked. The lamboot(1) man page contains
information about address resolution, examples of how to handle
multiple network interface cards (NICs) in a node, etc.
EXAMPLE
Here is an example four-node boot schema:
#
# example LAM host file
#
server.cluster.example.com schedule=no
beowulf1.cluster.example.com cpu=2
beowulf2.cluster.example.com
beowulf2.cluster.example.com
somewhere.else.example.com user=guest
Note that the user=guest clause is significant, since the user has a
different login ID on somewhere.else.example.com. Also note that
beowulf1 has a CPU count of 2 listed (a CPU count of 1 is assumed if
none is given). This value is used by mpirun(1),
MPI_Comm_spawn(2), and MPI_Comm_spawn_multiple(2) for the "C" (or CPU)
notation that specifies how many ranks to start. This is particularly
useful for running on SMP machines.
Note the schedule=no clause. This means that LAM will boot a daemon on
that node but, by default, will not launch any MPI processes on it.
This is handy when you want to control your MPI applications from one
node (e.g., a server) but don't want to run any MPI processes on it.
In some environments this is the default (e.g.,
BProc). See the LAM User's Guide for more details.
beowulf2 is listed twice, but has no specific CPU count listed. In
this case, LAM will keep a running tally of the total number of CPUs
for that host. Hence, LAM will calculate that beowulf2 has two CPUs
available for use. Calculating the number of CPUs by counting
occurrences of a hostname is useful in a batch environment where a
hostfile may list the same hostname multiple times, indicating that the
batch scheduler has allocated multiple CPUs for a single job (e.g., PBS
operates this way).
For the schema above, the command "mpirun C foo" would start five
instances of the foo program: two on beowulf1, two on beowulf2, and
one on somewhere.else. No instance is started on server because of
its schedule=no clause.
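The arithmetic behind that count can be checked with a short Python
sketch. This is an illustration only, not LAM's implementation: the
function name schedulable_cpus is hypothetical, and the sketch assumes
that repeated lines for a host simply add their CPU counts and that a
schedule=no clause on any line for a host excludes that host entirely.

```python
from collections import OrderedDict

# The example boot schema from this page, embedded as a string.
EXAMPLE = """\
#
# example LAM host file
#
server.cluster.example.com schedule=no
beowulf1.cluster.example.com cpu=2
beowulf2.cluster.example.com
beowulf2.cluster.example.com
somewhere.else.example.com user=guest
"""


def schedulable_cpus(text):
    """Return {host: cpu_count} for hosts that accept MPI processes.

    Hypothetical helper for illustration; not part of LAM.
    """
    totals = OrderedDict()
    unschedulable = set()
    for raw in text.splitlines():
        # Assumption: '#' starts a comment that runs to end of line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        opts = dict(f.split("=", 1) for f in fields[1:])
        # A missing cpu= clause counts as one CPU; repeated lines add up.
        totals[host] = totals.get(host, 0) + int(opts.get("cpu", 1))
        if opts.get("schedule") == "no":
            unschedulable.add(host)
    # Drop hosts marked schedule=no; they run a daemon but no MPI ranks.
    return OrderedDict(
        (h, n) for h, n in totals.items() if h not in unschedulable
    )


counts = schedulable_cpus(EXAMPLE)
for host, n in counts.items():
    print(host, n)
print("total:", sum(counts.values()))  # total: 5
```

The per-host counts printed are 2 for beowulf1, 2 for beowulf2, and 1
for somewhere.else, matching the five instances described above.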
FILES
$LAMHOME/etc/bhost.def default boot schema file
SEE ALSO
LAM User's Guide, lamboot(1), lamhalt(1), mpirun(1), MPI_Comm_spawn(2),
MPI_Comm_spawn_multiple(2), recon(1), lamwipe(1)