NAME
XMPI - X Window MPI user interface
SYNTAX
xmpi [-h] [<boot_schema>]
DESCRIPTION
XMPI is a graphical user interface for running MPI programs, monitoring
MPI processes and messages, and viewing execution trace files. It
exploits the debugging capabilities of LAM, a parallel computing
environment for UNIX clusters. XMPI is constructed from the Motif
widget set.
XMPI does not provide an interface for starting a LAM session. This
must be accomplished prior to running XMPI, which is itself a LAM
program. The boot schema from which LAM was started should be
provided to XMPI so that it can be presented as an inventory of nodes
on which programs may be run. If XMPI is used only to view trace
files, starting LAM is not required.
This description assumes a basic knowledge of MPI.
TYPICAL USAGE
XMPI provides a graphical display of the state of the processes within
an MPI application. The state information is obtained from one of two
sources: a running application started by XMPI, or a file containing
trace data from a traced MPI application. When XMPI is started, its
top-level overview window is blank. Once an application is started or
a trace file is loaded the overview window fills with a tiled group of
hexagons, each representing the state of one MPI process and labeled by
the process rank within MPI_COMM_WORLD. A traffic light symbol
indicates whether the process is running or blocked. No traffic light
is shown for processes which have either finalized or not yet
initialized the MPI library.
When monitoring a running application, the camera "Snap" button or the
"Snapshot" item in the "Application" menu can be used at any time to
update the state of all processes. When viewing trace data, the state
information is updated according to the currently selected time point
(see "XMPI TRACE FILES").
A mouse click inside a hexagon pops up an additional window containing
more detailed information about the process. If the process is
blocked, the function name, peer process rank, communicator, message
tag and element count are displayed. If unreceived messages are
available, their quantity, source process rank, communicator, message
tag and element count are displayed. By leaving a few process windows
on the screen, a user can focus debugging on a small and manageable
collection of misbehaving processes.
The "Clean" button or "Clean" item in the "Application" menu terminates
the application so that the development cycle can be repeated. The previous
application can be rerun with the "Rerun" button or "Rerun" item in the
"Application" menu.
RUNNING AN APPLICATION
An application schema specifies an MPI application by listing each
process’s program name, program location, target processor(s) and
optional command line arguments.
The "Browse&Run" item in the "Application" menu pops up a simple file
browser for choosing and running a pre-written application schema.
Alternatively an application schema can be configured with the XMPI
application builder dialog, invoked by the "Build&Run" item in the
"Application" menu.
The builder dialog has an area to specify each process and an arrow
button to add it to the application schema, which is shown below the
arrow button in a scrolled list. The lines in the list show the syntax
that would be used in creating the same application with a text editor.
Indeed, the "Save" button saves the application schema in a file for
later use and/or editing.
A specified process does not become part of the application until the
arrow (commit) button is pressed. Once it appears in the application
scrolled list, a process can be deleted by selecting it and pressing
the <Delete> key.
Pressing the "Run" button with anything in the application list causes
that application to be run. The overview window is then initialized
with the status of the application.
Program Specification
A file browser in the middle of the builder dialog aids in selecting a
program file. The browser only navigates the file space of the node
running XMPI. If a program is located on another node outside the file
space (outside NFS, etc.) its pathname may need to be typed into the
process specification area. Selecting the "Use Full Pathname" toggle
button will cause programs to be placed into the application schema as
full pathnames.
XMPI limits the choice of a program source node to either the node
running XMPI or the process target node. The latter case is the
default and is the most efficient because LAM does not need to transfer
the program from source to target node. The "Transfer Program" toggle
button selects the source node policy.
Multiple Program Copies
The number of copies of a program to be run can be set in the process
specification area. Clicking on the increment or decrement arrow will
increment or decrement the count by one. Clicking with the shift key
down will increment or decrement by ten.
Command-line Arguments
Command-line arguments must be typed into the process specification
area.
Node Specification
A boot schema specifies the computers participating as nodes in a LAM
multicomputer. If XMPI is given a boot schema filename, its contents
will appear in a scrolled list on the right side of the builder dialog.
XMPI will search for the given schema in the local directory. The boot
schema filename is displayed above the list of its nodes. Multiple
target nodes can be selected from the scrolled list with the
corresponding node mnemonic appearing in the process specification
area. Selecting multiple target nodes specifies multiple processes
with the program name, arguments and source node policy held constant.
If no boot schema was specified only the special node selectors "LOCAL"
(meaning the node on which XMPI is running) and "ALL NODES" are
provided.
Target node descriptions may also be typed directly into the process
specification area. The local node is specified as h. The origin node
from which the machine was booted, if not local, can be specified as o.
All usable nodes are specified as N. Nodes are generically identified
as n<list>, where <list> can be a single node identifier or a list of
node identifiers. Identifiers can be written in decimal or hexadecimal
notation. Examples are n1 or n0-7,0x10.
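The n<list> notation can be illustrated with a small Python sketch. It is not taken from XMPI or LAM; the function name parse_node_list is hypothetical, and the sketch assumes that a dash denotes an inclusive range and that hexadecimal identifiers carry a 0x prefix, as in the examples above.

```python
def parse_node_list(spec):
    """Expand an n<list> string such as "n0-7,0x10" into a list of node ids."""
    if len(spec) < 2 or not spec.startswith("n"):
        raise ValueError(f"not an n<list> specification: {spec!r}")
    nodes = []
    for item in spec[1:].split(","):
        # Each item is a single identifier or a "low-high" range (assumed inclusive).
        low, dash, high = item.partition("-")
        first = int(low, 0)  # base 0 accepts decimal and 0x-prefixed hexadecimal
        last = int(high, 0) if dash else first
        if last < first:
            raise ValueError(f"descending range in {spec!r}: {item!r}")
        nodes.extend(range(first, last + 1))
    return nodes


print(parse_node_list("n1"))         # [1]
print(parse_node_list("n0-7,0x10"))  # [0, 1, 2, 3, 4, 5, 6, 7, 16]
```

The special selectors h, o and N are not handled here because they depend on the booted LAM session rather than on the text of the specification.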
Run-time Options
Applications can be run with various run-time options to specify the
behaviour of the MPI library. These can be configured from a separate
dialog which is activated from the "Runtime" item in the "Options"
menu. Options remain in effect until changed.
· tracing mode (default enabled)
· fast client-to-client communication (default disabled)
· GER protocol and error detection (default enabled)
· homogeneous LAM node optimization (default disabled)
FOCUSING ON A PROCESS
More information on a process’s state can be obtained by clicking the
left mouse button within the process hexagon. This will pop up a focus
window. The upper area of the focus window is the process area and
displays the current state of the process. The lower area is the
message area and displays information on the process’s message queue.
The focus window banner contains a tack button, which can be clicked to
dismiss the window, and a label containing the process’s identity along
with the program name. XMPI identifies a process first by its rank in
MPI_COMM_WORLD and, if the process is communicating, appends a slash
followed by the process’s rank within the current communicator.
The focus window can also be dismissed by clicking once again in the
process hexagon.
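As a minimal illustration of the identification scheme just described, the following Python sketch builds such a label. The function name process_label is hypothetical, and the exact text XMPI renders may differ.

```python
def process_label(world_rank, comm_rank=None):
    """Return the rank in MPI_COMM_WORLD, plus "/rank" in the current communicator."""
    if comm_rank is None:
        # The process is not communicating: only the MPI_COMM_WORLD rank is shown.
        return str(world_rank)
    return f"{world_rank}/{comm_rank}"


print(process_label(5))     # 5
print(process_label(5, 2))  # 5/2
```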
The process area describes the current state of the process together
with the name of and (where appropriate) arguments to the MPI function
currently being executed. The layout is fairly self-explanatory and we
describe only the less obvious features.
Communicator Identification
The "comm" area shows the communicator being used in the current MPI
function. Communicators are opaque objects which MPI does not identify
in any meaningful, printable way. LAM’s MPI implementation adds a
simple numerical identifier to communicators, which is displayed in
XMPI as <x> where x is the identifier. This identifier can be matched
to communicator variables in an MPI program with the LAM function,
MPIL_Comm_id(2).
Group Membership
The button to the right of the "comm" area will highlight in the
overview window the hexagons of the processes in the communicator. For
an intracommunicator, the hexagons will be highlighted in the color
specified by the "lcomCol" resource. For an intercommunicator,
processes in the local group will be highlighted in the color specified
by the "lcomCol" resource and those in the remote group in the color
specified by the "rcomCol" resource. For highlighted processes the
process identification at the bottom of the hexagon is changed to be
the rank in MPI_COMM_WORLD followed by a slash and the rank in the
communicator being highlighted.
Datatype
The datatype button to the right of the "cnt" area will display in the
datatype window (see "DATATYPE WINDOW") the type map of the datatype
argument to the current MPI function.
The message area describes the current state of the queue of messages
destined to the process and not yet received. Once again the layout is
fairly self-explanatory and we describe only the less obvious features.
Message Aggregates
Identical undelivered messages are aggregated. The "copy" area shows
the number of messages within the visible aggregate, followed by the
total number of messages in the queue. The button to the right of the
"copy" area cycles through the message aggregates.
Source Rank
The "src" area shows the rank of the source process within
MPI_COMM_WORLD followed by the rank of the source process in the
communicator in which the message was sent.
Datatype
The datatype button to the right of the "cnt" area will display in the
datatype window the type map of the message’s datatype.
Group Membership
The button to the right of the "comm" area will highlight the message
communicator in the manner previously described.
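The aggregation described under "Message Aggregates" can be sketched in Python as follows. This is an assumption-laden illustration, not XMPI's implementation: the set of fields that makes two messages "identical" (source rank, communicator identifier, tag, element count) and the helper names are hypothetical.

```python
from collections import Counter


def aggregate_messages(queue):
    """Group identical queued messages, keeping first-seen order.

    Returns a list of (message, copies) pairs.
    """
    # Counter is a dict subclass, so it preserves insertion order (Python 3.7+).
    return list(Counter(queue).items())


def copy_counts(aggregates, visible):
    """Return (copies in the visible aggregate, total messages in the queue)."""
    total = sum(copies for _message, copies in aggregates)
    return aggregates[visible][1], total


# Hypothetical message fields: (source rank, communicator id, tag, element count).
queue = [(0, 1, 7, 100), (0, 1, 7, 100), (2, 1, 9, 4), (0, 1, 7, 100)]
aggregates = aggregate_messages(queue)
print(aggregates)                  # [((0, 1, 7, 100), 3), ((2, 1, 9, 4), 1)]
print(copy_counts(aggregates, 0))  # (3, 4)
```

The pair returned by copy_counts corresponds to the two numbers shown in the "copy" area; cycling through aggregates amounts to stepping the visible index.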
XMPI TRACE FILES
XMPI can view existing trace files and can create trace files for
applications run under XMPI.
To load and view an existing trace file select the "View" item in the
"Trace" menu.
If an application is run under XMPI with tracing enabled (the default),
LAM will trace the application. Before the trace data can be viewed in
XMPI it must be dumped to a file. This is done by selecting the "Dump"
item from the "Trace" menu. You will be prompted for a file name. By
convention XMPI trace files have a ".lamtr" suffix. The trace file can
be viewed by loading it as described above. As a shortcut select the
"Express" item in the "Trace" menu, or equivalently click the "Trace"
button in the overview window. This dumps the trace data to a
temporary file and then immediately loads the file for viewing. If you
decide that you want to save trace data for later viewing then you must
dump it using the "Dump" item from the "Trace" menu. Dumping trace
data to file does not purge any trace data and a subsequent dump will
contain all the trace data from the start of the application up until
the time of dumping. Terminating an application via the "Clean" button
or menu item purges all trace data.
While viewing a trace, an application previously launched by XMPI
continues to run in the background. Upon the closing of the trace
window XMPI will return to snapshot mode if there is a running
application.
When loading trace files containing multiple segments (see
MPIL_Trace_on(2) and MPIL_Trace_off(2)) you will be prompted for the
number of the segment you wish to view. If you wish later to view a
different segment, simply reload the trace file and specify the new
segment number when prompted. Reloading is done via the "View" or
"Express" items in the "Trace" menu.
Communication Timeline Window
Across the top of the timeline window is a control and information
area. The trace data is displayed below this on timelines, one per
process in the traced application. The state of the application at a
particular time is represented by the corresponding traffic light
color. Green represents running, red represents blocked waiting on
communication and yellow represents time spent inside an MPI function
not blocked on communication (we call this system overhead time as it
typically represents time doing data conversion, message packing, etc.).
The dial can be used to select a time point at which the process states
are to be displayed. In the overview window the process states at the
dial time are displayed in hexagon form. As with snapshot mode more
detailed information on a process can be obtained by bringing up its
focus window. The dial may be moved by clicking with the left button
in the trace view area or via the VCR controls. Below the VCR controls
are displayed from left to right, the time of the left edge of the
displayed timeline, the current dial time and the time of the right
edge of the displayed timeline.
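Selecting a dial time amounts to looking up, for each process, the timeline segment that covers that time. The Python sketch below shows one way to do this. The segment layout (start, end, state) and the name state_at are assumptions made for illustration and do not reflect the trace file format.

```python
from bisect import bisect_right

# Traffic light colors as described for the timeline window.
STATE_COLOR = {"running": "green", "overhead": "yellow", "blocked": "red"}


def state_at(segments, dial_time):
    """Return the state of one process at dial_time, or None outside the trace.

    segments: non-overlapping (start, end, state) tuples sorted by start time.
    """
    starts = [start for start, _end, _state in segments]
    i = bisect_right(starts, dial_time) - 1  # last segment starting at or before the dial
    if i < 0:
        return None
    _start, end, state = segments[i]
    return state if dial_time < end else None


timeline = [(0.0, 2.0, "running"), (2.0, 3.5, "blocked"), (3.5, 4.0, "overhead")]
state = state_at(timeline, 2.5)
print(state, STATE_COLOR[state])  # blocked red
```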
To the right of the VCR controls is displayed the current
magnification. When a trace file is loaded XMPI chooses an initial
scaling factor and sets this to be the 1x1 magnification. You can
increase and decrease the magnification using the zoom and un-zoom
buttons.
A segment of the currently displayed timeline can be selected by
dragging the right mouse button in the timeline display area. Upon
release of the right button the display is zoomed to show the selected
segment. To cancel a drag in progress, drag the cursor up or down out
of the timeline display area.
How Communication Is Represented
Collective
A collective communication is represented for each process by
contiguous line segments showing the time spent in system overhead
and the time spent blocked waiting for communication. No lines are
drawn connecting the processes participating in the collective
communication.
Blocking point-to-point
For both the sending and receiving processes, contiguous line segments
are drawn showing the time spent in system overhead and the time spent
blocked waiting for the communication to complete. A line is drawn
connecting the send to the receive. It originates at the beginning
of the send segments and is drawn to the end of the matching
receive segments.
Non-blocking point-to-point
At the time a non-blocking send or receive is initiated a system
overhead segment is drawn. When the communication is completed via
a wait or test, segments showing system overhead and blocking time
are drawn. Lines are drawn between matching sends and receives,
except in this case the line is drawn from the segment where the
send was initiated to where the corresponding receive completed.
Waits and tests
If a non-blocking communication is completed inside a wait/test
function XMPI will show the function name in the focus window as
the wait/test function followed in parentheses by the send/receive
function being completed. For example, if an MPI_Issend() is
completed inside an MPI_Wait(), the function will read MPI_Wait
(MPI_Issend).
Missing traces
Owing to the use of trace segments or the dropping of overflow
traces (see lamtrace(1)) there may be send or receive traces which
have no match in the trace data. In these cases a short stub line
is drawn out from a send or in to a receive.
Kiviat Window
When viewing a trace file, the "Kiviat" button or "Kiviat" item from
the "Trace" menu brings up the Kiviat window. This window displays, in
a segmented pie-chart format, the cumulative time spent by each process
in the running, overhead and blocked states up to the current dial
time.
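The cumulative totals behind such a display could be computed along the following lines. This Python sketch assumes a hypothetical per-process list of (start, end, state) segments; it is not how XMPI stores or processes trace data.

```python
def cumulative_times(segments, dial_time):
    """Sum the time one process spent in each state up to dial_time."""
    totals = {"running": 0.0, "overhead": 0.0, "blocked": 0.0}
    for start, end, state in segments:
        # Count only the part of the segment that lies before the dial.
        elapsed = min(end, dial_time) - start
        if elapsed > 0:
            totals[state] += elapsed
    return totals


def shares(totals):
    """Convert per-state totals into fractions of the whole, as for a pie chart."""
    whole = sum(totals.values())
    return {state: (t / whole if whole else 0.0) for state, t in totals.items()}


timeline = [(0.0, 2.0, "running"), (2.0, 3.5, "blocked"), (3.5, 4.0, "overhead")]
totals = cumulative_times(timeline, 3.0)
print(totals)  # {'running': 2.0, 'overhead': 0.0, 'blocked': 1.0}
```

Moving the dial changes only the dial_time argument, so the chart can be recomputed from the same segments at each step.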
MESSAGE SOURCE MATRIX
The message source window displays a square matrix of process message
queue lengths. For each process it shows the number of queued messages
from each other process in the application. It can be brought up while
monitoring a running application or while viewing a trace file, by
selecting the "Matrix" button or "Matrix" item in the "Trace" menu.
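The matrix itself is a simple tally. In the Python sketch below, rows are taken to be receiving processes and columns to be sources; that orientation, the input format and the function name are assumptions made for illustration.

```python
def message_source_matrix(nprocs, queued):
    """Count queued messages per (destination, source) pair.

    queued: iterable of (source_rank, dest_rank) pairs, one per queued message.
    Returns an nprocs x nprocs list of lists indexed as matrix[dest][source].
    """
    matrix = [[0] * nprocs for _ in range(nprocs)]
    for source, dest in queued:
        matrix[dest][source] += 1
    return matrix


queued = [(0, 1), (0, 1), (2, 1), (1, 0)]
for row in message_source_matrix(3, queued):
    print(row)
# [0, 1, 0]
# [2, 0, 1]
# [0, 0, 0]
```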
DATATYPE WINDOW
The datatype window displays a textual representation of the type map
of an MPI datatype. This window is associated at any instant with a
particular process and mode. The associated process is shown in the
window’s banner and the mode is indicated by a traffic light or message
queue icon shown in the left part of the window. When in process mode
the datatype being shown, if any, is the datatype argument of the MPI
function the process is executing. When in message mode the datatype
is that of the current message aggregate selected in the process focus
window. Switching between processes and modes is effected via the
datatype buttons in the process focus windows.
The type map might not fit completely into the default size window.
Simply resize the window to see the whole map.
SWITCHING INFORMATION SOURCES
XMPI will gather and display information from either the currently
executing application or a trace file. When an application is launched
from XMPI, the information source is the executing application and the
"Snap" button is active. Though the application may be producing trace
data, the "Snap" button does not use it, but instead acquires
information from debugging hooks in the MPI implementation. At any
moment, an existing trace file may be loaded into XMPI or the currently
accumulating trace data may be fetched from the MPI implementation,
stored in a file, and loaded. This action changes the information
source to the loaded trace file. Information display is now controlled
from the dial in the timeline window and not from the "Snap" button,
which is now inactive. Though the application may still be running,
the timeline dial does not use the runtime debugging hooks, but instead
acquires information from the loaded trace file. Upon the closing of
the trace window XMPI will return to snapshot mode if there is a
running application.
RESOURCES
XMPI defines the following application resources.
XMPI.helpCmd    command that is run to provide help. The default
                is typically a command which fires up a Web browser
                to view a help page. You should change this to
                invoke your favourite browser.
XMPI.rankFont   process rank font in hexagon
XMPI.msgFont    total message count font in hexagon (may need to be
                adjusted to fit inside message icon)
XMPI.lcomCol    color used to highlight the processes in an
                intracommunicator or in the local group of an
                intercommunicator
XMPI.rcomCol    color used to highlight the processes in the remote
                group of an intercommunicator
XMPI.bandCol    color used for the zoom selection rubber band
XMPI.bandDash   if True, use a dashed line rubber band to show the
                zoom selection; otherwise use a solid line
XMPI.bandWidth  width of the zoom selection rubber band
XMPI gets important default resources from the application defaults
file, XMPI. If this file is not installed in the X11 default
directory, its directory can be added to the XAPPLRESDIR environment
variable.
LIMITATIONS
An application must be started by XMPI to be monitored by it.
When using the fast client-to-client communication mode process states
in snapshot mode are always shown as running and no useful information
is shown in the process focus windows.
XMPI uses lamclean(1). Errors reported by this tool will still print
to standard output. A shorter message will appear in an XMPI error
dialog.
SEE ALSO
mpimsg(1), mpirun(1), mpitask(1), lamtrace(1)