XMPI - X Window MPI user interface

NAME

       XMPI - X Window MPI user interface

SYNTAX

       xmpi [-h] [<boot_schema>]

DESCRIPTION

       XMPI is a graphical user interface for running MPI programs, monitoring
       MPI processes and messages, and  viewing  execution  trace  files.   It
       exploits  the  debugging  capabilities  of  LAM,  a  parallel computing
       environment for UNIX clusters.  XMPI  is  constructed  from  the  Motif
       widget set.

       XMPI does not provide an interface for starting a LAM session.  LAM
       must be started before running XMPI, which is itself a LAM program.
       The boot schema from which LAM was started can (and should) be
       provided to XMPI so that it can be presented as an inventory of nodes
       on which programs may be run.  If XMPI is to be used only to view trace
       files, starting LAM is not required.

       This description assumes a basic knowledge of MPI.

TYPICAL USAGE

       XMPI provides a graphical display of the state of the processes  within
       an  MPI application.  The state information is obtained from one of two
       sources, a running application started by XMPI  or  a  file  containing
       trace  data  from  a traced MPI application.  When XMPI is started, its
       top-level overview window is blank.  Once an application is started  or
       a  trace file is loaded the overview window fills with a tiled group of
       hexagons, each representing the state of one MPI process and labeled by
       the  process  rank  within  MPI_COMM_WORLD.   A  traffic  light  symbol
       indicates whether the process is running or blocked.  No traffic  light
       is  shown  for  processes  which  have  either  finalized  or  not  yet
       initialized the MPI library.

       When monitoring a running  application  the  camera  "Snap"  button  or
       "Snapshot" item in the "Application" menu updates the state information
       on all processes at any  time.   When  viewing  trace  data  the  state
       information  is  updated according to the currently selected time point
       (see "XMPI TRACE FILES").

       A mouse click inside a hexagon pops up an additional window  containing
       more  detailed  information  about  the  process.   If  the  process is
       blocked, the function name, peer process  rank,  communicator,  message
       tag  and  element  count  are  displayed.   If  unreceived messages are
       available, their quantity, source process rank,  communicator,  message
       tag  and element count are displayed.  By leaving a few process windows
       on the screen, a user can focus debugging on  a  small  and  manageable
       collection of misbehaving processes.

       The "Clean" button or "Clean" item in the "Application" menu terminates
       the application so that the development cycle can be repeated.  The
       previous application can be rerun with the "Rerun" button or "Rerun"
       item in the "Application" menu.

RUNNING AN APPLICATION

       An application schema specifies an  MPI  application  by  listing  each
       process’s  program  name,  program  location,  target  processor(s) and
       optional command line arguments.

       The "Browse&Run" item in the "Application" menu pops up a  simple  file
       browser  for  choosing  and  running  a pre-written application schema.
       Alternatively an application schema can be  configured  with  the  XMPI
       application  builder  dialog,  invoked  by  the "Build&Run" item in the
       "Application" menu.

       The builder dialog has an area to specify each  process  and  an  arrow
       button  to  add  it to the application schema, which is shown below the
       arrow button in a scrolled list.  The lines in the list show the syntax
       that would be used in creating the same application with a text editor.
       Indeed, the "Save" button saves the application schema in  a  file  for
       later use and/or editing.

       A  specified  process does not become part of the application until the
       arrow (commit) button is pressed.  Once it appears in  the  application
       scrolled  list,  a  process can be deleted by selecting it and pressing
       the <Delete> key.

       Pressing the "Run" button with anything in the application list  causes
       that  application  to  be run.  The overview window is then initialized
       with the status of the application.

   Program Specification
       A file browser in the middle of the builder dialog aids in selecting  a
       program  file.   The  browser only navigates the file space of the node
       running XMPI.  If a program is located on another node outside the file
       space  (outside  NFS,  etc.) its pathname may need to be typed into the
       process specification area.  Selecting the "Use Full  Pathname"  toggle
       button  will cause programs to be placed into the application schema as
       full pathnames.

       XMPI limits the choice of a program source  node  to  either  the  node
       running  XMPI  or  the  process  target  node.   The latter case is the
       default and is the most efficient because LAM does not need to transfer
       the  program from source to target node.  The "Transfer Program" toggle
       button selects the source node policy.

   Multiple Program Copies
       The number of copies of a program to be run can be set in  the  process
       specification  area.  Clicking on the increment or decrement arrow will
       increment or decrement the count by one. Clicking with  the  shift  key
       down will increment or decrement by ten.

   Command-line Arguments
       Command-line  arguments  must  be  typed into the process specification
       area.

   Node Specification
       A boot schema specifies the computers participating as nodes in  a  LAM
       multicomputer.   If  XMPI is given a boot schema filename, its contents
       will appear in a scrolled list on the right side of the builder dialog.
       XMPI will search for the given schema in the local directory.  The boot
       schema filename is displayed above the list of its nodes.  Multiple
       target nodes can be selected from the scrolled list; the corresponding
       node mnemonics then appear in the process specification area.
       Selecting multiple target nodes specifies multiple processes with the
       program name, arguments and source node policy held constant.

       If no boot schema was specified, only the special node selectors "LOCAL"
       (meaning the node on  which  XMPI  is  running)  and  "ALL  NODES"  are
       provided.

       Target  node  descriptions  may also be typed directly into the process
       specification area.  The local node is specified as h.  The origin node
       from which the machine was booted, if not local, can be specified as o.
       All usable nodes are specified as N.  Nodes are generically  identified
       as  n<list>,  where <list> can be a single node identifier or a list of
       node identifiers.  Identifiers can be written in decimal or hexadecimal
       notation.  Examples are n1 or n0-7,0x10.
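
       The node list notation above can be illustrated with a short Python
       sketch.  It is not LAM's own parser: the function name
       expand_node_list is hypothetical, and the sketch assumes that a range
       such as 0-7 is inclusive and that hexadecimal identifiers carry a 0x
       prefix, as in the n0-7,0x10 example.

```python
def parse_node_id(text):
    """Parse one node identifier written in decimal or 0x-prefixed hexadecimal."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def expand_node_list(spec):
    """Expand a generic node spec such as 'n0-7,0x10' into sorted node ids.

    Assumption: ranges are inclusive at both ends.
    """
    if len(spec) < 2 or not spec.startswith("n"):
        raise ValueError(f"not a generic node spec: {spec!r}")
    nodes = set()
    for part in spec[1:].split(","):
        low, dash, high = part.partition("-")
        first = parse_node_id(low)
        # A part without a dash names a single node.
        last = parse_node_id(high) if dash else first
        if last < first:
            raise ValueError(f"descending range in {spec!r}")
        nodes.update(range(first, last + 1))
    return sorted(nodes)
```

       Under these assumptions, expand_node_list("n0-7,0x10") yields nodes 0
       through 7 plus node 16.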

   Run-time Options
       Applications  can  be  run with various run-time options to specify the
       behaviour of the MPI library.  These can be configured from a  separate
       dialog  which  is  activated  from  the "Runtime" item in the "Options"
       menu.  Options remain in effect until changed.

       ·      tracing mode (default enabled)

       ·      fast client-to-client communication (default disabled)

       ·      GER protocol and error detection (default enabled)

       ·      homogeneous LAM node optimization (default disabled)

FOCUSING ON A PROCESS

       More information on a process’s state can be obtained by  clicking  the
       left mouse button within the process hexagon.  This will pop up a focus
       window.  The upper area of the focus window is  the  process  area  and
       displays  the  current  state  of  the  process.  The lower area is the
       message area and displays information on the process’s message queue.

       The focus window banner contains a tack button, which can be clicked to
       dismiss the window, and a label containing the process’s identity along
       with the program name.  XMPI identifies a process first by its rank in
       MPI_COMM_WORLD and, if the process is communicating, appends a slash
       followed by the process’s rank within the current communicator.  The
       focus window can also be dismissed by clicking once again in the
       process hexagon.
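
       The identification rule can be expressed as a small Python sketch.
       The function process_label is hypothetical and is not part of XMPI;
       it only mirrors the rank/rank notation described above.

```python
def process_label(world_rank, comm_rank=None):
    """Build a process identity string in the style described for XMPI.

    world_rank: rank within MPI_COMM_WORLD.
    comm_rank:  rank within the current communicator, or None when the
                process is not communicating.
    """
    if comm_rank is None:
        return str(world_rank)
    return f"{world_rank}/{comm_rank}"
```

       For example, a process with rank 5 in MPI_COMM_WORLD and rank 2 in
       the current communicator would be labelled 5/2.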

       The process area describes the current state of  the  process  together
       with  the name of and (where appropriate) arguments to the MPI function
       currently being executed.  The layout is fairly self-explanatory and we
       describe only the less obvious features.

   Communicator Identification
       The  "comm"  area  shows the communicator being used in the current MPI
       function.  Communicators are opaque objects which MPI does not identify
       in  any  meaningful,  printable  way.   LAM’s MPI implementation adds a
       simple numerical identifier to communicators,  which  is  displayed  in
       XMPI  as <x> where x is the identifier.  This identifier can be matched
       to communicator variables in an MPI  program  with  the  LAM  function,
       MPIL_Comm_id(2).

   Group Membership
       The  button  to  the  right  of  the  "comm" area will highlight in the
       overview window the hexagons of the processes in the communicator.  For
       an  intracommunicator,  the  hexagons  will be highlighted in the color
       specified  by  the  "lcomCol"  resource.   For  an   intercommunicator,
       processes in the local group will be highlighted in the color specified
       by the "lcomCol" resource and those in the remote group  in  the  color
       specified  by  the  "rcomCol"  resource.  For highlighted processes the
       process identification at the bottom of the hexagon is  changed  to  be
       the  rank  in  MPI_COMM_WORLD  followed  by a slash and the rank in the
       communicator being highlighted.

   Datatype
       The datatype button to the right of the "cnt" area will display in  the
       datatype  window  (see  "DATATYPE WINDOW") the type map of the datatype
       argument to the current MPI function.

       The message area describes the current state of the queue  of  messages
       destined to the process and not yet received.  Once again the layout is
       fairly self-explanatory and we describe only the less obvious features.

   Message Aggregates
       Identical  undelivered  messages  are aggregated. The "copy" area shows
       the number of messages within the visible aggregate,  followed  by  the
       total  number of messages in the queue.  The button to the right of the
       "copy" area cycles through the message aggregates.
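
       The aggregation idea can be sketched in Python as follows.  This is
       an illustration only, not XMPI's implementation: the Message tuple
       and the aggregate function are hypothetical, and the sketch assumes
       that two messages are "identical" when their source rank,
       communicator, tag and element count all match.

```python
from collections import Counter, namedtuple

# Hypothetical message record; the fields mirror those shown in the focus window.
Message = namedtuple("Message", "src comm tag count")


def aggregate(queue):
    """Group identical queued messages.

    Returns (aggregates, total), where aggregates is a list of
    (message, copies) pairs in first-seen order and total is the number
    of messages in the whole queue.
    """
    counts = Counter(queue)  # keeps insertion order on Python 3.7+
    return list(counts.items()), len(queue)
```

       Each (message, copies) pair, together with the queue total,
       corresponds to the two numbers shown in the "copy" area.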

   Source Rank
       The  "src"  area  shows  the  rank  of  the   source   process   within
       MPI_COMM_WORLD  followed  by  the  rank  of  the  source process in the
       communicator in which the message was sent.

   Datatype
       The datatype button to the right of the "cnt" area will display in  the
       datatype window the type map of the message’s datatype.

   Group Membership
       The  button  to the right of the "comm" area will highlight the message
       communicator in the manner previously described.

XMPI TRACE FILES

       XMPI can view existing trace files and can create trace files for
       applications run under XMPI.

       To load and view an existing trace file, select the "View" item in the
       "Trace" menu.

       If an application is run under XMPI with tracing enabled (the default),
       LAM will trace the application.  Before the trace data can be viewed in
       XMPI it must be dumped to a file.  This is done by selecting the "Dump"
       item  from the "Trace" menu.  You will be prompted for a file name.  By
       convention XMPI trace files have a ".lamtr" suffix.  The trace file can
       be viewed by loading it as described above.  As a shortcut, select the
       "Express" item in the "Trace" menu, or equivalently click the "Trace"
       button in the overview window.  This dumps the trace data to a
       temporary file and then immediately loads the file for viewing.  To
       save trace data for later viewing, you must dump it using the "Dump"
       item from the "Trace" menu.  Dumping trace data to a file does not
       purge any trace data, so a subsequent dump will contain all the trace
       data from the start of the application up until the time of dumping.
       Terminating an application via the "Clean" button or menu item purges
       all trace data.

       While a trace is being viewed, an application previously launched by
       XMPI continues to run in the background.  When the trace window is
       closed, XMPI returns to snapshot mode if there is a running
       application.

       When   loading   trace   files   containing   multiple   segments  (see
       MPIL_Trace_on(2) and MPIL_Trace_off(2)) you will be  prompted  for  the
       number  of  the  segment you wish to view.  If you wish later to view a
       different segment, simply reload the trace file  and  specify  the  new
       segment  number  when  prompted.   Reloading  is done via the "View" or
       "Express" items in the "Trace" menu.

   Communication Timeline Window
       Across the top of the timeline window is a control and information
       area.  The trace data is displayed below this on timelines, one per
       process in the traced application.  The state of a process at a
       particular time is represented by the corresponding traffic light
       color.  Green represents running, red represents blocked waiting on
       communication, and yellow represents time spent inside an MPI function
       but not blocked on communication (we call this system overhead time as
       it typically represents time doing data conversion, message packing,
       etc.).

       The dial can be used to select a time point at which the process states
       are  to be displayed.  In the overview window the process states at the
       dial time are displayed in hexagon form.  As with  snapshot  mode  more
       detailed  information  on  a process can be obtained by bringing up its
       focus window.  The dial may be moved by clicking with the left button
       in the trace view area or via the VCR controls.  Displayed below the
       VCR controls are, from left to right, the time of the left edge of the
       displayed timeline, the current dial time, and the time of the right
       edge of the displayed timeline.

       To  the  right  of  the  VCR  controls   is   displayed   the   current
       magnification.   When  a  trace  file is loaded XMPI chooses an initial
       scaling factor and sets this to be  the  1x1  magnification.   You  can
       increase  and  decrease  the  magnification  using the zoom and un-zoom
       buttons.

       A segment of the  currently  displayed  timeline  can  be  selected  by
       dragging  the  right  mouse  button in the timeline display area.  Upon
       release of the right button the display is zoomed to show the  selected
       segment.   To cancel a drag in progress, drag the cursor up or down out
       of the timeline display area.

   How Communication Is Represented
       Collective
           A collective communication  is  represented  for  each  process  by
           contiguous  line segments showing the time spent in system overhead
           and the time spent blocked waiting for communication.  No lines are
           drawn  connecting  the  processes  participating  in the collective
           communication.

       Blocking point-to-point
           For both the send and receive process contiguous line segments  are
           drawn  showing the time spent in system overhead and the time spent
           blocked waiting for the communication to complete.  A line is drawn
           connecting the send to the receive.  It originates at the beginning
           of the send segments and is  drawn  to  the  end  of  the  matching
           receive segments.

       Non-blocking point-to-point
           At  the  time  a non-blocking send or receive is initiated a system
           overhead segment is drawn.  When the communication is completed via
           a  wait or test, segments showing system overhead and blocking time
           are drawn.  Lines are drawn between matching  sends  and  receives,
           except  in  this  case the line is drawn from the segment where the
           send was initiated to where the corresponding receive completed.

       Waits and tests
           If a non-blocking communication is  completed  inside  a  wait/test
           function  XMPI  will  show the function name in the focus window as
           the wait/test function followed in parentheses by the  send/receive
           function  being  completed.   For  example,  if  an MPI_Issend() is
           completed inside an MPI_Wait(), the  function  will  read  MPI_Wait
           (MPI_Issend).

       Missing traces
           Owing  to  the  use  of  trace segments or the dropping of overflow
           traces (see lamtrace(1)) there may be send or receive traces  which
           have  no match in the trace data.  In these cases a short stub line
           is drawn out from a send or in to a receive.

   Kiviat Window
       When viewing a trace file, the "Kiviat" button or "Kiviat" item from
       the "Trace" menu brings up the Kiviat window.  This window displays, in
       a segmented pie-chart format, the cumulative time, up to the current
       dial time, spent by each process in the running, overhead and blocked
       states.
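
       The quantity shown in the Kiviat window can be sketched in Python as
       follows.  This is a sketch under assumptions, not XMPI's code: the
       function cumulative_times and the (start, end, state) segment layout
       are hypothetical stand-ins for one process's timeline.

```python
def cumulative_times(segments, dial_time):
    """Sum the time one process spent in each state up to dial_time.

    segments: iterable of (start, end, state) tuples, where state is one
              of "running", "overhead" or "blocked".
    """
    totals = {"running": 0.0, "overhead": 0.0, "blocked": 0.0}
    for start, end, state in segments:
        # Only the part of a segment that lies before the dial counts.
        elapsed = min(end, dial_time) - start
        if elapsed > 0:
            totals[state] += elapsed
    return totals
```

       Dividing each total by the dial time would give the share of the
       chart occupied by that state.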

MESSAGE SOURCE MATRIX

       The message source window displays a square matrix of  process  message
       queue lengths.  For each process it shows the number of queued messages
       from each other process in the application.  It can be brought up while
       monitoring  a  running  application  or  while viewing a trace file, by
       selecting the "Matrix" button or "Matrix" item in the "Trace" menu.
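
       A minimal Python sketch of such a matrix follows.  The function
       source_matrix and its input layout are hypothetical, and the choice
       of rows for destination processes and columns for source processes
       is an assumption made for illustration.

```python
def source_matrix(queues, nprocs):
    """Build a square matrix of message queue lengths.

    queues: mapping of destination rank -> list of source ranks, one
            entry per queued (unreceived) message.
    nprocs: number of processes in the application.
    Returns matrix[dest][src] = number of messages queued at dest from src.
    """
    matrix = [[0] * nprocs for _ in range(nprocs)]
    for dest, sources in queues.items():
        for src in sources:
            matrix[dest][src] += 1
    return matrix
```

       For three processes where process 0 has two messages queued from
       process 1 and one from process 2, row 0 of the result is [0, 2, 1].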

DATATYPE WINDOW

       The datatype window displays a textual representation of the  type  map
       of  an  MPI  datatype.  This window is associated at any instant with a
       particular process and mode.  The associated process is  shown  in  the
       window’s banner and the mode is indicated by a traffic light or message
       queue icon shown in the left part of the window.  When in process  mode
       the  datatype  being shown, if any, is the datatype argument of the MPI
       function the process is executing.  When in message mode  the  datatype
       is  that of the current message aggregate selected in the process focus
       window.  Switching between processes and  modes  is  effected  via  the
       datatype buttons in the process focus windows.

       The  type  map  might  not fit completely into the default size window.
       Simply resize the window to see the whole map.

SWITCHING INFORMATION SOURCES

       XMPI will gather and display  information  from  either  the  currently
       executing application or a trace file.  When an application is launched
       from XMPI, the information source is the executing application and  the
       "Snap" button is active.  Though the application may be producing trace
       data,  the  "Snap"  button  does  not  use  it,  but  instead  acquires
       information  from  debugging  hooks  in the MPI implementation.  At any
       moment, an existing trace file may be loaded into XMPI or the currently
       accumulating  trace  data  may  be fetched from the MPI implementation,
       stored in a file, and loaded.   This  action  changes  the  information
       source to the loaded trace file.  Information display is now controlled
       from the dial in the timeline window and not from  the  "Snap"  button,
       which  is  now  inactive.  Though the application may still be running,
       the timeline dial does not use the runtime debugging hooks, but instead
       acquires  information  from the loaded trace file.  Upon the closing of
       the trace window XMPI will return  to  snapshot  mode  if  there  is  a
       running application.

RESOURCES

       XMPI defines the following application resources.

       XMPI.helpCmd        command  that  is run to provide help.  The default
                           is typically a command which fires up a Web browser
                           to  view  a  help  page.  You should change this to
                           invoke your favourite browser.

       XMPI.rankFont       process rank font in hexagon

       XMPI.msgFont        total message count font in hexagon (may need to be
                           adjusted to fit inside message icon)

       XMPI.lcomCol        color used to highlight the processes in an
                           intracommunicator or in the local group of an
                           intercommunicator

       XMPI.rcomCol        color used to highlight the processes in the remote
                           group of an intercommunicator

       XMPI.bandCol        color used for the zoom selection rubber band

       XMPI.bandDash       if True, use a dashed line rubber band to show the
                           zoom selection; otherwise use a solid line

       XMPI.bandWidth      width of the zoom selection rubber band

       XMPI  gets  important  default  resources from the application defaults
       file, XMPI.   If  this  file  is  not  installed  in  the  X11  default
       directory,  its  directory  can be added to the XAPPLRESDIR environment
       variable.

LIMITATIONS

       An application must be started by XMPI to be monitored by it.

       When using the fast client-to-client communication mode, process states
       in snapshot mode are always shown as running and no useful information
       is shown in the process focus windows.

       XMPI uses lamclean(1).  Errors reported by this tool will  still  print
       to  standard  output.   A  shorter message will appear in an XMPI error
       dialog.
