NAME

       sam_overview - Overview of the Simple Availability Manager

OVERVIEW

       The SAM library provides a tool to check the health of an application.
       The main purpose of SAM is to restart a local process when it fails  to
       respond to a healthcheck request in a configured time interval.

       During sam_initialize(3), a duplicate copy of the process is created
       using the fork(2) system call.  This duplicate process copy contains
       the  logic for executing the SAM server.  The SAM server is responsible
       for requesting healthchecks from the active  process,  and  controlling
       the  lifecycle  of  the  active  process  when it fails.  If the active
       process fails to respond to the healthcheck request  sent  by  the  SAM
       server,  it  will  be  sent a SIGTERM signal to request shutdown of the
       application.  After a configured time interval,  the  process  will  be
       forcibly  killed  by  being  sent  a  SIGKILL  signal.  Once the active
       process terminates, the SAM server will create a new active process.
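
       The shutdown sequence described above (SIGTERM first, SIGKILL after
       a grace period) can be sketched with plain POSIX calls.  This is a
       minimal illustration only and does not use or reproduce libsam
       code: spawn_hung_child() and terminate_with_escalation() are
       hypothetical helper names, and the 10 ms polling step is an
       arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a stand-in "active process" that hangs forever.
 * If ignore_term is nonzero the child also ignores SIGTERM, like a process
 * too wedged to shut down cleanly. Returns the child's pid, or -1. */
static pid_t spawn_hung_child(int ignore_term)
{
    sigset_t term, old;
    pid_t pid;

    /* Block SIGTERM across fork() so the child can set its disposition
     * before any signal from the parent is delivered. */
    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    sigprocmask(SIG_BLOCK, &term, &old);

    pid = fork();
    if (pid == 0) {
        signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
        sigprocmask(SIG_SETMASK, &old, NULL);
        for (;;)
            pause();
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}

/* Hypothetical helper: request shutdown of child `pid` with SIGTERM, wait
 * up to grace_ms milliseconds, then force it with SIGKILL. Returns the
 * signal that ended the child, 0 if it exited normally, or -1 on error. */
static int terminate_with_escalation(pid_t pid, int grace_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    int status = 0;
    int waited_ms;
    pid_t done = 0;

    assert(grace_ms >= 0);

    if (kill(pid, SIGTERM) == -1)
        return -1;

    /* Poll for the child's exit during the grace period. */
    for (waited_ms = 0; waited_ms <= grace_ms; waited_ms += 10) {
        done = waitpid(pid, &status, WNOHANG);
        if (done != 0)
            break;
        nanosleep(&tick, NULL);
    }

    if (done == 0) {
        /* Grace period expired: force termination and reap the child. */
        if (kill(pid, SIGKILL) == -1)
            return -1;
        done = waitpid(pid, &status, 0);
    }
    if (done == -1)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

       Under these assumptions, a child that keeps the default SIGTERM
       disposition is reported as ended by SIGTERM, while a child that
       ignores SIGTERM is reported as ended by SIGKILL once the grace
       period runs out.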

       The Simple Availability Manager is meant to be used in conjunction with
       the cpg service.  Used together, they make it possible to restart a
       cpg process that fails healthchecking during operation.

       The main features of SAM include:

              ·  A configurable recovery policy.

              ·  A configurable time interval for health check operations.

              ·  A notification via signal before recovery action is taken.

              ·  A mechanism to indicate to  the  application  the  number  of
                 times an active process has been created by the SAM server.

              ·  Both  application  driven  health  checking  and event driven
                 health checking.

Initializing SAM

       The SAM library is initialized by sam_initialize(3).  sam_initialize(3)
       may only be called once per process.  Calling it more than once has
       undefined results and is not recommended or tested.

Setting warning callback

       A SIGTERM signal is sent to the application when a recovery  action  is
       planned.  The application can use the signal(2) system call to install
       a handler for this signal.

       There are no special constraints on what SAM APIs may be called in a
       warning  callback.   After  time_interval  expires, a SIGKILL signal is
       sent to the active process to force its termination.
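
       A warning handler for the SIGTERM notification might look like the
       following sketch.  It uses sigaction(2) instead of signal(2), which
       is a substitution made here for well-defined handler semantics, and
       it only sets a flag for the main loop to inspect.  The names
       on_warning(), install_warning_handler() and recovery_pending() are
       hypothetical and not part of the SAM API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, read by the application's main loop. */
static volatile sig_atomic_t warning_seen = 0;

/* Signal handler: do only async-signal-safe work, here a flag store. */
static void on_warning(int signo)
{
    (void)signo;
    warning_seen = 1;
}

/* Hypothetical helper: install the SIGTERM warning handler.
 * Returns 0 on success, -1 on failure (as sigaction does). */
static int install_warning_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_warning;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical helper: nonzero once a recovery warning has arrived. */
static int recovery_pending(void)
{
    return warning_seen != 0;
}
```

       The main loop can poll recovery_pending() and use the time that
       remains before SIGKILL to flush state or close connections.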

Registering the active process

       The active process is registered with SAM by calling sam_register(3).
       This function should be called only once per process.  After a
       recovery action is taken, the new active process begins execution
       at the line of code following the sam_register(3) call in the user
       process.
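
       The restart behaviour can be pictured with a small supervisor loop
       that creates a new active process each time the previous one ends
       abnormally, and tells each one how many have been created so far.
       This is a rough model under stated assumptions, not libsam code:
       supervise(), work_fn and flaky_work() are hypothetical names, and
       calling work() in the child stands in for "execution continues
       after sam_register(3)".

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Work run in each active process. instance_id counts how many active
 * processes have been created so far (1 on the first start). */
typedef int (*work_fn)(unsigned int instance_id);

/* Hypothetical supervisor loop. Forks an active process, waits for it, and
 * creates a replacement whenever it ends abnormally (nonzero exit status or
 * death by signal). Returns the instance id of the process that finished
 * cleanly, or 0 if fork/wait failed or max_instances was exhausted. */
static unsigned int supervise(work_fn work, unsigned int max_instances)
{
    unsigned int instance_id;

    for (instance_id = 1; instance_id <= max_instances; instance_id++) {
        int status = 0;
        pid_t pid = fork();

        if (pid == -1)
            return 0;
        if (pid == 0)
            _exit(work(instance_id));   /* active process */

        if (waitpid(pid, &status, 0) == -1)
            return 0;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return instance_id;         /* clean finish, no recovery */
        /* Otherwise loop around and create a new active process. */
    }
    return 0;
}

/* Example work function: fails on its first two starts, then succeeds. */
static int flaky_work(unsigned int instance_id)
{
    return instance_id < 3 ? 1 : 0;
}
```

       With flaky_work(), the third active process is the first to finish
       cleanly, so an application could use the instance count to decide
       whether it is starting fresh or recovering from a failure.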

Enabling event driven healthchecking

       Two types of healthchecking are available to the user.  The first model
       is one where  the  user  application  healthchecks  during  its  normal
       operation.  It is never requested to healthcheck, and if the active
       process doesn’t respond within the time interval, the process  will  be
       restarted.

       A   more   useful   mechanism   for   healthchecking  is  event  driven
       healthchecking.  Because this model is directed by the SAM server, it
       isn’t necessary to guess or add timers to the active process to signal
       that a healthcheck operation is successful.  To use event driven
       healthchecking, the sam_hc_callback_register(3) function should be
       called.
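
       The request/reply pattern behind event driven healthchecking can be
       sketched as follows.  This is an illustration only: it models the
       exchange over a local socket pair and makes no claim about how SAM
       transports its requests.  All hc_* names are hypothetical, and the
       convention that a callback returns 0 when healthy is an assumption
       of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Callback type. Assumed convention: 0 means healthy, nonzero unhealthy. */
typedef int (*hc_callback)(void);

enum hc_result { HC_OK = 0, HC_UNHEALTHY = 1, HC_TIMEOUT = 2, HC_ERROR = 3 };

/* Create the channel: sv[0] is the server end, sv[1] the active process. */
static int hc_channel_open(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/* Server side: ask the active process for a healthcheck. */
static int hc_request(int fd)
{
    const char req = '?';
    return write(fd, &req, 1) == 1 ? 0 : -1;
}

/* Active-process side: answer one pending request by running the callback.
 * Intended to be called from the event loop when fd becomes readable. */
static int hc_serve_one(int fd, hc_callback cb)
{
    char req, reply;

    if (read(fd, &req, 1) != 1)
        return -1;
    reply = (cb() == 0) ? 'y' : 'n';
    return write(fd, &reply, 1) == 1 ? 0 : -1;
}

/* Server side: wait up to timeout_ms milliseconds for the reply. */
static enum hc_result hc_wait_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char reply;
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready == 0)
        return HC_TIMEOUT;      /* this is where recovery would start */
    if (ready < 0 || read(fd, &reply, 1) != 1)
        return HC_ERROR;
    return reply == 'y' ? HC_OK : HC_UNHEALTHY;
}

/* Example callbacks. */
static int always_healthy(void) { return 0; }
static int never_healthy(void)  { return 1; }
```

       Because the server initiates each check, the active process needs
       no timer of its own: it only answers when asked, and a missing
       answer within the interval is what marks it as failed.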

BUGS

SEE ALSO

       sam_initialize(3),    sam_finalize(3),    sam_start(3),    sam_stop(3),
       sam_register(3), sam_hc_send(3), sam_hc_callback_register(3)