corosync.conf - corosync executive configuration file

NAME

       corosync.conf - corosync executive configuration file

SYNOPSIS

       /etc/corosync.conf

DESCRIPTION

       The corosync.conf file supplies the corosync executive with the
       parameters that control its operation.  Empty lines and lines starting
       with the # character are ignored.  The configuration file consists of
       bracketed top level directives.  The possible directive choices are:

       totem { }
              This  top level directive contains configuration options for the
              totem protocol.

       logging { }
              This top level  directive  contains  configuration  options  for
              logging.

       event { }
              This  top level directive contains configuration options for the
              event service.

       It is also possible to specify the top level parameter compatibility.
       This directive indicates the level of compatibility requested by the
       user.  The option whitetank can be specified to remain backward
       compatible with openais-0.80.z.  The option none can be specified to
       only be compatible with corosync-1.Y.Z.  Extra processing during
       configuration changes is required to remain backward compatible.

       The default is whitetank (backward compatibility).
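
       As a minimal sketch, the parameter might be written as follows.  The
       key: value syntax is an assumption taken from the example
       configuration files shipped with corosync, not from this page.

       ```
       # Only interoperate with corosync-1.Y.Z.  Leaving the line out keeps
       # the whitetank default.
       compatibility: none
       ```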

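       Within the totem directive, an interface directive is required.  There
       is also one configuration option which is required, version, which is
       described with the other totem options.

       Within the interface sub-directive of totem there are four parameters
       which are required (ringnumber, bindnetaddr, mcastaddr, and mcastport)
       and one which is optional (broadcast):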

       ringnumber
              This  specifies  the  ring number for the interface.  When using
              the redundant  ring  protocol,  each  interface  should  specify
              separate  ring  numbers  to  uniquely identify to the membership
              protocol which interface to use for which  redundant  ring.  The
              ringnumber must start at 0.

       bindnetaddr
              This specifies the address to which the corosync executive
              should bind.  This address should always end in zero.  If the
              totem traffic should be routed over 192.168.5.92, set
              bindnetaddr to 192.168.5.0.

              This may also be an IPv6 address, in which case IPv6 networking
              will be used.  In this case, the full address must be specified
              and there is no automatic selection of the network interface
              within a specific subnet as with IPv4.

              If  IPv6 networking is used, the nodeid field must be specified.

       broadcast
              This is optional and can be set to yes.  If it is  set  to  yes,
              the  broadcast  address will be used for communication.  If this
              option is set, mcastaddr should not be set.

       mcastaddr
              This is the multicast address used by the corosync executive.
              The default should work for most networks, but the network
              administrator should be consulted about which multicast address
              to use.  Avoid 224.x.x.x because this is a "config" multicast
              address.

              This may also be an IPv6 multicast address, in which case IPv6
              networking will be used.  If IPv6 networking is used, the nodeid
              field must be specified.

       mcastport
              This specifies the UDP port number.  It is possible to  use  the
              same  multicast  address on a network with the corosync services
              configured for different UDP ports.
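
       To illustrate the interface parameters above, a minimal sketch of a
       single-ring totem directive follows.  The key: value syntax is assumed
       from the example configuration files shipped with corosync, and the
       multicast address and port are placeholder values, not
       recommendations.

       ```
       totem {
               version: 2
               interface {
                       # First (and only) ring; ring numbers start at 0.
                       ringnumber: 0
                       # Network address for a host at 192.168.5.92.
                       bindnetaddr: 192.168.5.0
                       # Placeholder multicast address outside 224.x.x.x.
                       mcastaddr: 226.94.1.1
                       # Placeholder UDP port.
                       mcastport: 5405
               }
       }
       ```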

       Within the totem directive, the following configuration options are
       available.  One is required (version), one is required only when IPv6
       is configured in the interface sub-directive (nodeid), and the rest are
       optional.  The required option controls the version of the totem
       configuration.  The nodeid option, which is optional unless IPv6 is
       used, controls identification of the processor.  The remaining optional
       options control how the node identifier is generated, secrecy and
       authentication, the redundant ring mode of operation, the maximum
       network MTU, the number of sending threads, the virtual synchrony
       filter type, and the transport mechanism.

       version
              This specifies the version of the configuration file.  Currently
              the only valid version for this directive is 2.

       nodeid This configuration option is optional when using IPv4 and
              required when using IPv6.  This is a 32 bit value specifying the
              node identifier delivered to the cluster membership service.  If
              this is not specified with IPv4, the node id will be determined
              from the 32 bit IP address to which the system is bound with
              ring identifier 0.  The node identifier value of zero is
              reserved and should not be used.

       clear_node_high_bit
              This configuration option is optional and is only relevant when
              no nodeid is specified.  Some openais clients require a signed
              32 bit nodeid that is greater than zero; however, by default
              openais uses all 32 bits of the IPv4 address space when
              generating a nodeid.  Set this option to yes to force the high
              bit to be zero and therefore ensure the nodeid is a positive
              signed 32 bit integer.

              WARNING: The cluster's behavior is undefined if this option is
              enabled on only a subset of the cluster (for example during a
              rolling upgrade).

       secauth
              This specifies that HMAC/SHA1 authentication should be  used  to
              authenticate  all  messages.  It further specifies that all data
              should be encrypted with the sober128  encryption  algorithm  to
              protect data from eavesdropping.

              Enabling this option adds a 36 byte header to every message sent
              by  totem  which  reduces  total  throughput.   Encryption   and
              authentication  consume 75% of CPU cycles in aisexec as measured
              with gprof when enabled.

              For 100mbit networks with 1500 MTU frame transmissions: A
              throughput of 9mb/sec is possible with 100% cpu utilization when
              this option is enabled on 3ghz cpus.  A throughput of 10mb/sec
              is possible with 20% cpu utilization when this option is
              disabled on 3ghz cpus.

              For gig-e networks with large frame transmissions: A  throughput
              of  20mb/sec  is  possible  when  this option is enabled on 3ghz
              cpus.  A throughput of 60mb/sec is possible when this option  is
              disabled on 3ghz cpus.

              The default is on.

       rrp_mode
              This  specifies  the  mode of redundant ring, which may be none,
              active, or passive.  Active replication  offers  slightly  lower
              latency from transmit to delivery in faulty network environments
              but with  less  performance.   Passive  replication  may  nearly
              double  the  speed of the totem protocol if the protocol doesn’t
              become cpu bound.  The final option is none, in which case  only
              one  network  interface  will  be  used  to  operate  the  totem
              protocol.

              If  only  one  interface  directive  is   specified,   none   is
              automatically  chosen.   If  multiple  interface  directives are
              specified, only active or passive may be chosen.
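
              A minimal sketch of a two-ring configuration is shown below.
              The key: value syntax is assumed from the example
              configuration files shipped with corosync, and every address
              and port is a placeholder chosen for illustration.

              ```
              totem {
                      version: 2
                      # Two interface directives, so active or passive.
                      rrp_mode: passive
                      interface {
                              ringnumber: 0
                              bindnetaddr: 192.168.5.0
                              mcastaddr: 226.94.1.1
                              mcastport: 5405
                      }
                      interface {
                              ringnumber: 1
                              bindnetaddr: 10.0.0.0
                              mcastaddr: 226.94.1.2
                              mcastport: 5407
                      }
              }
              ```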

       netmtu This specifies the network maximum transmission unit.  Setting
              this value beyond 1500, the regular frame MTU, requires Ethernet
              devices that support large (also called jumbo) frames.  If any
              device in the network doesn’t support large frames, the
              protocol will not operate properly.  The hosts must also have
              their MTU size changed from 1500 to whatever frame size is
              specified here.

              Please note that while some NICs or switches claim large frame
              support, they support 9000 MTU as the maximum frame size
              including the IP header.  Setting the netmtu and host MTUs to
              9000 will cause totem to use the full 9000 bytes of the frame.
              Linux will then add an 18 byte header, moving the full frame
              size to 9018.  As a result some hardware will not operate
              properly with this size of data.  A netmtu of 8982 seems to work
              for the few large frame devices that have been tested.  Some
              manufacturers claim large frame support when in fact they
              support frame sizes of 4500 bytes.

              Increasing   the  MTU  from  1500  to  8982  doubles  throughput
              performance from 30MB/sec to 60MB/sec as measured with  evsbench
              with 175000 byte messages with the secauth directive set to off.

              When  sending  multicast  traffic,  if  the  network  frequently
              reconfigures,  chances  are  that  some  device  in  the network
              doesn’t support large frames.

              Choose hardware  carefully  if  intending  to  use  large  frame
              support.

              The default is 1500.
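
              As a sketch of the large frame setting discussed above (the
              key: value syntax is assumed from the example configuration
              files shipped with corosync, and the interface directive is
              omitted for brevity):

              ```
              totem {
                      version: 2
                      # 9000 minus the 18 byte header added by Linux.
                      netmtu: 8982
              }
              ```

              The host interfaces and every device on the path would need a
              matching MTU for this to operate properly.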

       threads
              This directive controls how many threads are used to encrypt and
              send multicast messages.  If secauth is off, the  protocol  will
              never  use  threaded  sending.  If secauth is on, this directive
              allows systems to be  configured  to  use  multiple  threads  to
              encrypt and send multicast messages.

              A threads directive of 0 indicates that no threaded send should
              be used.  This mode offers best performance for non-SMP systems.

              The default is 0.
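
              A minimal sketch of threaded sending follows.  The key: value
              syntax is assumed from the example configuration files shipped
              with corosync, the thread count is a placeholder for an SMP
              host, and the interface directive is omitted for brevity.

              ```
              totem {
                      version: 2
                      # Threaded sending is only used when secauth is on.
                      secauth: on
                      # Placeholder thread count.
                      threads: 2
              }
              ```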

       vsftype
              This directive controls the virtual synchrony filter type used
              to identify a primary component.  The preferred choice is YKD
              dynamic linear voting; however, for clusters larger than 32
              nodes YKD consumes a lot of memory.  For large scale clusters
              that are created by changing the MAX_PROCESSORS_COUNT #define in
              the C code totem.h file, the virtual synchrony filter "none" is
              recommended, but then the AMF and DLCK services (which are
              currently experimental) are not safe for use.

              The default is ykd.  The vsftype can also be set to none.

       transport
              This directive controls the transport mechanism used.  If the
              interface to which corosync is binding is InfiniBand, you can
              specify the "iba" option.  Any other option is ignored.  Note
              that InfiniBand interfaces will use RDMA transport techniques
              and perform at higher bandwidths and lower latency than gigabit
              Ethernet networks.

              The  default is udp.  The transport type can also be set to iba.

       Within the totem directive, there are several configuration options
       which are used to control the operation of the protocol.  It is
       generally not recommended to change any of these values without proper
       guidance and sufficient testing.  Some networks may require larger
       values if suffering from frequent reconfigurations.  Some applications
       may require faster failure detection times which can be achieved by
       reducing the token timeout.

       token  This timeout specifies in milliseconds how long to wait without
              receiving a token before a token loss is declared.  This is the
              time spent detecting a failure of a processor in the current
              configuration.  Reforming a new configuration takes about 50
              milliseconds in addition to this timeout.

              The default is 1000 milliseconds.

       token_retransmit
              This timeout specifies in milliseconds how long to wait for a
              token before the token is retransmitted.  This will be
              automatically calculated if token is modified.  It is not
              recommended to alter this value without guidance from the
              corosync community.

              The default is 238 milliseconds.

       hold   This timeout specifies in milliseconds how long the token should
              be held by the representative when the  protocol  is  under  low
              utilization.   It is not recommended to alter this value without
              guidance from the corosync community.

              The default is 180 milliseconds.

       token_retransmits_before_loss_const
              This value identifies how many token retransmits should be
              attempted before forming a new configuration.  If this value is
              set, token_retransmit and hold will be automatically calculated
              from token_retransmits_before_loss_const and token.

              The default is 4 retransmissions.

       join   This timeout specifies in milliseconds how long to wait for join
              messages in the membership protocol.

              The default is 50 milliseconds.

       send_join
              This timeout specifies in milliseconds the upper bound of the
              range, from 0 to send_join, to wait before sending a join
              message.  For configurations with fewer than 32 nodes, this
              parameter is not necessary.  For larger rings, this parameter is
              necessary to ensure the NIC is not overflowed with join messages
              on formation of a new ring.  A reasonable value for large rings
              (128 nodes) would be 80 milliseconds.  Other timer values must
              also change if this value is changed.  Seek advice from the
              corosync mailing list if trying to run larger configurations.

              The default is 0 milliseconds.

       consensus
              This timeout specifies in milliseconds  how  long  to  wait  for
              consensus  to  be  achieved  before  starting  a  new  round  of
              membership configuration.  The minimum value for consensus  must
              be  1.2 * token.  This value will be automatically calculated at
              1.2 * token if the user doesn’t specify a consensus value.

              The default is 1200 milliseconds.
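
              As a worked example of the 1.2 * token rule, a token timeout
              of 3000 milliseconds implies a consensus of at least 1.2 * 3000
              = 3600 milliseconds.  The sketch below assumes the key: value
              syntax of the example configuration files shipped with
              corosync; the timeout values are illustrative only and the
              interface directive is omitted.

              ```
              totem {
                      version: 2
                      # Illustrative token timeout in milliseconds.
                      token: 3000
                      # 1.2 * 3000; may be left out to be calculated.
                      consensus: 3600
              }
              ```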

       merge  This timeout specifies in milliseconds how long to  wait  before
              checking  for  a  partition  when  no multicast traffic is being
              sent.  If multicast traffic is being sent, the  merge  detection
              happens automatically as a function of the protocol.

              The default is 200 milliseconds.

       downcheck
              This  timeout  specifies in milliseconds how long to wait before
              checking that a network interface is back up after it  has  been
              downed.

              The default is 1000 milliseconds.

       fail_to_recv_const
              This constant specifies how many rotations of the token may
              occur without receiving any messages, when messages should have
              been received, before a new configuration is formed.

              The default is 50 failures to receive a message.

       seqno_unchanged_const
              This  constant specifies how many rotations of the token without
              any multicast traffic should occur before  the  merge  detection
              timeout is started.

              The default is 30 rotations.

       heartbeat_failures_allowed
              [HeartBeating  mechanism]  Configures  the optional HeartBeating
              mechanism for  faster  failure  detection.  Keep  in  mind  that
              engaging  this  mechanism  in  lossy networks could cause faulty
              loss declaration as the mechanism  relies  on  the  network  for
              heartbeating.

              As a rule of thumb, use this mechanism if you require improved
              failure detection in low to medium utilized networks.

              This constant specifies the number of heartbeat failures the
              system should tolerate before declaring heartbeat failure (e.g.
              3).  If this value is not set or is 0, the heartbeat mechanism
              is not engaged in the system and token rotation is the method
              of failure detection.

              The default is 0 (disabled).

       max_network_delay
              [HeartBeating mechanism] This constant specifies in milliseconds
              the approximate delay that your network takes to transport one
              packet from one machine to another.  This value is to be set by
              system engineers; do not change it if unsure, as it affects the
              failure detection mechanism using heartbeat.

              The default is 50 milliseconds.
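
              A minimal sketch that engages the heartbeat mechanism is shown
              below.  The key: value syntax is assumed from the example
              configuration files shipped with corosync, and the interface
              directive is omitted for brevity.

              ```
              totem {
                      version: 2
                      # Any value above 0 engages the mechanism.
                      heartbeat_failures_allowed: 3
                      # Approximate one-way packet delay in milliseconds.
                      max_network_delay: 50
              }
              ```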

       window_size
              This constant specifies the maximum number of messages that may
              be sent on one token rotation.  If all processors perform
              equally well, this value could be large (300), which would
              introduce higher latency from origination to delivery for very
              large rings.  To reduce latency in large rings (16+), the
              defaults are a safe compromise.  If one or more slow processors
              are present among fast processors, window_size should be no
              larger than 256000 / netmtu to avoid overflow of the kernel
              receive buffers.  The user is notified of this by the display of
              a retransmit list in the notification logs.  There is no loss of
              data, but performance is reduced when these errors occur.

              The default is 50 messages.

       max_messages
              This constant specifies the maximum number of messages that  may
              be  sent  by  one  processor  on  receipt  of  the  token.   The
              max_messages parameter is limited to 256000 / netmtu to  prevent
              overflow of the kernel transmit buffers.

              The default is 17 messages.
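
              As a worked example of the 256000 / netmtu bound: with the
              default netmtu of 1500 the limit is about 170 messages, and
              with a netmtu of 8982 it is about 28.  The sketch below
              assumes the key: value syntax of the example configuration
              files shipped with corosync; the values are illustrative and
              the interface directive is omitted.

              ```
              totem {
                      version: 2
                      netmtu: 8982
                      # 256000 / 8982 is about 28.5, so stay at 28 or below.
                      window_size: 28
                      # The default of 17 is already within the bound.
                      max_messages: 17
              }
              ```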

       rrp_problem_count_timeout
              This   specifies   the  time  in  milliseconds  to  wait  before
              decrementing the problem count by 1 for  a  particular  ring  to
              ensure  a  link  is  not  marked  faulty  for  transient network
              failures.

              The default is 2000 milliseconds.

       rrp_problem_count_threshold
              This specifies the number of times a problem is detected with  a
              link before setting the link faulty.  Once a link is set faulty,
              no more data is transmitted upon it.  Also, the problem  counter
              is no longer decremented when the problem count timeout expires.

              A problem is detected whenever all tokens from the preceding
              processor have not been received within the
              rrp_token_expired_timeout.  The rrp_problem_count_threshold *
              rrp_token_expired_timeout should be at least 50 milliseconds
              less than the token timeout, or a complete reconfiguration may
              occur.

              The default is 10 problem counts.

       rrp_token_expired_timeout
              This specifies the time in milliseconds to increment the problem
              counter  for  the  redundant  ring  protocol  after  not  having
              received a token from all rings for a particular processor.

              This value will automatically be calculated from the token
              timeout and rrp_problem_count_threshold but may be overridden.
              It is not recommended to override this value without guidance
              from the corosync community.

              The default is 47 milliseconds.
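
              As a worked check using the documented defaults: 10 problem
              counts * 47 milliseconds = 470 milliseconds, which is more than
              50 milliseconds below the default token timeout of 1000
              milliseconds.  The sketch below spells those defaults out,
              assuming the key: value syntax of the example configuration
              files shipped with corosync; the interface directives are
              omitted.

              ```
              totem {
                      version: 2
                      token: 1000
                      # 10 * 47 = 470, below the 1000 - 50 = 950 limit.
                      rrp_problem_count_threshold: 10
                      rrp_token_expired_timeout: 47
              }
              ```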

       Within the logging directive, there are several  configuration  options
       which are all optional.

       The  following  3  options  are  valid  only  for the top level logging
       directive:

       timestamp
              This specifies that a timestamp is placed on all log messages.

              The default is off.

       fileline
              This specifies that file and line should be printed.

              The default is off.

       function_name
              This specifies that the code function name should be printed.

              The default is off.

       The following options are valid for the top level logging directive
       and can be overridden in logger_subsys entries.

       to_stderr

       to_logfile

       to_syslog
              These specify the destination of logging output. Any combination
              of these options may be specified. Valid options are yes and no.

              The default is syslog and stderr.

              Please note, if you are using to_logfile and want to rotate the
              file, use logrotate(8) with the option copytruncate.  For
              example:

              /var/log/corosync.log {
                  missingok
                  compress
                  notifempty
                  daily
                  rotate 7
                  copytruncate
              }

       logfile
              If the to_logfile directive is set to yes, this option
              specifies the pathname of the log file.

              No default.

       logfile_priority
              This   specifies   the  logfile  priority  for  this  particular
              subsystem. Ignored if debug is on.  Possible values are:  alert,
              crit,  debug  (same  as  debug  = on), emerg, err, info, notice,
              warning.

              The default is: info.

       syslog_facility
              This specifies the syslog facility type that will be used for
              any messages sent to syslog.  Options are daemon, local0,
              local1, local2, local3, local4, local5, local6, and local7.

              The default is daemon.

       syslog_priority
              This specifies the syslog level for this  particular  subsystem.
              Ignored if debug is on.  Possible values are: alert, crit, debug
              (same as debug = on), emerg, err, info, notice, warning.

              The default is: info.

       debug  This  specifies  whether  debug  output  is  logged   for   this
              particular logger.

              The default is off.

       tags   This  specifies  which tags should be traced for this particular
              logger.  Set debug directive to on in order  to  enable  tracing
              using  tags.   Values  are  specified  using a vertical bar as a
              logical OR separator:

              enter|leave|trace1|trace2|trace3|...

              The default is none.

       Within the logging directive, logger_subsys directives are optional.

       Within the  logger_subsys  sub-directive,  all  of  the  above  logging
       configuration options are valid and can be used to override the default
       settings.  The subsys entry, described below, is mandatory to  identify
       the subsystem.

       subsys This specifies the subsystem identity (name) for which logging
              is specified.  This is the name used by a service in the
              log_init() call, e.g. ’CKPT’.  This directive is required.
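
       Putting the logging options together, a minimal sketch might look like
       the following.  The key: value syntax is assumed from the example
       configuration files shipped with corosync, and the log file path is a
       placeholder.

       ```
       logging {
               timestamp: on
               to_stderr: no
               to_syslog: yes
               to_logfile: yes
               # Placeholder path; required because to_logfile is yes.
               logfile: /var/log/corosync.log
               syslog_facility: daemon
               logger_subsys {
                       # Override the defaults for one subsystem only.
                       subsys: CKPT
                       debug: on
               }
       }
       ```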

FILES

       /etc/corosync.conf
              The corosync executive configuration file.
