NAME

       mcelog - Decode kernel machine check log on x86 machines

SYNOPSIS

       mcelog [options] [device]
       mcelog [options] --daemon
       mcelog [options] --client
       mcelog [options] --ascii
       mcelog --version

DESCRIPTION

       x86 CPUs report errors detected by the CPU as machine check events
       (MCEs). These can be data corruption detected in the CPU caches or in
       main memory by an integrated memory controller, data transfer errors
       on the front side bus or CPU interconnect, or other internal errors.
       Possible causes include cosmic radiation, unstable power supplies,
       cooling problems, broken hardware, or bad luck.

       Most errors can be corrected by the CPU by  internal  error  correction
       mechanisms. Uncorrected errors cause machine check exceptions which may
       panic the machine.

       When a corrected error happens the x86 kernel writes a record
       describing the MCE into an internal ring buffer available through the
       /dev/mcelog device. mcelog retrieves errors from /dev/mcelog, decodes
       them into a human readable format and prints them on the standard
       output or optionally into the system log.

       Optionally it can also take further actions, such as keeping
       statistics or triggering shell scripts on specific events.

       The normal operating modes for mcelog are running as a regular cron
       job (traditional way, deprecated), running as a trigger directly
       executed by the kernel, or running as a daemon with the --daemon
       option.

       When an uncorrected machine check error happens that the kernel cannot
       recover from, it will usually panic the system. If the panic is
       followed by a warm reset, mcelog should pick up the machine check
       errors after reboot. This is not possible after a cold reset.

       In addition mcelog can be used on the command line to decode the kernel
       output for a fatal machine check panic in text format using the --ascii
       option.  This is typically used to decode the panic console output of a
       fatal machine check, if the system was power cycled  or  mcelog  didn’t
       run immediately after reboot.

       When  the  panic  triggers  a kdump kexec crash kernel the crash kernel
       boot up script should log the machine checks to  disk,  otherwise  they
       might be lost.

       Note that after mcelog retrieves an error the kernel doesn’t store it
       anymore (different from dmesg(1)), so the output should always be
       saved somewhere and mcelog should not be run in uncontrolled ways.

OPTIONS

       When the --syslog option is specified mcelog redirects output to the
       system log. The --syslog-error option causes the normal machine checks
       to be logged as LOG_ERR (implies --syslog). Normally only fatal errors
       or high level remarks are logged with error level. High level one line
       summaries of specific errors are also logged to the syslog by default
       unless mcelog operates in --ascii mode.

       When the --logfile=file option is specified mcelog appends log output
       to the specified file. With the --no-syslog option mcelog will never
       log anything to the syslog.

       When the --cpu=cputype option is specified mcelog decodes events for
       the CPU type cputype. See mcelog --help for a list of valid CPUs. Note
       that specifying an incorrect CPU can lead to incorrect decoding
       output. The default is either the CPU of the machine that reported the
       machine check (needs a newer kernel version) or the CPU of the machine
       mcelog is running on, so normally this option doesn’t have to be used.
       Older versions of mcelog had separate options for different CPU types.
       These are still implemented, but deprecated and undocumented now.

       With  the  --dmi  option  mcelog will look up the addresses reported in
       machine checks  in  the  SMBIOS/DMI  tables  of  the  BIOS.   This  can
       sometimes  tell  you  which  DIMM  or memory controller has developed a
       problem. More often the information reported  by  the  BIOS  is  either
       subtly or obviously wrong or useless.  This option requires that mcelog
       has read access to /dev/mem (normally requires root) and  runs  on  the
       same  machine  in  the  same hardware configuration as when the machine
       check event happened.

       When --ignorenodev is specified mcelog will exit silently when the
       device cannot be opened. This is useful in virtualized environments
       with limited devices.

       When --filter is specified mcelog will filter out known broken  machine
       check  events  (default  on).  When the --no-filter option is specified
       mcelog does not filter events.

       When --raw is specified mcelog will  not  decode,  but  just  dump  the
       mcelog  in  a  raw  hex  format.  This can be useful for automatic post
       processing.

       When a device is specified the machine check logs are read from  device
       instead of the default /dev/mcelog.

       With the --ascii option mcelog decodes a fatal machine check panic
       generated by the kernel ("CPU n: Machine Check Exception ...") in ASCII
       from standard input and exits afterwards. Note that when the panic
       comes from a different machine than the one mcelog is running on you
       might need to specify the correct cputype on older kernels. On newer
       kernels which output the PROCESSOR field this is not needed anymore.

       When the --file filename option is specified mcelog --ascii  will  read
       the  ASCII  machine  check  record  from input file filename instead of
       standard input.
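
       As an illustration of the --ascii and --file options described above,
       the following Python sketch assembles the command line and runs
       mcelog on a saved panic log. The helper names build_ascii_command and
       decode_panic are hypothetical and not part of mcelog. The sketch
       assumes mcelog is installed and found in PATH, and it only uses
       options documented on this page.

```python
import subprocess


def build_ascii_command(panic_file, cputype=None):
    """Build an argument list for decoding a saved panic log with mcelog."""
    cmd = ["mcelog"]
    if cputype is not None:
        # Only needed for panics from older kernels that do not print the
        # PROCESSOR field; see "mcelog --help" for valid values.
        cmd.append("--cpu=" + cputype)
    cmd += ["--file", panic_file, "--ascii"]
    return cmd


def decode_panic(panic_file, cputype=None):
    """Run mcelog (assumed to be in PATH) and return its decoded output."""
    result = subprocess.run(
        build_ascii_command(panic_file, cputype),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mcelog exits non-zero
    )
    return result.stdout
```

       Building the argument list separately keeps the command easy to
       inspect before it is executed.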

       With the --config-file file option mcelog reads the specified config
       file. Default is /etc/mcelog.conf. See also CONFIG FILE below.

       With  the --daemon option mcelog will run in the background. This gives
       the fastest reaction time and is the recommended operating mode.   This
       option  implies  --syslog.  The option --foreground will prevent mcelog
       from giving up the terminal  in  daemon  mode.  This  is  intended  for
       debugging.

       With  the  --client  option  mcelog  will  query  a  running daemon for
       accumulated errors.

       With the --cpumhz=mhz option mcelog assumes the CPU has mhz frequency
       for decoding the time of the event using the CPU time stamp counter.
       This also forces decoding. Note this can be unreliable on some systems
       with CPU frequency scaling or deep C states, where the CPU time stamp
       counter does not increase linearly. By default the frequency of the
       current CPU is used when mcelog determines it is safe to use. Newer
       kernels report the time directly in the event and don’t need this
       anymore.
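
       The arithmetic behind a fixed-frequency conversion can be sketched in
       Python as follows. This is an illustration of the idea only, not
       mcelog's own code; the function name tsc_to_seconds is hypothetical,
       and the sketch assumes the counter advanced at a constant mhz million
       cycles per second, which is exactly the assumption that frequency
       scaling and deep C states can break.

```python
def tsc_to_seconds(tsc_cycles, cpu_mhz):
    """Convert a time stamp counter value to seconds at a fixed frequency."""
    if cpu_mhz <= 0:
        raise ValueError("cpu_mhz must be positive")
    # One MHz is one million cycles per second.
    return tsc_cycles / (cpu_mhz * 1_000_000)
```

       For example, three billion cycles at 3000 MHz correspond to one
       second. The result is relative to the moment the counter started
       counting.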

       The --pidfile file option writes the process id of the daemon into file
       file.  Only valid in daemon mode.
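
       A monitoring script might use the pid file to check whether the
       daemon is still alive. The Python sketch below assumes the file
       contains the process id as decimal text, which this page does not
       specify; the helper names read_pid and is_running are hypothetical.

```python
import os


def read_pid(pidfile):
    """Return the process id stored in pidfile as an integer."""
    # Assumption: the file holds the pid as decimal text.
    with open(pidfile) as f:
        return int(f.read().strip())


def is_running(pid):
    """Check whether a process with this id exists, without disturbing it."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```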

       --version displays the version of mcelog and exits.

CONFIG FILE

       mcelog supports a config file to set  defaults.  Command  line  options
       override  the  config  file.  By  default  the config file is read from
       /etc/mcelog.conf unless overridden with the --config-file option.

       The general format is optionname = value. White space is currently not
       allowed in value, except at the end, where it is dropped. Comments
       start with #.

       All command line options that are not commands can be specified in the
       config file. For example, to enable the --no-syslog option use
       no-syslog = yes (or no to disable). When the option has an argument
       use
       logfile = /tmp/logfile
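
       The format rules above can be made concrete with a small Python
       sketch. It is not mcelog's parser and covers only what is described
       in this section. The function name parse_mcelog_config is
       hypothetical, and treating a comment as a whole line that starts with
       # is an assumption.

```python
def parse_mcelog_config(text):
    """Parse 'optionname = value' lines into a dict of strings."""
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: a comment occupies a whole line starting with '#'.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'optionname = value'")
        name, value = name.strip(), value.strip()
        # White space inside the value is rejected; trailing white space
        # was already dropped by strip() above.
        if not name or any(ch.isspace() for ch in value):
            raise ValueError(f"line {lineno}: malformed option")
        options[name] = value
    return options


example = """\
# send nothing to syslog, append to a file instead
no-syslog = yes
logfile = /tmp/logfile
"""
print(parse_mcelog_config(example))
```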

NOTES

       The  kernel  prefers old messages over new. If the log buffer overflows
       only old ones will be kept.
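
       The overflow policy can be pictured with a small Python model. This
       is an illustration only, not kernel code; the class name KeepOldestLog
       and the capacity used in any example are hypothetical.

```python
class KeepOldestLog:
    """Fixed-capacity log that drops new records once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.dropped = 0  # number of new records lost to overflow

    def append(self, record):
        """Store record if there is room; return True when it was kept."""
        if len(self.records) >= self.capacity:
            self.dropped += 1
            return False
        self.records.append(record)
        return True

    def drain(self):
        """Return all stored records and empty the log, freeing space."""
        records, self.records = self.records, []
        return records
```

       In this model, draining plays the role of mcelog reading the device:
       it frees space for later events, which is one reason to run mcelog
       regularly or as a daemon.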

       The exact output in the log file depends on the CPU, unless  the  --raw
       option is used.

       mcelog will report serious errors to the syslog during decoding.

FILES

       /dev/mcelog (char 10, minor 227)

       /etc/mcelog/mcelog.conf

       /sys/devices/system/machinecheck/machinecheck0/trigger

SEE ALSO

       AMD   x86-64   architecture   programmer’s  manual,  Volume  2,  System
       programming

       Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume
       3, System Programming Guide, Parts 1 and 2. Machine checks are
       described in Chapter 14 in Part 1 and in Appendix E in Part 2.

       Datasheet of your CPU.

                                   May 2009