mcelog: Reduce default threshold for corrected error page offline

The default of 10/24h was reasonable for server-quality
DDR3 DIMMs as of 2009/10. Newer systems can benefit from
more aggressive page offlining when corrected errors are seen.
See:
https://www.intel.com/content/dam/www/public/us/en/documents/intel-and-samsung-mrt-improving-memory-reliability-at-data-centers.pdf
for details.

Signed-off-by: Tony Luck <tony.luck@intel.com>
README.md

mcelog

mcelog is the user space backend for logging machine check errors reported by the hardware to the kernel. The kernel takes the immediate actions (such as killing affected processes), while mcelog decodes the errors and manages more advanced error responses, such as offlining memory or CPUs, or triggering events. mcelog also handles corrected errors by logging and accounting them. It primarily handles machine checks and thermal events, which are reported for errors detected by the CPU.
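To make "decodes the errors" more concrete: each machine check bank reports an MCi_STATUS register, and on Intel CPUs its top bits are architectural. The C sketch below extracts a few of them. One of these is the UC bit, which separates corrected errors (the kind mcelog logs and accounts) from uncorrected ones. The struct and function names are hypothetical. This illustrates the register layout only; it is not mcelog's decoder, which goes much further and is largely model specific.

```c
#include <stdint.h>

/* Architectural MCi_STATUS flag bits, as documented in the Intel SDM
 * (machine-check architecture chapter). */
#define MCI_STATUS_VAL   (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_OVER  (1ULL << 62) /* an earlier error was lost */
#define MCI_STATUS_UC    (1ULL << 61) /* hardware did not correct the error */
#define MCI_STATUS_EN    (1ULL << 60) /* reporting was enabled for this error */
#define MCI_STATUS_MISCV (1ULL << 59) /* MCi_MISC holds more information */
#define MCI_STATUS_ADDRV (1ULL << 58) /* MCi_ADDR holds the error address */
#define MCI_STATUS_PCC   (1ULL << 57) /* processor context may be corrupt */

/* Hypothetical summary type used only by this sketch. */
struct mci_summary {
	int valid;
	int overflow;
	int corrected;       /* UC clear: hardware corrected the error */
	int addr_valid;
	int context_corrupt;
	uint16_t mca_code;   /* bits 15:0, architectural MCA error code */
};

static struct mci_summary decode_mci_status(uint64_t status)
{
	struct mci_summary s;

	s.valid = (status & MCI_STATUS_VAL) != 0;
	s.overflow = (status & MCI_STATUS_OVER) != 0;
	s.corrected = (status & MCI_STATUS_UC) == 0;
	s.addr_valid = (status & MCI_STATUS_ADDRV) != 0;
	s.context_corrupt = (status & MCI_STATUS_PCC) != 0;
	s.mca_code = (uint16_t)(status & 0xffff);
	return s;
}
```

A record with VAL and ADDRV set and UC clear is a corrected error with a usable address. That is the case where per-page accounting and page offlining become possible.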

For more details on what mcelog can do and the underlying theory see mcelog.org.

It is recommended that mcelog runs on all x86 machines, both 64-bit (supported since early 2.6 kernels) and 32-bit (since 2.6.32).

mcelog can run in several modes:

  • cronjob
  • trigger
  • daemon

cronjob is the old method: mcelog runs every 5 minutes from cron and checks for errors. The disadvantage is that this can delay error reporting significantly (up to 10 minutes) and does not allow mcelog to keep extended state.

trigger is a newer method where the kernel runs mcelog when an error occurs.

This is configured with:

echo /usr/sbin/mcelog > /sys/devices/system/machinecheck/machinecheck0/trigger

This is faster, but it still doesn't allow mcelog to keep state. It also has relatively high overhead per error, because a new mcelog process has to start from scratch every time.

In daemon mode mcelog runs continuously in the background and waits for errors. It is enabled by running mcelog --daemon & from an init script. This is the fastest and most full-featured mode.
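The practical difference between the modes is who waits for errors and where state lives. A long-running daemon sleeps until data arrives and keeps its counters in memory between errors. The minimal C sketch below shows that pattern. It assumes a descriptor that delivers fixed-size records and that each read returns one whole record, which holds for the pipe used when trying it out. The function name and record size are hypothetical, and the real /dev/mcelog read protocol is not modeled here.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch of daemon mode: sleep in poll() until the
 * descriptor is readable, read fixed-size records, and keep state
 * across events (here just a counter). A short read is treated as an
 * error in this sketch. Returns the number of records seen before EOF,
 * or -1 on error. */
static long event_loop(int fd, size_t reclen)
{
	char rec[256];
	long seen = 0;                /* state that survives between events */

	if (reclen == 0 || reclen > sizeof rec)
		return -1;
	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };

		if (poll(&pfd, 1, -1) < 0) {   /* -1: wait indefinitely */
			if (errno == EINTR)
				continue;
			return -1;
		}
		ssize_t n = read(fd, rec, reclen);
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			return -1;
		if (n == 0)
			return seen;          /* EOF: the writer closed its end */
		if ((size_t)n != reclen)
			return -1;            /* short read: outside this sketch */
		seen++;                       /* decode and account the record here */
	}
}
```

In cronjob or trigger mode the equivalent of `seen` would be lost when the process exits, which is why state-dependent features need the daemon.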

The recommended mode is daemon, because several new functions (like page error predictive failure analysis) require a continuously running daemon.
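Predictive failure analysis of this kind rests on a rate threshold, such as the 10 corrected errors per 24 hours discussed in the commit message at the top. A common way to track such a threshold is a leaky bucket: each error adds to a level that drains steadily over time, and the action fires when the level reaches the limit. The C sketch below shows the generic technique. The names are hypothetical, and it is not mcelog's leaky-bucket.c.

```c
#include <time.h>

/* Hypothetical leaky bucket: fire when 'limit' events arrive faster
 * than 'limit' per 'window' seconds. */
struct bucket {
	unsigned limit;   /* e.g. 10 errors ... */
	time_t window;    /* ... per 24 * 3600 seconds */
	double level;     /* current fill level */
	time_t last;      /* time of the previous update */
};

static void bucket_init(struct bucket *b, unsigned limit, time_t window, time_t now)
{
	b->limit = limit;
	b->window = window;
	b->level = 0.0;
	b->last = now;
}

/* Account one error at time 'now'; return 1 when the threshold is hit. */
static int bucket_account(struct bucket *b, time_t now)
{
	time_t elapsed = now > b->last ? now - b->last : 0; /* ignore clock steps back */
	double drained = (double)elapsed * b->limit / (double)b->window;

	b->level = b->level > drained ? b->level - drained : 0.0;
	b->last = now;
	b->level += 1.0;
	if (b->level >= b->limit) {
		b->level = 0.0;       /* start over after firing */
		return 1;
	}
	return 0;
}
```

A daemon would keep one bucket per memory page, for example in a tree keyed by physical address, and offline the page when bucket_account returns 1. Lowering `limit` for the same `window` makes offlining happen after fewer corrected errors.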

Documentation

  • The man pages are the primary reference documentation.
  • lk10-mcelog.pdf gives an overview of the errors mcelog handles (originally from Linux Kongress 2010).
  • mce.pdf is a very old paper describing the first releases of mcelog (some parts are obsolete).

For distributors

You can run mcelog from systemd or similar daemons. An example systemd unit file is in mcelog.service.

By default mcelog reports its version as the git tag. This can be overridden by creating a .os_version file in the source directory; a build system could write the OS version to this file to mark the binary.
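The override rule comes down to a precedence check: a non-empty .os_version wins, and otherwise the git tag is used. Since the file lives in the source directory, the choice is presumably made at build time. The C sketch below only mirrors that precedence. The function name is hypothetical, and the tag is simply passed in as a string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the precedence rule: if 'path' (for example
 * ".os_version") exists and its first line is non-empty, use that line
 * as the version; otherwise fall back to 'git_tag'. */
static void pick_version(const char *path, const char *git_tag,
			 char *out, size_t len)
{
	FILE *f = fopen(path, "r");

	if (f) {
		int got = fgets(out, (int)len, f) != NULL;

		fclose(f);
		if (got) {
			out[strcspn(out, "\n")] = '\0';  /* drop trailing newline */
			if (out[0] != '\0')
				return;
		}
	}
	snprintf(out, len, "%s", git_tag);
}
```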

For older distributions using init scripts

Please install an init script by default that runs mcelog in daemon mode; the mcelog.init script is a good starting point. When mcelog runs in daemon mode, also install a logrotate file (mcelog.logrotate) or equivalent. Neither of these is installed by make install.

The installation also requires a config file /etc/mcelog.conf and the default triggers. These are all installed by make install.

/dev/mcelog is needed for mcelog operation. If it does not exist, it can be created with:

mknod /dev/mcelog c 10 227

Normally it is created automatically by udev.
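The mknod command above creates a character device with major number 10 and minor number 227. A startup check could confirm that an existing /dev/mcelog matches those numbers before opening it. is_mcelog_node is a hypothetical helper, sketched here in C.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() on glibc */

/* Hypothetical helper: return 1 if 'path' is a character device with
 * major 10 and minor 227 (the numbers used for /dev/mcelog), else 0. */
static int is_mcelog_node(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;
	return S_ISCHR(st.st_mode) &&
	       major(st.st_rdev) == 10 && minor(st.st_rdev) == 227;
}
```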

Security

mcelog needs to run as root because it might trigger actions like page offlining, which require CAP_SYS_ADMIN. It also opens /dev/mcelog and a UNIX socket for client support.

It also opens /dev/mem to parse the BIOS DMI tables. It is careful to close the file descriptor and unmap any mappings after using them.

There is support for changing the user in daemon mode after opening the device and the sockets, but that would stop triggers from taking corrective actions that require root.

In principle it would be possible to keep only CAP_SYS_ADMIN for page offlining, but that would prevent triggers from doing root-only actions not covered by it (and CAP_SYS_ADMIN is not that different from full root).
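On Linux, soft page offlining is requested by writing a physical address to /sys/devices/system/memory/soft_offline_page, and that interface is what the CAP_SYS_ADMIN requirement guards. The C sketch below performs that write through a hypothetical helper. It takes the control file path as a parameter so it can be aimed at a scratch file. It is not mcelog's page.c; the kernel's ABI documentation for memory page offlining is the authority on the exact semantics.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: ask the kernel to soft-offline the page that
 * contains physical address 'paddr' by writing it, in hex, to 'ctl'
 * (normally /sys/devices/system/memory/soft_offline_page, which needs
 * CAP_SYS_ADMIN). Returns 0 on success, -1 on error. */
static int offline_page(const char *ctl, uint64_t paddr)
{
	FILE *f = fopen(ctl, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "0x%" PRIx64 "\n", paddr) >= 0;
	/* the actual write happens when stdio flushes the buffer, so a
	 * rejected address shows up as an fclose() failure */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```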

In daemon mode mcelog listens on a UNIX socket and processes requests from mcelog --client. This can be disabled in the configuration file. The uid/gid of the requestor is checked on access and is configurable (default 0/0 only). The command parsing code is very straightforward (server.c). The client parsing and reply are currently done with the full privileges of the daemon.
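On Linux, a UNIX socket server can learn the uid and gid of the connecting process through the SO_PEERCRED socket option. The sketch below shows one way to implement a check like the one described. The function names are hypothetical. The rule "allow if either the uid or the gid matches" is an assumption for illustration, not mcelog's documented behavior.

```c
#define _GNU_SOURCE           /* exposes struct ucred in glibc */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: read the credentials of the process on the other
 * end of a connected UNIX socket (Linux SO_PEERCRED) and allow it if
 * its uid or gid matches the configured values. The "uid or gid" rule
 * is an assumption made for this sketch. */
static int client_allowed(int fd, uid_t allow_uid, gid_t allow_gid)
{
	struct ucred cred;
	socklen_t len = sizeof cred;

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0)
		return 0;             /* fail closed if the kernel cannot tell us */
	return cred.uid == allow_uid || cred.gid == allow_gid;
}

/* Drop a freshly accepted client that fails the check. */
static int accept_or_close(int fd, uid_t allow_uid, gid_t allow_gid)
{
	if (client_allowed(fd, allow_uid, allow_gid))
		return fd;
	close(fd);
	return -1;
}
```

With the 0/0 default, a call such as `accept_or_close(fd, 0, 0)` would keep only clients running as root (or, under this sketch's rule, with group 0).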

Testing

There is a simple test suite in tests/. It requires root, the mce-inject tool, and a kernel with MCE injection support (CONFIG_X86_MCE_INJECT). It will kill any running mcelog daemon.

Run it with make test.

The test suite requires the mce-inject tool. The mce-inject executable must be either in $PATH or in the ../mce-inject directory.

You can also run the tests under valgrind with make valgrind-test; this requires valgrind to be installed. Advanced valgrind options can be specified with:

make VALGRIND="valgrind --option" valgrind-test

Other checks

make iccverify and make clangverify run the static verifiers in icc and clang, respectively.

License

This program is licensed under the terms of the GNU General Public License, version 2.