MCE test suite HOWTO
====================
11 November 2008
Huang Ying
Section 4.2 (Test with kdump test driver) is based on the README of
LTP kdump test case.
Abstract
--------
This document explains the structure and design of the MCE test suite,
the kernel patch and user space tools needed for automatic tests, how
to use the test suite, and how to add new test cases to it.
0. Quick shortcut
------------------
- Install a Linux kernel with full MCE injection support, that is, the
latest Linux kernel (2.6.31) plus the MCE injection enhancement patchset
in: http://ftp.kernel.org/pub/linux/kernel/people/yhuang/mce/. Make
sure the following configuration options are enabled:
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_INJECT=y or CONFIG_X86_MCE_INJECT=m
- Get the git version of mcelog from
git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
and install it in /usr/sbin (or wherever it comes first in your $PATH):
git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git
cd mcelog
make
sudo make install
- Get the git version of mce-inject from
git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
and install it:
git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
cd mce-inject
make
sudo make install
- Install the page-types tool (see section 3.4), which ships with the
Linux kernel source (2.6.32 or newer):
cd $KERNEL_SRC/Documentation/vm/
gcc -o page-types page-types.c
cp page-types /usr/bin/
- Run make test
This will do the basic tests, but not the more complicated kdump ones.
For more information on those read below.
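The prerequisites listed above can be checked before running "make test".
The following is a minimal sketch and is not part of the test suite: the
function names are hypothetical, and the location of the kernel
configuration file (/boot/config-$(uname -r)) is an assumption that does
not hold on every distribution.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not shipped with the MCE test suite.

# config_enabled FILE OPTION [VALUES]
# Succeeds if OPTION is set to one of VALUES (default: y) in FILE.
# VALUES is an alternation such as 'y|m'.
config_enabled()
{
    grep -Eq "^$2=(${3:-y})\$" "$1"
}

# tools_available TOOL...
# Succeeds if every named tool can be found in $PATH.
tools_available()
{
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done
}

# Check the options and tools listed in section 0.
# CONFIG_FILE is an assumed location; adjust it for your distribution.
preflight()
{
    config_file=${CONFIG_FILE:-/boot/config-$(uname -r)}
    config_enabled "$config_file" CONFIG_X86_MCE || return 1
    config_enabled "$config_file" CONFIG_X86_MCE_INTEL || return 1
    config_enabled "$config_file" CONFIG_X86_MCE_INJECT 'y|m' || return 1
    tools_available mcelog mce-inject page-types
}
```

Calling "preflight" returns non-zero as soon as one option or tool is
missing, so it can guard a test run: preflight && make test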
1. Introduction
---------------
The MCE test suite is a collection of tools and test scripts for
testing the Linux kernel MCE processing features. The goal is to cover
most Linux kernel MCE processing code paths and features with
automation tests.
If you just want to start testing as quickly as possible, you can skip
sections 2 and 3 and go directly to section 4.1.
2. Structure
------------
The main intention behind the design is to re-use test cases amongst
various test methods (represented as test drivers), such as kdump
based, kernel MCE panic log (tolerant=3) based, etc.
2.1 Test cases
Test cases are grouped into test case classes. The test cases in one
class share similar triggering, result collecting and result
verifying methods, and can be used with the same set of test drivers.
The interface of a test case class is a shell script, usually named
cases.sh, under a sub-directory of cases/. The test case class shell
script should support the following command line options:
cases.sh enumerate     enumerate test cases in class, print test
                       case names to stdout
cases.sh trigger       trigger the test case
cases.sh get_result    get the result of test case
cases.sh verify        verify the result of test case, and print
                       the verify result to stdout
When executing cases.sh [trigger|get_result|verify], the test case is
specified via the environment variable this_case, which must be one of
the test case names returned by "cases.sh enumerate".
Other environment variables are also used to pass information from
the driver to test cases, such as:
this_case              name of current test case
driver                 name of test driver
klog                   name of the file which holds the kernel log
                       during test
KSRC_DIR (for gcov)    kernel source code directory
GCOV (for gcov)        gcov collection method
vmcore (for kdump)     vmcore file name
reboot (for kdump)     indicates that there is a reboot between test
                       case trigger and test case verify, so some
                       context has been lost
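The interface described above can be illustrated with a skeleton. This is
a minimal sketch and not one of the in-package classes: the test case
names, the function layout and the echo placeholders are all hypothetical,
and a real class would call the injection tool in trigger.

```shell
#!/bin/sh
# Hypothetical skeleton of a test case class script (cases.sh).

enumerate()
{
    # One test case name per line on stdout. Names are placeholders.
    printf '%s\n' example_case_1 example_case_2
}

trigger()
{
    # A real class would inject the MCE for $this_case here.
    echo "trigger $this_case (driver: ${driver:-unknown})"
}

get_result()
{
    # A real class would collect mcelog output, $klog, $vmcore, etc.
    echo "get_result $this_case"
}

verify()
{
    # Print the verify result to stdout.
    echo "Passed: $this_case placeholder check"
}

cases_main()
{
    case "$1" in
        enumerate)
            enumerate
            ;;
        trigger|get_result|verify)
            # The driver names the test case via the this_case variable.
            if ! enumerate | grep -qx -- "$this_case"; then
                echo "unknown test case: '$this_case'" >&2
                return 1
            fi
            "$1"
            ;;
        *)
            echo "usage: cases.sh enumerate|trigger|get_result|verify" >&2
            return 1
            ;;
    esac
}

# Dispatch only when an operation was given on the command line.
if [ "$#" -gt 0 ]; then
    cases_main "$@"
fi
```

Rejecting a this_case value that "enumerate" does not list keeps the
contract between driver and class explicit.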
Several test case classes are provided with the test suite.
cases/soft-inj/* is based on mce-inject MCE software injection tool.
cases/apei-inj/* is based on apei-inj APEI hardware injection tool.
cases/<injection tool>/<class name>/cases.sh   Interface of the test case class
cases/<injection tool>/<class name>/data/      Directory containing data files
cases/<injection tool>/<class name>/refer/     Directory containing data files for
                                               reference MCE records if necessary.
For documentation of the various test cases, please refer to doc/cases/*.
2.2 Test drivers
Test drivers drive the test procedure. The main structure of a test
driver is a loop over the test case classes specified in the
configuration file. For each test case class, the test driver loops
over the test cases returned by "cases.sh enumerate", and for each test
case it calls "cases.sh" to trigger, get_result and verify the test
case. Test drivers also do some common work for test cases; for
example, the kdump driver collects the vmcore file and invokes the
gcovdump command to get the gcov data file.
The interface of a test driver is driver.sh, which is usually put in
the drivers/<driver_name>/ directory. The test configuration file is
the only command line parameter of driver.sh. Test case classes are
specified in the test configuration file via the CASES variable; see
details below.
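The nested loop described in this section can be sketched as follows.
This is an illustration only, not the code of any shipped driver: the
function name run_classes, the way the class script is located, and the
"<case>:" header printed before each test case are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a driver main loop; real drivers do more work.

# run_classes CASES_DIR CLASS...
# Runs every test case of every class and prints the output of the
# class scripts to stdout, one blank-line separated block per case.
run_classes()
{
    cases_dir=$1
    shift
    for class in "$@"; do
        script="$cases_dir/$class/cases.sh"
        for this_case in $(sh "$script" enumerate); do
            # The class script reads the case name from the environment.
            export this_case
            echo "$this_case:"
            sh "$script" trigger
            sh "$script" get_result
            sh "$script" verify
            echo
        done
    done
}

# Possible use, assuming the configuration file sets CASES:
#   . config/simple.conf
#   run_classes cases $CASES
```

The unquoted $(...) relies on test case names containing no white space,
which is also an assumption.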
2.3 Test configuration file
The test configuration file is a shell script that specifies parameters
for test drivers and test cases. It must be put in the config/
directory. The parameters are represented as shell variables as follows:
CASES                       Names of test case classes, separated by
                            white space.
START_BACKGROUND            Shell command to start a background process
                            during testing, used for random testing.*
STOP_BACKGROUND             Shell command to stop the background process
                            during testing.
COREDIR (for kdump)         Directory containing the Linux kernel crash
                            core dump after kdump.
VMLINUX (for kdump, gcov)   vmlinux of the Linux kernel.
GCOV (for gcov)             Enable GCOV if set to non-zero.
KSRC_DIR (for gcov)         Kernel source code directory.
* To test MCE processing under a random environment, a background
process can be run automatically during MCE testing. The start and
stop commands are specified via START_BACKGROUND and
STOP_BACKGROUND.
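As an illustration of the variables above, a configuration file might look
like the following sketch. Every value is a placeholder: the class name,
the my-load.sh script and the paths are assumptions, not files shipped
with the test suite (only /var/crash is the default mentioned in section
4.2.2).

```shell
# Hypothetical test configuration file, for example config/example.conf.
# All values are placeholders; adjust them for your system.

# Test case classes to run, separated by white space (name assumed).
CASES="soft-inj/non-panic"

# Optional background load for random testing (my-load.sh is hypothetical).
START_BACKGROUND="./my-load.sh start"
STOP_BACKGROUND="./my-load.sh stop"

# Only needed by the kdump driver.
COREDIR=/var/crash

# Only needed for kdump and gcov (paths assumed).
VMLINUX=/boot/vmlinux-example
GCOV=1
KSRC_DIR=/usr/src/linux-example
```

Because the file is plain shell, it can be checked for syntax errors with
"sh -n <file>" before a test run.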
2.4 Test result
After testing, the general test result goes to
results/<driver_name>/result. The format of the general test result is
as follows:
<test case name>:
Passed: item 1 description
Failed: item 2 description
...
Passed: item n description
One blank line is used to separate test cases.
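Given this format, a result file can be summarized with a few lines of
awk. The helper below is a minimal sketch and not part of the test suite;
its name is hypothetical, and it assumes that every item line starts
(after optional indentation) with "Passed:" or "Failed:".

```shell
#!/bin/sh
# Hypothetical helper; not part of the test suite.

# summarize_result FILE
# Prints "passed=N failed=M" for a general result file.
summarize_result()
{
    awk '
        $1 == "Passed:" { passed++ }
        $1 == "Failed:" { failed++ }
        END { printf "passed=%d failed=%d\n", passed, failed }
    ' "$1"
}

# Possible use:
#   summarize_result results/simple/result
```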
Additional test results for various test cases go to
results/<driver_name>/<case_name>/<xxx>. For in-package test case
classes, additional test results include:
results/<driver_name>/<injection_tool>/<case_name>/klog
    Kernel log during testing
results/<driver_name>/<injection_tool>/<case_name>/mcelog
    mcelog output during testing
results/<driver_name>/<injection_tool>/<case_name>/mcelog_refer
    mcelog output reference
results/<driver_name>/<injection_tool>/<case_name>/mce_64.c.gcov (for gcov)
    gcov output file
3. Tools
--------
3.1 mce-inject
mce-inject is a software MCE injection tool based on the Linux kernel
software MCE injection mechanism. To inject an MCE into the Linux
kernel via mce-inject, a data file must be provided. Its syntax is
similar to the logging output of mcelog, with some extensions.
Please refer to the documentation of mce-inject for more information.
The mce-inject program must be executable and in $PATH.
3.2 mcelog
mcelog reads /dev/mcelog and prints the stored machine check records to
stdout. The MCE test suite uses it to verify that the MCE records
generated by the kernel are the same as the reference records, which in
most cases are the same as the input records. The current git version
of mcelog is needed for the MCE test suite to work properly. Please
refer to the mcelog documentation for more information. The latest
mcelog can be obtained from the git repository at
git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git.
Note that the git version of mcelog must be available in $PATH.
3.3 gcovdump
gcov is a test coverage tool. The original implementation supports
user space programs only; LTP (Linux Test Project) provides kernel
gcov support. Because MCE tests involve panic or kdump, gcovdump was
developed to dump gcov data from a kdump crash dump core. gcovdump has
been merged into LTP cvs. For more information please refer to the
gcovdump documentation. The latest gcovdump can be obtained from cvs:
http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/.
3.4 page-types
A tool to query page types, which ships with the Linux kernel source
(2.6.32 or newer, $KERNEL_SRC/Documentation/vm/page-types.c).
It is required for MCE apei-inj testing.
4. Usage Guide
--------------
4.1 Test with simple test driver
4.1.1 Simple test driver
The simple test driver just calls cases.sh of the test cases one by one
in a loop, so test cases are not permitted to trigger a real panic or
reboot during the test. For MCE testing, the simple test driver uses a
special processing mode that just logs everything in case of MCE. It
is enabled by setting the MCE parameter "tolerant=3" during
testing. "tolerant" can be set by writing to:
/sys/devices/system/machinecheck/machinecheck0/tolerant
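For manual experiments, the parameter can be changed and restored with a
small helper. This is a minimal sketch under stated assumptions: the
function name is hypothetical, it is not part of the test suite, and on a
real system it must be run as root.

```shell
#!/bin/sh
# Hypothetical helper; must run as root on a real system.

# set_tolerant LEVEL [FILE]
# Writes LEVEL to the tolerant control file and prints the previous
# value so that the caller can restore it after testing.
set_tolerant()
{
    file=${2:-/sys/devices/system/machinecheck/machinecheck0/tolerant}
    old=$(cat "$file") || return 1
    echo "$1" > "$file" || return 1
    echo "$old"
}

# Possible use (as root):
#   old=$(set_tolerant 3)
#   ... run tests ...
#   set_tolerant "$old" >/dev/null
```

The optional FILE argument exists only so that the helper can be tried on
an ordinary file without touching the running kernel.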
4.1.2 test instruction
The following are the basic test instructions. For additional
features such as gcov support, please refer to the corresponding
instructions.
a. The following Linux kernel and user space tools should be installed:
- A Linux kernel with full MCE injection support (see 0)
- mce-inject tool (see 3.1)
- mcelog with proper version (see 3.2)
- page-types (see 3.4)
b. Modify config/simple.conf or create a new test configuration
file. Refer to section 2.3 for more information about the test
configuration file.
c. Run "make". Carefully check for any errors.
d. It is recommended to stop cron before testing, because cron might
run another mcelog in the background that reads the events, which
will upset the test.
/etc/init.d/crond stop
e. As root, invoke the simple test driver on the test configuration
file as follows:
Run "make test" to do all the standard tests that do not require
special set up.
f. The general test result goes to results/simple/result. The test log
goes to work/simple/log. Additional test results for various test
cases go to results/simple/<test case>/<xxx>. For more details about
the in-package test case classes, please refer to section 2.1.
4.2 Test with kdump test driver
4.2.1 kdump test driver
The kdump test driver is based on the kdump test case in the Linux Test
Project; thanks to LTP for their excellent work!
The kdump driver helps run tests which trigger a crash/panic and
generates the result and report via kdump. The test scripts cycle
through a series of crash/panic scenarios. Each test cycle does the
following:
a. Triggers a test case which triggers crash/panic (MCE with tolerant=1).
b. Kdump kernel boots and saves a vmcore.
c. System reboots to 1st kernel.
d. Verifies test case, generates result and report.
e. After a 1 to 2 minute delay, the next test case is run.
4.2.2 test instruction
Follow these steps to set up the kdump test driver.
The test driver is written for SuSE Linux Enterprise Server 10 (and
later releases), OpenSUSE, Fedora, Debian, as well as RedHat
Enterprise Linux 5. Because KDUMP is supported by these
distributions, the test driver was written and tested on them.
Contributions towards supporting more distributions are welcome.
a. Install a Linux kernel with full MCE injection and KDUMP support. In
addition to the MCE injection support in section 0, the following
configuration options should be enabled:
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
b. Install these additional packages:
For SLES10 or OpenSUSE Distro:
* kernel-kdump
* kernel-source
* kexec-tools
For RHEL5 or Fedora distro:
* kexec-tools
* kernel-devel
c. Configure where to put the kdump /proc/vmcore files. The path should be
specified via COREDIR in test configuration file.
By default, the kdump /proc/vmcore files will be put into /var/crash.
For SLES10 or OpenSUSE Distro:
* edit KDUMP_SAVEDIR in /etc/sysconfig/kdump
For RHEL5 or Fedora distro:
* edit path in /etc/kdump.conf
d. Besides installing the bzImage and modules of the Linux kernel on
the test machine, put the vmlinux of the Linux kernel on the test
machine and specify it via VMLINUX in the test configuration
file.
e. Make sure the partition where the test driver is running has space
for the test results and one vmcore file (size of physical
memory).
f. Now, reboot system. Test if kdump works by starting kdump and triggering
kernel panic.
For SLES10 or OpenSUSE Distro:
service boot.kdump restart
chkconfig boot.kdump on
echo "c" > /proc/sysrq-trigger
For RHEL5 or Fedora distro:
service kdump restart
/sbin/chkconfig kdump on
echo "c" > /proc/sysrq-trigger
After system reboot, check if there are vmcore files. By default, they are in /var/crash/*/. If yes, "kdump" works in the system.
g. Create a new test configuration file or use an existing one in
config/, such as kdump.conf. Note: not all test case classes can be
used with the kdump test driver, see "important points" below.
h. Run "make". Carefully check for any errors.
i. As root, run "drivers/kdump/driver.sh <conf>" or "make test-kdump" (for a full test).
j. After test is done, the test log of the last run of kdump driver will
be displayed on main console.
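The check in step f (whether vmcore files exist after the reboot) can be
scripted. The helper below is a minimal sketch and not part of the test
suite: the function name is hypothetical, and it assumes that the dump
files are named vmcore* and sit one directory level below COREDIR, as in
the default /var/crash/*/ layout.

```shell
#!/bin/sh
# Hypothetical helper; not part of the test suite.

# has_vmcore [COREDIR]
# Succeeds if at least one vmcore file exists one level below COREDIR.
has_vmcore()
{
    coredir=${1:-/var/crash}
    for f in "$coredir"/*/vmcore*; do
        # An unmatched pattern stays literal, so test for existence.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Possible use after the reboot:
#   has_vmcore /var/crash && echo "kdump works"
```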
A few important points to remember:
- The kdump test driver requires that a real panic is triggered when a
test case is triggered. So not all test case classes can be used
with the kdump test driver; for example, none of the test case
classes for corrected MCE can be used with the kdump test driver.
- If you need to stop the tests before all test cases have run, run
"crontab -r" and "killall driver.sh" within 1 minute after the 1st
kernel reboots. Then, if you'd like to carry on tests from that point
on, run:
rm work/kdump/stamps/setupped
drivers/kdump/driver.sh <conf>
If you'd like to start tests from the beginning, run:
make reset
drivers/kdump/driver.sh <conf>
- If a failure occurs when booting the kdump kernel, you'll need to
manually reset the system so it reboots back to the 1st kernel and
continues on to the next test. For this reason, it's best to monitor
the tests from a console. If possible, setup a serial console (not a
must, any type of console setup will do). If using minicom, enable
saving of kernel messages displayed on minicom into a file, by
pressing ctrl+a+l on the console. Otherwise, when it is observed that
the kdump kernel has failed to boot, manually copy the boot messages
into a file to help debug the cause of the hang.
- The results are saved in results/kdump/result, which also shows
where you are in the test run. When the "Test run complete" entry
appears in that file, you're done. Verbose log can be found at
work/log.
- The test machine would be unavailable for any other work during the
period of the test run.
4.3 Gcov support
Gcov is a test coverage tool. It can be used to discover untested
parts of a program, collect branch-taken statistics to optimize a
program, etc. In the MCE test suite, it is used to get test coverage,
that is, which C statements are covered by each test case.
Gcov support is optional. If you don't care about test coverage
information, just skip this section.
a. Make sure your kernel has gcov support. You can find the latest
kernel gcov patches at:
http://ltp.sourceforge.net/coverage/gcov.php
A README for kernel gcov can be found at:
http://ltp.sourceforge.net/coverage/gcov/readme.php
Note: CONFIG_GCOV_ALL does not work for me. Adding the line
EXTRA_CFLAGS += $(KBUILD_GCOV_FLAGS)
to the respective Makefiles is more stable. For example, this line
can be added to "linux/arch/x86/kernel/cpu/mcheck/Makefile".
b. If you want to use gcov with the kdump test driver, please install
the gcovdump tool (see section 3.3). The latest gcovdump can be
obtained from cvs:
http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/.
c. The Linux kernel source code should be put on the test
machine. Its root directory should be specified in the test
configuration file via KSRC_DIR.
d. Besides installing the bzImage and modules of the Linux kernel on
the test machine, put the vmlinux of the Linux kernel on the test
machine and specify it via VMLINUX in the test configuration
file.
e. Make sure gcov is available in your test system. It comes with gcc
package normally. If kdump test driver is used, a tool named
gcovdump is also needed to dump *.gcda from crash dump image.
f. In the test configuration file, make sure the following settings
are present:
# enable GCOV support
GCOV=1
# kernel source is needed to get gcov graph
KSRC_DIR=<kernel source directory>
VMLINUX=<vmlinux>
g. After testing, *.c.gcov will be generated in test case result
directory, such as
results/kdump/soft-inj/non-panic/corrected/mce_64.c.gcov.
h. To merge gcov graph data from several test cases, a tool named
gcov_merge.py in the tools sub-directory can be used. For example,
tools/gcov_merge.py results/kdump/soft-inj/*/*/mce_64.c.gcov
will output the merged gcov graph from all test cases under
soft-inj. This can be used to check the coverage of several test cases.
4.4 Tools
Some tools are provided to help analyze test results.
- tools/grep_result.sh
Grep from the general test result (results/<driver_name>/result) in
terms of test cases instead of lines, because the result of one test
case may span several lines.
Usage:
cat results/<driver_name>/result | tools/grep_result.sh <grep options>
Where <grep options> are the same as the options available to /bin/grep.
- tools/loop-mce-test
Run MCE test cases in a loop. It exits on failure of any one of the test
cases. This script uses the simple test driver.
Usage:
./loop-mce-test <config_file>
Note that only configuration files for the simple test driver can be used here.
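The idea behind tools/grep_result.sh, matching whole test case blocks
rather than single lines, can be illustrated with awk paragraph mode. The
function below is a sketch of that idea only; it is not the
implementation of tools/grep_result.sh, its name is hypothetical, and it
takes a single extended regular expression instead of grep options.

```shell
#!/bin/sh
# Hypothetical illustration; not the code of tools/grep_result.sh.

# grep_cases PATTERN
# Reads a general result file on stdin and prints every test case block
# (blocks are separated by blank lines) that matches PATTERN.
grep_cases()
{
    # An empty RS makes awk treat each blank-line separated block as
    # one record, so the match is done per test case.
    awk -v RS= -v ORS='\n\n' -v pat="$1" '$0 ~ pat'
}

# Possible use:
#   grep_cases 'Failed:' < results/simple/result
```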
5. Add test cases
-----------------
5.1 Add test case to in-package test class
All in-package test classes use the mce-inject software injection tool
and follow the same structure. The steps to add a new test case are as
follows:
a. Find an appropriate test case class to add your test case.
b. Add a new mce-inject data file to cases/soft-inj/<class name>/data/.
c. If the reference mcelog is different from mce-inject input data
file, put that reference file into cases/soft-inj/<class_name>/refer/.
d. In cases/soft-inj/<class name>/cases.sh, there are shell "case"
statements in the shell functions get_result() and verify(). Add a
branch to each "case" statement for your test case.
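Steps b and c imply a simple rule for choosing the reference records: use
the file under refer/ when it exists, otherwise fall back to the injection
input under data/. The helper below sketches that rule. It is an
assumption about how a class could implement it, not code taken from the
in-package classes, and it further assumes that the data and reference
files are named after the test case.

```shell
#!/bin/sh
# Hypothetical helper; the in-package classes may do this differently.

# reference_file CLASS_DIR CASE
# Prints the file holding the expected mcelog records for CASE:
# refer/CASE if it exists, otherwise the injection input data/CASE.
reference_file()
{
    if [ -f "$1/refer/$2" ]; then
        echo "$1/refer/$2"
    else
        echo "$1/data/$2"
    fi
}

# Possible use inside a verify branch (paths assumed):
#   ref=$(reference_file "cases/soft-inj/<class name>" "$this_case")
```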
5.2 Add test class
To add a new test class, add a cases.sh under a sub-directory of
cases/, and follow the test case class interface definition in section
2.1. The general result output format should follow that in section
2.4.