| |
| MCE test suite HOWTO |
| ==================== |
| |
| 11 November 2008 |
| |
| Huang Ying |
| |
| Section 4.2 (Test with kdump test driver) is based on the README of |
| LTP kdump test case. |
| |
| Abstract |
| -------- |
| |
| This document explains the structure and design of MCE test suite, the |
| kernel patch and user space tools needed for automatic tests, usage |
| guide and how to add new test cases into test suite. |
| |
| 0. Quick shortcut |
| ------------------ |
| |
| - Install the Linux kernel with full MCE injection support, including |
| latest Linux kernel (2.6.31) and MCE injection enhancement patchset |
| in: http://ftp.kernel.org/pub/linux/kernel/people/yhuang/mce/. Make |
| sure following configuration options are enabled: |
| |
| CONFIG_X86_MCE=y |
| CONFIG_X86_MCE_INTEL=y |
| CONFIG_X86_MCE_INJECT=y or CONFIG_X86_MCE_INJECT=m |
| |
| - Get mcelog git version from |
| git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git. |
| and install in /usr/sbin (or rather first in your $PATH) |
| |
| git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git |
| cd mcelog |
| make |
| sudo make install |
| |
| - Get mce-inject git version from |
| git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git. |
| |
| git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git |
| cd mce-inject |
| make |
| sudo make install |
| |
| - Install page-types tool (sec 3.4), which is accompanied with Linux kernel |
| source (2.6.32 or newer). |
| |
| cd $KERNEL_SRC/Documentation/vm/ |
| gcc -o page-types page-types.c |
| cp page-types /usr/bin/ |
| |
| - Run make test |
| This will do the basic tests, but not the more complicated kdump ones. |
| For more information on those read below. |
| |
| 1. Introduction |
| --------------- |
| |
| The MCE test suite is a collection of tools and test scripts for |
| testing the Linux kernel MCE processing features. The goal is to cover |
| most Linux kernel MCE processing code paths and features with |
| automation tests. |
| |
| If you just want to start testing as quickly as possible, you can skip |
| section 2 and section 3, and go section 4.1 directly. |
| |
| |
| 2. Structure |
| ------------ |
| |
| The main intention behind the design is to re-use test cases amongst |
| various test methods (represented as test drivers), such as kdump |
| based, kernel MCE panic log (tolerant=3) based, etc. |
| |
| 2.1 Test cases |
| |
| Test cases are grouped into test case classes. The test cases in one |
| class share the similar triggering, result collecting and result |
| verifying methods. They can be used in same set of test drivers. The |
| interface of a test case class is a shell script, usually named as |
| cases.sh under a sub-directory of cases/. The following command line |
| option should be supported by the test case class shell script: |
| |
| cases.sh enumerate enumerate test cases in class, print test |
| case names to stdout |
| cases.sh trigger trigger the test case |
| cases.sh get_result get the result of test case |
| cases.sh verify verify the result of test case, and print |
| the verify result to stdout |
| |
| When execute cases.sh [trigger|get_result|verify], the test case is |
| specified via environment variable this_case, which must be one of the |
| test case names returned by "cases.sh enumerate". |
| |
| Other environment variables are also used to pass some information |
| from driver to test cases, such as: |
| |
| this_case name current test case |
| driver name of test driver |
| klog file name which holds kernel log during test |
| KSRC_DIR (for gcov) kernel source code directory |
| GCOV (for gcov) gcov collection method |
| vmcore (for kdump) vmcore file name |
| reboot (for kdump) indicate there is a reboot between test |
| case trigger and test case verify, some |
| context has been gone. |
| |
| Several test case classes are provided with the test suite. |
| cases/soft-inj/* is based on mce-inject MCE software injection tool. |
| cases/apei-inj/* is based on apei-inj APEI haredware injection tool. |
| |
| cases/<injection tool>/<class name>/cases.sh Interface of the test case class |
| cases/<injection tool>/<class name>/data/ Directory contains data file |
| cases/<injection tool>/<class name>/refer/ Directory contains data file for |
| reference MCE records if necessary. |
| |
| For document of various test cases, please refer to doc/cases/*. |
| |
| 2.2 Test drivers |
| |
| Test drivers drive the test procedure, its main structure is a loop |
| over test case classes specified in configuration file. For each test |
| case class, test driver loops over test cases returned by "cases.sh |
| enumerate". And, for each test case, it calls "cases.sh" to trigger, |
| get_result and verify the test case. Test driver also do some common |
| work for test cases, such as kdump driver collects vmcore file, and |
| invoking gcovdump command to get gcov data file. |
| |
| The interface of test driver is driver.sh, which is usually put in |
| drivers/<driver_name>/ directory. The test configuration file should |
| be used as the only command line parameter for driver.sh. Test case |
| classes should be specified in test configuration file as CASES |
| variable, details below. |
| |
| 2.3 Test configuration file |
| |
| Test configuration file is a shell script to specify parameters for |
| test drivers and test cases. It must be put in config/ directory. The |
| parameters are represented as shell variables as follow: |
| |
| CASES Name of test case classes, separate by |
| white space. |
| START_BACKGROUND Shell command to start a background process |
| during testing, used for random testing.* |
| STOP_BACKGROUND Shell command to stop the background process |
| during testing. |
| COREDIR (for kdump) directory contains Linux kernel crash core |
| dump after kdump. |
| VMLINUX (for kdump, gcov) vmlinux of Linux kernel |
| GCOV (for gcov) Enable GCOV if set none zero. |
| KSRC_DIR (for gcov) Kernel source code directory |
| |
| * To test MCE processing under random environment, a background |
| process can be automatically run simultaneously during MCE |
| testing. The start/stop command is specified via START_BACKGROUND |
| and STOP_BACKGROUND. |
| |
| 2.4 Test result |
| |
| After test, the general test result will go results/<driver_name>/result. |
| The format of general test result is as follow: |
| |
| <test case name>: |
| Passed: item 1 description |
| Failed: item 2 description |
| ... |
| Passed: item n description |
| |
| One blank line is used to separate test cases. |
| |
| Additional test result for various test cases will go |
| "results/<driver_name>/<case_name>/<xxx>. For in-package test case |
| class, additional test results include: |
| |
| results/<driver_name>/<injection_tool>/<case_name>/klog |
| Kernel log during testing |
| results/<driver_name>/<injection_tool>/<case_name>/mcelog |
| mcelog output during testing |
| results/<driver_name>/<injection_tool>/<case_name>/mcelog_refer |
| mcelog output reference |
| results/<driver_name>/<injection_tool>/<case_name>/mce_64.c.gcov (for gcov) |
| gcov output file |
| |
| |
| 3. Tools |
| -------- |
| |
| 3.1 mce-inject |
| |
| mce-inject is a software MCE injection tool, which is based on Linux |
| kernel software MCE injection mechanism. To inject a MCE into Linux |
| kernel via mce-inject, a data file should be provided. The syntax is |
| similar to the logging output by mcelog with some extensions. |
| Please refer to the documentation of mce-inject for more information. |
| |
| The mce-inject program must be executable in $PATH. |
| |
| 3.2 mcelog |
| |
| mcelog read /dev/mcelog and prints the stored machine check records to |
| stdout. It is used by MCE test suite to verify MCE records generated |
| by kernel is same as reference records, at most time, same as input |
| records. The current git mcelog version is needed for MCE test suite to |
| work properly. Please refer to document of mcelog for more |
| information. The latest mcelog can be gotten via git snapshot from |
| git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git. |
| |
| Note you need the git version of mcelog available in $PATH. |
| |
| 3.3 gcovdump |
| |
| gcov is a test coverage tool, the original implementation is used for |
| user space program only. LTP (Linux Test Project) provides the kernel |
| gcov support. But MCE test involves panic or kdump, so gcovdump is |
| developed to dump gcov data from kdump crash dump core. gcovdump has |
| been merged by LTP cvs. For more information please refer to gcovdump |
| document. The latest gcovdump can be gotten from cvs: |
| http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/. |
| |
| 3.4 page-types |
| A tool to query page types, which is accompanied with Linux kernel |
| source (2.6.32 or newer, $KERNEL_SRC/Documentation/vm/page-types.c). |
| It is required for MCE apei-inj testing. |
| |
| 4. Usage Guide |
| -------------- |
| |
| 4.1 Test with simple test driver |
| |
| 4.1.1 Simple test driver |
| |
| The simple test driver just call cases.sh of test cases one by one in |
| a loop. So it is not permitted for test cases to trigger real panic or |
| reboot during test. For MCE testing, a special processing mode to just |
| log everything in case of MCE is used for the simple test driver, it |
| is enabled via set MCE parameter "tolerant=3" during |
| testing. "tolerant" can be set via writing: |
| /sys/devices/system/machinecheck/machinecheck0/tolerant |
| |
| 4.1.2 test instruction |
| |
| The following is the basic test instruction, for some additional |
| features such as gcov support, please refer to corresponding |
| instructions. |
| |
| a. Linux kernel and user space tools as follow should be installed |
| |
| - A Linux kernel with full MCE injection support (see 0) |
| - mce-inject tool (see 3.1) |
| - mcelog with proper version (see 3.2) |
| - page-types (see 3.4) |
| |
| b. Modify config/simple.conf or create a new test configuration |
| file. Refer to section 2.3 for more instruction about test |
| configuration file. |
| |
| c. Run "make". Carefully check for any errors. |
| |
| d. It is recommended to stop cron before testing. Because there |
| might be another mcelog reading events running on background |
| by cron, which will upset the test. |
| |
| /etc/init.d/crond stop |
| |
| e. To be root and invoke simple test driver on test configuration file |
| as follow |
| |
| Run "make test" to do all the standard tests that do not require |
| special set up. |
| |
| f. General test result will go results/simple/result. Test log will go |
| work/simple/log. Additional test results for various test cases |
| will go results/simple/<test case>/<xxx>. For more details about |
| in-package test case class, please refer to section 2.1. |
| |
| |
| 4.2 Test with kdump test driver |
| |
| 4.2.1 kdump test driver |
| |
| The kdump test driver is based on the kdump test case in Linux Test |
| Project, thank LTP for their excellent work! |
| |
| The kdump driver helps run tests which trigger crash/panic and |
| generate result and report via kdump. The test scripts cycle through a |
| series of crash/panic scenarios. Each test cycle does the following: |
| |
| a. Triggers a test case which triggers crash/panic (MCE with tolerant=1). |
| b. Kdump kernel boots and saves a vmcore. |
| c. System reboots to 1st kernel. |
| d. Verifies test case, generate result and report. |
| e. After a 1 to 2 minute delay, the next test case is run. |
| |
| 4.2.2 test instruction |
| |
| Follow the steps to setup kdump test driver. |
| |
| The test driver is written for SuSE Linux Enterprise Server 10 (and |
| onward releases), OpenSUSE, Fedora, Debian, as well as RedHat |
| Enterprise Linux 5. Since KDUMP is supported by the above mentioned |
| distro's the test driver was written and tested on them. Contribution |
| towards supporting more distributions are welcome. |
| |
| a. Install Linux kernel with full MCE injection and KDUMP support. In |
| addition to MCE injection support in section 0, the following |
| configuration options should be enabled too: |
| |
| CONFIG_KEXEC=y |
| CONFIG_CRASH_DUMP=y |
| |
| b. Install these additional packages: |
| |
| For SLES10 or OpenSUSE Distro: |
| |
| * kernel-kdump |
| * kernel-source |
| * kexec-tools |
| |
| For RHEL5 or Fedora distro: |
| |
| * kexec-tools |
| * kernel-devel |
| |
| c. Configure where to put the kdump /proc/vmcore files. The path should be |
| specified via COREDIR in test configuration file. |
| By default, the kdump /proc/vmcore files will be put into /var/crash. |
| |
| For SLES10 or OpenSUSE Distro: |
| * edit KDUMP_SAVEDIR in /etc/sysconfig/kdump |
| For RHEL5 or Fedora distro: |
| * edit path in /etc/kdump.conf |
| |
| d. In addition to bzImage and modules of Linux kernel should be |
| installed on test machine, the vmlinux of Linux kernel should be |
| put on test machine and specified via VMLINUX in test configuration |
| file. |
| |
| e. Make sure the partition where the test driver is running has space |
| for the tests results and one vmcore file (size of physical |
| memory). |
| |
| f. Now, reboot system. Test if kdump works by starting kdump and triggering |
| kernel panic. |
| |
| For SLES10 or OpenSUSE Distro: |
| service boot.kdump restart |
| chkconfig boot.kdump on |
| echo "c" > /proc/sysrq-trigger |
| |
| For RHEL5 or Fedora distro: |
| service kdump restart |
| /sbin/chkconfig kdump on |
| echo "c" > /proc/sysrq-trigger |
| |
| After system reboot, check if there are vmcore files. By default, they are in /var/crash/*/. If yes, "kdump" works in the system. |
| |
| g. Create a new test configuration file or use a existing one in |
| config/, such as kdump.conf. Note: not all test case classes can be |
| used with kdump test driver, see "important points" below. |
| |
| h. Run "make". Carefully check for any errors. |
| |
| i. To be root and run "drivers/kdump/driver.sh <conf>" or "make test-kdump" (for a full test) |
| |
| j. After test is done, the test log of the last run of kdump driver will |
| be displayed on main console. |
| |
| Few Important points to remember: |
| |
| - kdump test driver request that a real panic should be triggered when |
| test case is triggered. So not all test case classes can be used |
| with kdump test driver, for example, all test case classes for |
| corrected MCE can not be used with kdump test driver. |
| |
| - If you need to stop the tests before all test cases have run, run |
| "crontab -r" and "killall driver.sh" within 1 minute after the 1st |
| kernel reboots. Then, if you'd like to carry on tests from that point |
| on, run: |
| rm work/kdump/stamps/setupped |
| drivers/kdump/driver.sh <conf> |
| If you'd like to start tests from the beginning, run: |
| make reset |
| drivers/kdump/driver.sh <conf> |
| |
| - If a failure occurs when booting the kdump kernel, you'll need to |
| manually reset the system so it reboots back to the 1st kernel and |
| continues on to the next test. For this reason, it's best to monitor |
| the tests from a console. If possible, setup a serial console (not a |
| must, any type of console setup will do). If using minicom, enable |
| saving of kernel messages displayed on minicom into a file, by |
| pressing ctrl+a+l on the console. Else, when it is observed that the |
| kdump kernel has failed to boot, manually copy the boot message into |
| a file to enable the debugging the cause of the hang. |
| |
| - The results are saved in results/kdump/result, which also shows |
| where you are in the test run. When the "Test run complete" entry |
| appears in that file, you're done. Verbose log can be found at |
| work/log. |
| |
| - The test machine would be unavailable for any other work during the |
| period of the test run. |
| |
| 4.3 Gcov support |
| |
| Gcov is a test coverage tool. It can be used to discover untested |
| parts of program, collect branch taken statistics to optimize program, |
| etc. In MCE test suite, it is used to get test coverage, that is, |
| which C statements are covered by each test case. |
| |
| Gcov support is optional, if you don't care about test coverage |
| information, just skip this section. |
| |
| a. Make sure your kernel has gcov support. You can find lasted kernel |
| gcov patches from: |
| http://ltp.sourceforge.net/coverage/gcov.php |
| |
| A README for kernel gcov can be found from: |
| http://ltp.sourceforge.net/coverage/gcov/readme.php |
| |
| Notes: CONFIG_GCOV_ALL does not work for me. Add the line |
| EXTRA_CFLAGS += $(KBUILD_GCOV_FLAGS) |
| to the respective Makefiles are more stable. For example, this line |
| can be added into "linux/arch/x86/kernel/cpu/mcheck/Makefile" |
| |
| b. If you want to use gcov with kdump test driver, please install |
| gcovdump tool(see section 3.4). The latest gcovdump can be gotten |
| from cvs: |
| http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/. |
| |
| c. Linux kernel source source code should be put on the test |
| machine. Its root directory should be specified in test |
| configuration file via KSRC_DIR. |
| |
| d. In addition to bzImage and modules of Linux kernel should be |
| installed on test machine, the vmlinux of Linux kernel should be |
| put on test machine and specified via VMLINUX of test configuration |
| file. |
| |
| e. Make sure gcov is available in your test system. It comes with gcc |
| package normally. If kdump test driver is used, a tool named |
| gcovdump is also needed to dump *.gcda from crash dump image. |
| |
| f. In test configuration file, make sure the following setting is |
| available: |
| |
| # enable GCOV support |
| GCOV=1 |
| # kernel source is needed to get gcov graph |
| KSRC_DIR=<kernel source directory> |
| VMLINUX=<vmlinux> |
| |
| g. After testing, *.c.gcov will be generated in test case result |
| directory, such as |
| results/kdump/soft-inj/non-panic/corrected/mce_64.c.gcov. |
| |
| h. To merge gcov graph data from several test cases, a tool named |
| gcov_merge.py in tools sub-directory can be used. For example, |
| |
| tools/gcov_merge results/kdump/soft-inj/*/*/mce_64.c.gcov |
| |
| Will output merged gcov graph from all test cases under |
| soft-inj. This can be used to check coverage of several test cases. |
| |
| 4.4 tool |
| |
| Some tools are provided to help analyze test result. |
| |
| - tools/grep_result.sh |
| |
| Grep from general test result (results/<driver_name>/result) in |
| terms of test case instead of line, because the result of one test |
| case may span several line. |
| |
| Usage: |
| cat results/<driver_name>/result | tools/grep_result.sh <grep options> |
| |
| Where <grep options> are same as options available to /bin/grep. |
| |
| - tools/loop-mce-test |
| |
| Run mce test cases in a loop. It exits on failure of any one of the test |
| cases. This script is using simple test driver. |
| |
| Usage: |
| ./loop-mce-test <config_file> |
| |
| Note that only simple test configure file can be used here. |
| |
| 5. Add test cases |
| ----------------- |
| |
| 5.1 Add test case to in-package test class |
| |
| All in-package test classes use mce-inject software injection tool and |
| follows same structure. The steps to add a new test case is as follow: |
| |
| a. Find an appropriate test case class to add your test case. |
| |
| b. Add a new mce-inject data file into to cases/soft-inj/<class name>/data/. |
| |
| c. If the reference mcelog is different from mce-inject input data |
| file, put that reference file into cases/soft-inj/<class_name>/refer/. |
| |
| d. In cases/soft-inj/<class name>/cases.sh, there are shell commands |
| "case" in shell functions get_result() and verify(). Add a branch |
| in each shell command "case" for your test case. |
| |
| 5.2 Add test class |
| |
| To add a new test class, add a cases.sh under a sub-directory of |
| cases/, and follow the test case class interface definition in section |
| 2.1. The general result output format should follow that in section |
| 2.4. |