blob: d79fb52790e6e1fd8163266231589d40cf1bfaad [file] [log] [blame]
KVM RAS Test Suite
==================
The KVM RAS Test Suite is a collection of test scripts for testing the
Linux kernel MCE processing features in KVM guest system.
Jan 26th, 2010
Jiajia Zheng
In the Package
----------------
Here is a short description of what is included in the package
host/*
Contains host test scripts, which drive test procedure on host system.
guest/*
Contains guest test scripts, which drive test procedure on guest system.
Dependencies
----------------
KVM RAS Test Suite has following dependencies on kernel and other tools:
* Linux Kernel:
Version 2.6.32 or newer, with MCA high level handlers enabled.
* mce-inject
A tool to inject mce error into kernel
* page-types:
A tool to query page types, which is accompanied with Linux kernel
source (2.6.32 or newer, $KERNEL_SRC/Documentation/vm/page-types.c).
* simple_process:
A process constantly access the allocated memeory. (../tools/simple_process)
* kpartx:
A tool to list partition mappings from partition tables
* (optionally) losetup
A tool to set up and control loop devices
* (optionally) pvdisplay/vgchange:
A tool to display/change physical volume attribute
* (optionally) ssh-keygen
A tool to provide the authentication key generation, management and conversion
Test method
---------------
- Start a process in the guest OS, get a virtual address from guest OS
- Translate this address untill we get the physical address on the host OS
- Software injects an SRAO MCE at that physical address from host OS
- (optionally) Write to the address from the guest, i.e attempt to
write to the poisoned page from the guest.
the expected result:
HOST system dmesg:
...
Machine check injector initialized
Triggering MCE exception on CPU 0
Disabling lock debugging due to kernel taint
[Hardware Error]: Machine check events logged
MCE exception done on CPU 0
MCE 0x806324: Killing qemu-system-x86:8829 early due to hardware memory corruption
MCE 0x806324: dirty LRU page recovery: Recovered
...
GUEST system dmesg:
...
[Hardware Error]: Machine check events logged
MCE 0x75925: Killing simple_process:2273 early due to hardware memory corruption
MCE 0x75925: dirty LRU page recovery : Recovered
...
Installation
---------------
1. Build *host* kernel with
CONFIG_KVM=y
CONFIG_KVM_INTEL=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_INJECT=y or CONFIG_X86_MCE_INJECT=m
and following config both on *host* and *guest*
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
NOTE: if the host machine doesn't support software error recovery
(MCG_SER_P in IA32_MCG_CAP[24]), please apply the patch fake_ser_p.patch
under ./patches/
2. Use ssh-keygen to generate public and privite keys on the host OS,
and copy id_rsa and id_rsa.pub from ~/.ssh/ into the testing directory
on the host system.
3. compile and install qemu-kvm
the qemu-kvm source can be got from git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git.
Before compile qemu-kvm, a patch p2v.patch should be applied. This patch
is located under ./patches/
Please ensure *SDL-devel* library is installed on the host machine, otherwise
only VNC can be used substituing local graphic output.
4. install the guest OS on the qemu
e.g.
step 1: qemu-img create -f qcow2 test.img 10G
step 2: qemu-system-x86_64 -hda ./test.img -m 2048 -cdrom rhel6.iso -boot d
after the installation, please be sure to execute the following check:
a) add necessary command line parameters under your boot item.
This is to enable console output redirection to the serial.
e.g.
before:
title Red Hat Enterprise Linux Server (2.6.32kvm)
root (hd0,1)
kernel /boot/vmlinuz-2.6.32kvm ro root=/dev/sda1
initrd /boot/initramfs-2.6.32kvm.img
after:
title Red Hat Enterprise Linux Server (2.6.32kvm)
root (hd0,1)
kernel /boot/vmlinuz-2.6.32kvm ro root=/dev/sda1 console=tty0 console=ttyS0,115200n8
initrd /boot/initramfs-2.6.32kvm.img
b) DHCP guest ethernet card. This operation is to ensure the network connection
is OK.
e.g.
bash> dhclient eth0
c) enable SSH public/private key authorization, otherwise, the SSH connection password
is necessary to be provided in the test progress.
please check /etc/ssh/ssd_config and ensure related options are opened.
e.g.
RSAAuthentication yes
PubkeyAuthentication yes
after the related options are opened, please restart ssh service
e.g.
bash> service sshd restart
Start Testing
---------------
Run testing by
./host_run.sh <option> <argument>
You can get the help information by
./host_run.sh -h