| From 7099e2e1f4d9051f31bbfa5803adf954bb5d76ef Mon Sep 17 00:00:00 2001 |
| From: =?UTF-8?q?Radim=20Kr=C4=8Dm=C3=A1=C5=99?= <rkrcmar@redhat.com> |
| Date: Fri, 4 Mar 2016 15:08:42 +0100 |
| Subject: KVM: VMX: disable PEBS before a guest entry |
| MIME-Version: 1.0 |
| Content-Type: text/plain; charset=UTF-8 |
| Content-Transfer-Encoding: 8bit |
| |
| From: Radim Krčmář <rkrcmar@redhat.com> |
| |
| commit 7099e2e1f4d9051f31bbfa5803adf954bb5d76ef upstream. |
| |
| Linux guests on Haswell (and also SandyBridge and Broadwell, at least) |
| would crash if you decided to run a host command that uses PEBS, like |
| perf record -e 'cpu/mem-stores/pp' -a |
| |
| This happens because KVM is using VMX MSR switching to disable PEBS, but |
| SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it |
| isn't safe: |
| When software needs to reconfigure PEBS facilities, it should allow a |
| quiescent period between stopping the prior event counting and setting |
| up a new PEBS event. The quiescent period is to allow any latent |
| residual PEBS records to complete its capture at their previously |
| specified buffer address (provided by IA32_DS_AREA). |
| |
| There might not be a quiescent period after the MSR switch, so a CPU |
| ends up using host's MSR_IA32_DS_AREA to access an area in guest's |
| memory. (Or MSR switching is just buggy on some models.) |
| |
| The guest can learn something about the host this way: |
| If the guest doesn't map the address pointed to by MSR_IA32_DS_AREA, the |
| result is a #PF that leaks the host's MSR_IA32_DS_AREA through CR2. |
| |
| After that, a malicious guest can map and configure memory where |
| MSR_IA32_DS_AREA is pointing and can therefore get an output from |
| host's tracing. |
| |
| This is not a critical leak as the host must initiate PEBS tracing |
| and I have not been able to get a record from more than one instruction |
| before vmentry in vmx_vcpu_run() (that place has most registers already |
| overwritten with guest's). |
| |
| We could disable PEBS just a few instructions before vmentry, but |
| disabling it earlier shouldn't affect host tracing too much. |
| We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that |
| optimization isn't worth its code, IMO. |
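
The ordering argued for above can be illustrated with a small user-space
model. This is only a sketch, not the kernel code: the model_* names, the
fixed-size list and the plain variable standing in for the MSR are all
hypothetical. Only MSR_IA32_PEBS_ENABLE and its index (0x3f1) are real.
The point it shows is that PEBS is cleared explicitly when the MSR is
queued for atomic switching, so the time until VM entry can serve as the
quiescent period.

```c
#include <assert.h>
#include <stdint.h>

/* Real MSR index; everything prefixed model_ is a hypothetical stand-in. */
#define MSR_IA32_PEBS_ENABLE 0x3f1
#define MODEL_MAX_SWITCH_MSRS 8

/* Stand-in for the hardware register; the kernel would use wrmsrl(). */
static uint64_t model_pebs_enable;

/* One entry of a modelled VMX atomic-switch list. */
struct model_switch_entry {
	uint32_t msr;
	uint64_t guest_val;
	uint64_t host_val;
};

static struct model_switch_entry model_list[MODEL_MAX_SWITCH_MSRS];
static unsigned int model_nr;

/*
 * Queue an MSR for switching on VM entry/exit.  For PEBS_ENABLE, also
 * clear the register immediately instead of relying on the switch alone.
 * Returns 0 on success, -1 if the list is full.
 */
static int model_add_atomic_switch_msr(uint32_t msr, uint64_t guest_val,
				       uint64_t host_val)
{
	if (model_nr >= MODEL_MAX_SWITCH_MSRS)
		return -1;

	if (msr == MSR_IA32_PEBS_ENABLE)
		model_pebs_enable = 0;	/* explicit disable, ahead of entry */

	model_list[model_nr].msr = msr;
	model_list[model_nr].guest_val = guest_val;
	model_list[model_nr].host_val = host_val;
	model_nr++;
	return 0;
}
```

The host value is still recorded in the list, so the model restores it on
exit in the same way the unmodified switching path would.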
| |
| (If you are implementing PEBS for guests, be sure to handle the case |
| where both host and guest enable PEBS, because this patch doesn't.) |
| |
| Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.") |
| Reported-by: Jiří Olša <jolsa@redhat.com> |
| Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> |
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| arch/x86/kvm/vmx.c | 7 +++++++ |
| 1 file changed, 7 insertions(+) |
| |
| --- a/arch/x86/kvm/vmx.c |
| +++ b/arch/x86/kvm/vmx.c |
| @@ -1748,6 +1748,13 @@ static void add_atomic_switch_msr(struct |
| return; |
| } |
| break; |
| + case MSR_IA32_PEBS_ENABLE: |
| + /* PEBS needs a quiescent period after being disabled (to write |
| + * a record). Disabling PEBS through VMX MSR swapping doesn't |
| + * provide that period, so a CPU could write host's record into |
| + * guest's memory. |
| + */ |
| + wrmsrl(MSR_IA32_PEBS_ENABLE, 0); |
| } |
| |
| for (i = 0; i < m->nr; ++i) |