| From 57bf67e73ce9bcce2258890f5abf2adf5f619f1a Mon Sep 17 00:00:00 2001 |
| From: Sean Christopherson <sean.j.christopherson@intel.com> |
| Date: Wed, 17 Apr 2019 10:15:31 -0700 |
| Subject: KVM: lapic: Disable timer advancement if adaptive tuning goes haywire |
| |
| From: Sean Christopherson <sean.j.christopherson@intel.com> |
| |
| commit 57bf67e73ce9bcce2258890f5abf2adf5f619f1a upstream. |
| |
| To minimize the latency of timer interrupts as observed by the guest, |
| KVM adjusts the values it programs into the host timers to account for |
| the host's overhead of programming and handling the timer event. Now |
| that the timer advancement is automatically tuned during runtime, it's |
| effectively unbounded by default, e.g. if KVM is running as L1 the |
| advancement can measure in hundreds of milliseconds. |
| |
| Disable timer advancement if adaptive tuning yields an advancement of |
| more than 5000ns, as large advancements can break reasonable assumptions |
| of the guest, e.g. that a timer configured to fire after 1ms won't |
| arrive on the next instruction. Although KVM busy waits to mitigate the |
| case of a timer event arriving too early, complications can arise when |
| shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test |
| will fail when its "host" exits on interrupts as KVM may inject the INTR |
| before the guest executes STI+HLT. Arguably the unit test is "broken" |
| in the sense that delaying a timer interrupt by 1ms doesn't technically |
| guarantee the interrupt will arrive after STI+HLT, but it's a reasonable |
| assumption that KVM should support. |
| |
| Furthermore, an unbounded advancement also effectively unbounds the time |
| spent busy waiting, e.g. if the guest programs a timer with a very large |
| delay. |
| |
| 5000ns is a somewhat arbitrary threshold. When running on bare metal, |
| which is the intended use case, timer advancement is expected to be in |
| the general vicinity of 1000ns. 5000ns is high enough that false |
| positives are unlikely, while not being so high as to negatively affect |
| the host's performance/stability. |
| |
| Note, a future patch will enable userspace to disable KVM's adaptive |
| tuning, which will allow privileged userspace to specify an |
| advancement value in excess of this arbitrary threshold in order to |
| satisfy an abnormal use case. |
| |
| Cc: Liran Alon <liran.alon@oracle.com> |
| Cc: Wanpeng Li <wanpengli@tencent.com> |
| Cc: stable@vger.kernel.org |
| Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") |
| Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> |
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| arch/x86/kvm/lapic.c | 4 ++++ |
| 1 file changed, 4 insertions(+) |
| |
| --- a/arch/x86/kvm/lapic.c |
| +++ b/arch/x86/kvm/lapic.c |
| @@ -1519,6 +1519,10 @@ void wait_lapic_expire(struct kvm_vcpu * |
|  		} |
|  		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) |
|  			lapic_timer_advance_adjust_done = true; |
| +		if (unlikely(lapic_timer_advance_ns > 5000)) { |
| +			lapic_timer_advance_ns = 0; |
| +			lapic_timer_advance_adjust_done = true; |
| +		} |
|  	} |
|  } |
| |
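The check added to wait_lapic_expire() can be modelled in isolation as a minimal userspace sketch. The struct, the ADVANCE_LIMIT_NS macro, and cap_timer_advance() are hypothetical names introduced for illustration only; in the kernel the two fields are module-level variables in lapic.c and the 5000ns limit is an inline literal.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the state the patch touches. In KVM these are
 * the variables lapic_timer_advance_ns and lapic_timer_advance_adjust_done.
 */
struct advance_state {
	uint32_t advance_ns;	/* current tuned advancement, in nanoseconds */
	bool adjust_done;	/* true once adaptive tuning has stopped */
};

/* Threshold taken from the patch; the macro name is an assumption. */
#define ADVANCE_LIMIT_NS 5000u

/*
 * If adaptive tuning has pushed the advancement past the limit, disable
 * advancement entirely (set it to 0) and stop any further tuning.
 * Values at or below the limit are left untouched.
 */
void cap_timer_advance(struct advance_state *s)
{
	if (s->advance_ns > ADVANCE_LIMIT_NS) {
		s->advance_ns = 0;
		s->adjust_done = true;
	}
}
```

Note that the comparison is strict: an advancement of exactly 5000ns is kept and tuning continues, while 5001ns disables advancement and marks tuning as done so the value cannot grow again.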