releases/5.0.14/kvm-lapic-disable-timer-advancement-if-adaptive-tuning-goes-haywire.patch - pub/scm/linux/kernel/git/stable/stable-queue

 From 57bf67e73ce9bcce2258890f5abf2adf5f619f1a Mon Sep 17 00:00:00 2001
 From: Sean Christopherson <sean.j.christopherson@intel.com>
 Date: Wed, 17 Apr 2019 10:15:31 -0700
 Subject: KVM: lapic: Disable timer advancement if adaptive tuning goes haywire

 From: Sean Christopherson <sean.j.christopherson@intel.com>

 commit 57bf67e73ce9bcce2258890f5abf2adf5f619f1a upstream.

 To minimize the latency of timer interrupts as observed by the guest,
 KVM adjusts the values it programs into the host timers to account for
 the host's overhead of programming and handling the timer event.  Now
 that the timer advancement is automatically tuned during runtime, it's
 effectively unbounded by default, e.g. if KVM is running as L1 the
 advancement can measure in hundreds of milliseconds.

 Disable timer advancement if adaptive tuning yields an advancement of
 more than 5000ns, as large advancements can break reasonable assumptions
 of the guest, e.g. that a timer configured to fire after 1ms won't
 arrive on the next instruction.  Although KVM busy waits to mitigate the
 case of a timer event arriving too early, complications can arise when
 shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test
 will fail when its "host" exits on interrupts as KVM may inject the INTR
 before the guest executes STI+HLT.  Arguably the unit test is "broken"
 in the sense that delaying a timer interrupt by 1ms doesn't technically
 guarantee the interrupt will arrive after STI+HLT, but it's a reasonable
 assumption that KVM should support.

 Furthermore, an unbounded advancement also effectively unbounds the time
 spent busy waiting, e.g. if the guest programs a timer with a very large
 delay.

 5000ns is a somewhat arbitrary threshold.  When running on bare metal,
 which is the intended use case, timer advancement is expected to be in
 the general vicinity of 1000ns.  5000ns is high enough that false
 positives are unlikely, while not being so high as to negatively affect
 the host's performance/stability.

 Note, a future patch will enable userspace to disable KVM's adaptive
 tuning, which will allow privileged userspace to specify an
 advancement value in excess of this arbitrary threshold in order to
 satisfy an abnormal use case.

 Cc: Liran Alon <liran.alon@oracle.com>
 Cc: Wanpeng Li <wanpengli@tencent.com>
 Cc: stable@vger.kernel.org
 Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
 Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 ---
  arch/x86/kvm/lapic.c |    4 ++++
  1 file changed, 4 insertions(+)

 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1519,6 +1519,10 @@ void wait_lapic_expire(struct kvm_vcpu *
  		}
  		if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
  			lapic_timer_advance_adjust_done = true;
 +		if (unlikely(lapic_timer_advance_ns > 5000)) {
 +			lapic_timer_advance_ns = 0;
 +			lapic_timer_advance_adjust_done = true;
 +		}
  	}
  }