releases/4.14.159/sched-fair-scale-bandwidth-quota-and-period-without-losing-quota-period-ratio-precision.patch - pub/scm/linux/kernel/git/stable/stable-queue - Git at Google

 From 4929a4e6faa0f13289a67cae98139e727f0d4a97 Mon Sep 17 00:00:00 2001
 From: Xuewei Zhang <xueweiz@google.com>
 Date: Thu, 3 Oct 2019 17:12:43 -0700
 Subject: sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision

 From: Xuewei Zhang <xueweiz@google.com>

 commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 upstream.

 The quota/period ratio is used to ensure a child task group won't get
 more bandwidth than the parent task group, and is calculated as:

   normalized_cfs_quota() = [(quota_us << 20) / period_us]

 If the quota/period ratio was changed during this scaling due to
 precision loss, it will cause inconsistency between parent and child
 task groups.

 See below example:

 A userspace container manager (kubelet) does three operations:

  1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
  2) Create a few children cgroups.
  3) Set quota to 1,000us and period to 10,000us on a child cgroup.

 These operations are expected to succeed. However, if the scaling of
 147/128 happens before step 3, quota and period of the parent cgroup
 will be changed:

   new_quota: 1148437ns,   1148us
  new_period: 11484375ns, 11484us

 And when step 3 comes in, the ratio of the child cgroup will be
 104857, which will be larger than the parent cgroup ratio (104821),
 and will fail.

 Scaling them by a factor of 2 will fix the problem.

 Tested-by: Phil Auld <pauld@redhat.com>
 Signed-off-by: Xuewei Zhang <xueweiz@google.com>
 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
 Acked-by: Phil Auld <pauld@redhat.com>
 Cc: Anton Blanchard <anton@ozlabs.org>
 Cc: Ben Segall <bsegall@google.com>
 Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
 Cc: Juri Lelli <juri.lelli@redhat.com>
 Cc: Linus Torvalds <torvalds@linux-foundation.org>
 Cc: Mel Gorman <mgorman@suse.de>
 Cc: Peter Zijlstra <peterz@infradead.org>
 Cc: Steven Rostedt <rostedt@goodmis.org>
 Cc: Thomas Gleixner <tglx@linutronix.de>
 Cc: Vincent Guittot <vincent.guittot@linaro.org>
 Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
 Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


 ---
  kernel/sched/fair.c |   34 +++++++++++++++++++++-------------
  1 file changed, 21 insertions(+), 13 deletions(-)

 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -4655,20 +4655,28 @@ static enum hrtimer_restart sched_cfs_pe
  		if (++count > 3) {
  			u64 new, old = ktime_to_ns(cfs_b->period);

 -			new = (old * 147) / 128; /* ~115% */
 -			new = min(new, max_cfs_quota_period);
 +			/*
 +			 * Grow period by a factor of 2 to avoid losing precision.
 +			 * Precision loss in the quota/period ratio can cause __cfs_schedulable
 +			 * to fail.
 +			 */
 +			new = old * 2;
 +			if (new < max_cfs_quota_period) {
 +				cfs_b->period = ns_to_ktime(new);
 +				cfs_b->quota *= 2;

 -			cfs_b->period = ns_to_ktime(new);
 -
 -			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
 -			cfs_b->quota *= new;
 -			cfs_b->quota = div64_u64(cfs_b->quota, old);
 -
 -			pr_warn_ratelimited(
 -        "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
 -	                        smp_processor_id(),
 -	                        div_u64(new, NSEC_PER_USEC),
 -                                div_u64(cfs_b->quota, NSEC_PER_USEC));
 +				pr_warn_ratelimited(
 +	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
 +					smp_processor_id(),
 +					div_u64(new, NSEC_PER_USEC),
 +					div_u64(cfs_b->quota, NSEC_PER_USEC));
 +			} else {
 +				pr_warn_ratelimited(
 +	"cfs_period_timer[cpu%d]: period too short, but cannot scale up without losing precision (cfs_period_us = %lld, cfs_quota_us = %lld)\n",
 +					smp_processor_id(),
 +					div_u64(old, NSEC_PER_USEC),
 +					div_u64(cfs_b->quota, NSEC_PER_USEC));
 +			}

  			/* reset count so we don't come right back in here */
  			count = 0;
	From 4929a4e6faa0f13289a67cae98139e727f0d4a97 Mon Sep 17 00:00:00 2001
	From: Xuewei Zhang <xueweiz@google.com>
	Date: Thu, 3 Oct 2019 17:12:43 -0700
	Subject: sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision

	From: Xuewei Zhang <xueweiz@google.com>

	commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 upstream.

	The quota/period ratio is used to ensure a child task group won't get
	more bandwidth than the parent task group, and is calculated as:

	normalized_cfs_quota() = [(quota_us << 20) / period_us]

	If the quota/period ratio was changed during this scaling due to
	precision loss, it will cause inconsistency between parent and child
	task groups.

	See below example:

	A userspace container manager (kubelet) does three operations:

	1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
	2) Create a few children cgroups.
	3) Set quota to 1,000us and period to 10,000us on a child cgroup.

	These operations are expected to succeed. However, if the scaling of
	147/128 happens before step 3, quota and period of the parent cgroup
	will be changed:

	new_quota: 1148437ns, 1148us
	new_period: 11484375ns, 11484us

	And when step 3 comes in, the ratio of the child cgroup will be
	104857, which will be larger than the parent cgroup ratio (104821),
	and will fail.

	Scaling them by a factor of 2 will fix the problem.

	Tested-by: Phil Auld <pauld@redhat.com>
	Signed-off-by: Xuewei Zhang <xueweiz@google.com>
	Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
	Acked-by: Phil Auld <pauld@redhat.com>
	Cc: Anton Blanchard <anton@ozlabs.org>
	Cc: Ben Segall <bsegall@google.com>
	Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
	Cc: Juri Lelli <juri.lelli@redhat.com>
	Cc: Linus Torvalds <torvalds@linux-foundation.org>
	Cc: Mel Gorman <mgorman@suse.de>
	Cc: Peter Zijlstra <peterz@infradead.org>
	Cc: Steven Rostedt <rostedt@goodmis.org>
	Cc: Thomas Gleixner <tglx@linutronix.de>
	Cc: Vincent Guittot <vincent.guittot@linaro.org>
	Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
	Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


	---
	kernel/sched/fair.c \| 34 +++++++++++++++++++++-------------
	1 file changed, 21 insertions(+), 13 deletions(-)

	--- a/kernel/sched/fair.c
	+++ b/kernel/sched/fair.c
	@@ -4655,20 +4655,28 @@ static enum hrtimer_restart sched_cfs_pe
	if (++count > 3) {
	u64 new, old = ktime_to_ns(cfs_b->period);

	- new = (old * 147) / 128; /* ~115% */
	- new = min(new, max_cfs_quota_period);
	+ /*
	+ * Grow period by a factor of 2 to avoid losing precision.
	+ * Precision loss in the quota/period ratio can cause __cfs_schedulable
	+ * to fail.
	+ */
	+ new = old * 2;
	+ if (new < max_cfs_quota_period) {
	+ cfs_b->period = ns_to_ktime(new);
	+ cfs_b->quota *= 2;

	- cfs_b->period = ns_to_ktime(new);
	-
	- /* since max is 1s, this is limited to 1e9^2, which fits in u64 */
	- cfs_b->quota *= new;
	- cfs_b->quota = div64_u64(cfs_b->quota, old);
	-
	- pr_warn_ratelimited(
	- "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
	- smp_processor_id(),
	- div_u64(new, NSEC_PER_USEC),
	- div_u64(cfs_b->quota, NSEC_PER_USEC));
	+ pr_warn_ratelimited(
	+ "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
	+ smp_processor_id(),
	+ div_u64(new, NSEC_PER_USEC),
	+ div_u64(cfs_b->quota, NSEC_PER_USEC));
	+ } else {
	+ pr_warn_ratelimited(
	+ "cfs_period_timer[cpu%d]: period too short, but cannot scale up without losing precision (cfs_period_us = %lld, cfs_quota_us = %lld)\n",
	+ smp_processor_id(),
	+ div_u64(old, NSEC_PER_USEC),
	+ div_u64(cfs_b->quota, NSEC_PER_USEC));
	+ }

	/* reset count so we don't come right back in here */
	count = 0;