| From f405c3ed226ee72b59b58b7285472fc17d83ecaf Mon Sep 17 00:00:00 2001 |
| From: Sasha Levin <sashal@kernel.org> |
| Date: Mon, 21 Jun 2021 11:37:51 +0100 |
| Subject: sched/rt: Fix RT utilization tracking during policy change |
| |
| From: Vincent Donnefort <vincent.donnefort@arm.com> |
| |
| [ Upstream commit fecfcbc288e9f4923f40fd23ca78a6acdc7fdf6c ] |
| |
| RT keeps track of the utilization on a per-rq basis with the structure |
| avg_rt. This utilization is updated during task_tick_rt(), |
| put_prev_task_rt() and set_next_task_rt(). However, when the currently |
| running task changes its policy to RT, set_next_task_rt(), which would |
| usually take care of updating the utilization when the rq starts running |
| RT tasks, does not see such a change, leaving the avg_rt structure |
| outdated. When that very same task is later dequeued, put_prev_task_rt() |
| then updates the utilization based on a stale last_update_time, leading |
| to a huge spike in the RT utilization signal. |
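| |
| To illustrate the mechanism, here is a minimal standalone sketch. It is |
| a toy model only: toy_avg, toy_update and the exponential blend below |
| are made up for this example and merely stand in for the real PELT |
| accounting. It shows how accounting a whole stale window as RT running |
| time produces the spike: |
| |
|   #include <stdio.h> |
| |
|   /* Stand-in for the per-rq avg_rt state (simplified, hypothetical). */ |
|   struct toy_avg { |
|           unsigned long long last_update_time;  /* in us */ |
|           double util;                          /* 0.0 .. 1.0 */ |
|   }; |
| |
|   /* |
|    * Fold the window (now - last_update_time) into the average. 'running' |
|    * says whether that window counts as RT running time. A plain |
|    * exponential blend replaces the real PELT series. |
|    */ |
|   static void toy_update(struct toy_avg *avg, unsigned long long now, |
|                          int running) |
|   { |
|           double delta = (double)(now - avg->last_update_time) / 1e6; |
|           double w = delta / (delta + 0.032); |
| |
|           avg->util = avg->util * (1.0 - w) + (running ? 1.0 : 0.0) * w; |
|           avg->last_update_time = now; |
|   } |
| |
|   int main(void) |
|   { |
|           /* Last avg_rt update at t=0; a CFS task then runs for 1s. */ |
|           struct toy_avg stale = { .last_update_time = 0, .util = 0.0 }; |
| |
|           /* |
|            * t=1s: the running task switches to SCHED_FIFO. Without the |
|            * fix nothing refreshes avg_rt here, so when the task is |
|            * dequeued 2ms later the whole 1.002s window is accounted as |
|            * RT running time. |
|            */ |
|           toy_update(&stale, 1002000, 1); |
|           printf("without fix: util_rt = %.2f (spike)\n", stale.util); |
| |
|           /* With the fix: switched_to_rt() refreshes avg_rt at t=1s. */ |
|           struct toy_avg fixed = { .last_update_time = 0, .util = 0.0 }; |
|           toy_update(&fixed, 1000000, 0);  /* policy change */ |
|           toy_update(&fixed, 1002000, 1);  /* dequeue, 2ms of RT */ |
|           printf("with fix:    util_rt = %.2f\n", fixed.util); |
| |
|           return 0; |
|   } |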
| |
| The signal would eventually recover from this issue after a few ms: even |
| if no RT tasks are run, avg_rt is also updated in __update_blocked_others(). |
| But as the CPU capacity partly depends on avg_rt, this issue nonetheless |
| has a significant impact on the scheduler. |
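| |
| As a rough sketch of that dependency (hypothetical and simplified: |
| toy_scale_rt_capacity only mimics the idea behind scale_rt_capacity() |
| in fair.c and omits the irq scaling), the capacity left for CFS shrinks |
| as avg_rt inflates, so a spurious spike briefly makes the CPU look |
| nearly out of capacity: |
| |
|   #include <stdio.h> |
| |
|   /* Capacity left for CFS once RT/DL/irq pressure is subtracted. */ |
|   static unsigned long toy_scale_rt_capacity(unsigned long max, |
|                                              unsigned long util_rt, |
|                                              unsigned long util_dl, |
|                                              unsigned long util_irq) |
|   { |
|           unsigned long used = util_rt + util_dl + util_irq; |
| |
|           return used >= max ? 1 : max - used; |
|   } |
| |
|   int main(void) |
|   { |
|           /* 1024 = full capacity; a spurious util_rt of ~970 leaves |
|              almost nothing for CFS until the signal decays. */ |
|           printf("normal: %lu\n", toy_scale_rt_capacity(1024, 60, 0, 0)); |
|           printf("spike:  %lu\n", toy_scale_rt_capacity(1024, 970, 0, 0)); |
|           return 0; |
|   } |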
| |
| Fix this issue by ensuring a load update when a running task changes |
| its policy to RT. |
| |
| Fixes: 371bf427 ("sched/rt: Add rt_rq utilization tracking") |
| Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> |
| Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> |
| Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> |
| Link: https://lore.kernel.org/r/1624271872-211872-2-git-send-email-vincent.donnefort@arm.com |
| Signed-off-by: Sasha Levin <sashal@kernel.org> |
| --- |
| kernel/sched/rt.c | 17 ++++++++++++----- |
| 1 file changed, 12 insertions(+), 5 deletions(-) |
| |
| diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c |
| index 49ec096a8aa1..b5cf418e2e3f 100644 |
| --- a/kernel/sched/rt.c |
| +++ b/kernel/sched/rt.c |
| @@ -2291,13 +2291,20 @@ void __init init_sched_rt_class(void) |
| static void switched_to_rt(struct rq *rq, struct task_struct *p) |
| { |
| /* |
| - * If we are already running, then there's nothing |
| - * that needs to be done. But if we are not running |
| - * we may need to preempt the current running task. |
| - * If that current running task is also an RT task |
| + * If we are running, update the avg_rt tracking, as the running time |
| + * will from now on be accounted into the latter. |
| + */ |
| + if (task_current(rq, p)) { |
| + update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); |
| + return; |
| + } |
| + |
| + /* |
| + * If we are not running we may need to preempt the current |
| + * running task. If that current running task is also an RT task |
| * then see if we can move to another run queue. |
| */ |
| - if (task_on_rq_queued(p) && rq->curr != p) { |
| + if (task_on_rq_queued(p)) { |
| #ifdef CONFIG_SMP |
| if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) |
| rt_queue_push_tasks(rq); |
| -- |
| 2.30.2 |
| |