refs/heads/sched/flat - pub/scm/linux/kernel/git/peterz/queue.git

commit	eaad49852269b521e02cff73adc8b8a432fda88e
author	Peter Zijlstra <peterz@infradead.org>	Wed Sep 03 15:45:31 2025 +0200
committer	Peter Zijlstra <peterz@infradead.org>	Sat Nov 29 12:26:53 2025 +0100
tree	4c294a46297459eb9a07adc224171a9367a295d5
parent	cc89ae1c64ca70845da37d90357eb4263b900f1a

sched/eevdf: Move to a single runqueue

Change fair/cgroup to a single runqueue.

Infamously fair/cgroup isn't working for a number of people; typically
the complaint is latencies and/or overhead. The latency issue is due
to the intermediate entries that represent a combination of tasks and
thereby obfuscate the runnability of tasks.

The approach here is to leave the cgroup hierarchy as is; including
the intermediate enqueue/dequeue but move the actual EEVDF runqueue
outside. This means things like the shares_weight approximation are
fully preserved.

That is, given a hierarchy like:

                R
                |
                se--G1
                    / \
               G2--se  se--G3
              /  \         |
         T1--se  se--T2    se--T3

This is fully maintained for load tracking, however the EEVDF parts of
cfs_rq/se go unused for the intermediates and are instead connected
like:

               _R_
              / | \
            T1  T2  T3

Since the effective weight of the entities is determined by the
hierarchy, this gets recomputed on enqueue, set_next_task and tick.

Notably, the effective weight (se->h_load) is computed from the
hierarchical fraction: se->load / cfs_rq->load.

Since EEVDF is now exclusively operating on rq->cfs, it needs to
consider cfs_rq->h_nr_queued rather than cfs_rq->nr_queued. Similarly,
only tasks can get delayed, simplifying some of the cgroup cleanup.

One place where additional information was required was
set_next_task() / put_prev_task(), where we need to track 'current'
both in the hierarchical sense (cfs_rq->h_curr) and in the flat sense
(cfs_rq->curr).

As a result of only having a single level to pick from, many of the
complications in pick_next_task() and preemption go away.

Since many of the hierarchical operations are still there, this won't
immediately fix the performance issues, but hopefully it will fix some
of the latency issues.

TODO: split struct cfs_rq / struct sched_entity
TODO: try and get rid of h_curr

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
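
For illustration only, the hierarchical fraction mentioned in the message (se->load / cfs_rq->load, applied at every level up to the root) can be sketched in standalone C. This is not the code from the patch: struct toy_se, struct toy_rq, toy_h_weight() and TOY_SCALE are hypothetical names invented for this sketch, and the kernel's own types, fixed-point handling and load-average inputs differ.

```c
#include <stddef.h>

/* Hypothetical fixed-point unit representing the whole root runqueue. */
#define TOY_SCALE 1024ULL

struct toy_rq;

/* Hypothetical stand-in for a scheduling entity (task or group). */
struct toy_se {
	unsigned long weight;	/* weight of this entity on its queue */
	struct toy_rq *rq;	/* queue this entity is enqueued on */
};

/* Hypothetical stand-in for a per-group runqueue. */
struct toy_rq {
	unsigned long load;	/* sum of the weights queued here */
	struct toy_se *owner;	/* entity representing this queue in its
				 * parent queue; NULL for the root */
};

/*
 * Walk from the entity up to the root, multiplying the per-level
 * fractions weight / load. The result is the share of TOY_SCALE the
 * entity would get if all tasks sat directly on the root queue.
 */
static unsigned long toy_h_weight(const struct toy_se *se)
{
	unsigned long long w = TOY_SCALE;

	for (; se; se = se->rq->owner) {
		if (!se->rq->load)
			return 0;	/* empty queue: avoid dividing by zero */
		w = w * se->weight / se->rq->load;
	}

	return (unsigned long)w;
}
```

Applied to the hierarchy drawn in the message, with every entity given weight 1024, T1 and T2 each come out at 256 and T3 at 512, which together add up to the root's 1024. Because a change in any queue's load along the path changes the product, the value cannot be cached indefinitely, which fits the message's note that it is recomputed on enqueue, set_next_task and tick.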