srcu: Fix broken node geometry after early ssp init
An ssp initialized before rcu_init_geometry() will have its snp hierarchy
based on CONFIG_NR_CPUS.
Once rcu_init_geometry() is called, the nodes distribution is shrinked
and optimized toward meeting the actual possible number of CPUs detected
on boot.
Later on, the ssp that was initialized before rcu_init_geometry() is
confused and sometimes refers to its initial CONFIG_NR_CPUS based node
hierarchy, sometimes to the new num_possible_cpus() based one instead.
For example each of its sdp->mynode remain backward and refer to the
early node leaves that may not exist anymore. On the other hand the
srcu_for_each_node_breadth_first() refers to the new node hierarchy.
There are at least two bad possible outcomes to this:
1) a) A callback enqueued early on an sdp is recorded pending on
sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with
sdp->mynode pointing to a deep leaf (say 3 levels).
b) The grace period ends after rcu_init_geometry() which shrinks the
nodes level to a single one. srcu_gp_end() walks through the new
snp hierarchy without ever reaching the old leaves so the callback
is never executed.
This is easily reproduced on an 8 CPUs machine with
CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The
srcu_barrier() after early tests verification never completes and
the boot hangs:
[ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds.
[ 5413.147564] Not tainted 5.12.0-rc4+ #28
[ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000
[ 5413.168099] Call Trace:
[ 5413.170555] __schedule+0x36c/0x930
[ 5413.174057] ? wait_for_completion+0x88/0x110
[ 5413.178423] schedule+0x46/0xf0
[ 5413.181575] schedule_timeout+0x284/0x380
[ 5413.185591] ? wait_for_completion+0x88/0x110
[ 5413.189957] ? mark_held_locks+0x61/0x80
[ 5413.193882] ? mark_held_locks+0x61/0x80
[ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50
[ 5413.202173] ? wait_for_completion+0x88/0x110
[ 5413.206535] wait_for_completion+0xb4/0x110
[ 5413.210724] ? srcu_torture_stats_print+0x110/0x110
[ 5413.215610] srcu_barrier+0x187/0x200
[ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50
[ 5413.224244] ? rdinit_setup+0x2b/0x2b
[ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40
[ 5413.232700] do_one_initcall+0x63/0x310
[ 5413.236541] ? rdinit_setup+0x2b/0x2b
[ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80
[ 5413.244912] kernel_init_freeable+0x253/0x28f
[ 5413.249273] ? rest_init+0x250/0x250
[ 5413.252846] kernel_init+0xa/0x110
[ 5413.256257] ret_from_fork+0x22/0x30
2) An ssp that gets initialized before rcu_init_geometry() and used
afterward will always have stale rdp->mynode references, resulting in
callbacks to be missed in srcu_gp_end(), just like in the previous
scenario.
Solve this with calling rcu_init_geometry() whenever an struct srcu_state
happens to be initialized before rcu_init(). This way we make sure the
RCU nodes hierarchy is properly built and distributed before the nodes
of a struct srcu_state are allocated.
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
3 files changed