doc: Set down forward-progress requirements

This commit adds a section to the requirements documentation setting
down requirements for grace-period and callback-invocation forward
progress.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 43c4e2f..7efc1c1 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -1381,6 +1381,7 @@
 <ol>
 <li>	<a href="#Specialization">Specialization</a>
 <li>	<a href="#Performance and Scalability">Performance and Scalability</a>
+<li>	<a href="#Forward Progress">Forward Progress</a>
 <li>	<a href="#Composability">Composability</a>
 <li>	<a href="#Corner Cases">Corner Cases</a>
 </ol>
@@ -1822,6 +1823,106 @@
 RCU thus provides a range of tools to allow updaters to strike the
 required tradeoff between latency, flexibility and CPU overhead.
 
+<h3><a name="Forward Progress">Forward Progress</a></h3>
+
+<p>
+In theory, delaying grace-period completion and callback invocation
+is harmless.
+In practice, not only are memory sizes finite but also callbacks sometimes
+do wakeups, and sufficiently deferred wakeups can be difficult
+to distinguish from system hangs.
+Therefore, RCU must provide a number of mechanisms to promote forward
+progress.
+
+<p>
+These mechanisms are not foolproof, nor can they be.
+For one simple example, an infinite loop in an RCU read-side critical
+section must by definition prevent later grace periods from ever completing.
+For a more involved example, consider a 64-CPU system built with
+<tt>CONFIG_RCU_NOCB_CPU=y</tt> and booted with <tt>rcu_nocbs=1-63</tt>,
+where CPUs 1 through 63 spin in tight loops that invoke
+<tt>call_rcu()</tt>.
+Even if these tight loops also contain calls to <tt>cond_resched()</tt>
+(thus allowing grace periods to complete), CPU 0 simply will
+not be able to invoke callbacks as fast as the other 63 CPUs can
+register them, at least not until the system runs out of memory.
+In both of these examples, the Spiderman principle applies: With great
+power comes great responsibility.
+However, short of this level of abuse, RCU is required to
+ensure timely completion of grace periods and timely invocation of
+callbacks.
+
+<p>
+RCU takes the following steps to encourage timely completion of
+grace periods:
+
+<ol>
+<li>	If a grace period fails to complete within 100 milliseconds,
+	RCU causes future invocations of <tt>cond_resched()</tt> on
+	the holdout CPUs to provide an RCU quiescent state.
+	RCU also causes those CPUs' <tt>need_resched()</tt> invocations
+	to return <tt>true</tt>, but only after the corresponding CPU's
+	next scheduling-clock interrupt.
+<li>	CPUs mentioned in the <tt>nohz_full</tt> kernel boot parameter
+	can run indefinitely in the kernel without scheduling-clock
+	interrupts, which defeats the above <tt>need_resched()</tt>
+	stratagem.
+	RCU will therefore invoke <tt>resched_cpu()</tt> on any
+	<tt>nohz_full</tt> CPUs still holding out after
+	109 milliseconds.
+<li>	In kernels built with <tt>CONFIG_RCU_BOOST=y</tt>, if a given
+	task that has been preempted within an RCU read-side critical
+	section is holding out for more than 500 milliseconds,
+	RCU will resort to priority boosting.
+<li>	If a CPU is still holding out 10 seconds into the grace
+	period, RCU will invoke <tt>resched_cpu()</tt> on it regardless
+	of its <tt>nohz_full</tt> state.
+</ol>
+
+<p>
+The above values are defaults for systems running with <tt>HZ=1000</tt>.
+They will vary as the value of <tt>HZ</tt> varies, and can also be
+changed using the relevant Kconfig options and kernel boot parameters.
+RCU currently does not do much sanity checking of these
+parameters, so please use caution when changing them.
+Note that these forward-progress measures are provided only for RCU,
+not for
+<a href="#Sleepable RCU">SRCU</a> or
+<a href="#Tasks RCU">Tasks RCU</a>.
+
+<p>
+RCU takes the following steps in <tt>call_rcu()</tt> to encourage timely
+invocation of callbacks when any given non-<tt>rcu_nocbs</tt> CPU has
+10,000 callbacks, or has 10,000 more callbacks than it had the last time
+encouragement was provided:
+
+<ol>
+<li>	Starts a grace period, if one is not already in progress.
+<li>	Forces immediate checking for quiescent states, rather than
+	waiting for three milliseconds to have elapsed since the
+	beginning of the grace period.
+<li>	Immediately tags the CPU's callbacks with their grace period
+	completion numbers, rather than waiting for the <tt>RCU_SOFTIRQ</tt>
+	handler to get around to it.
+<li>	Lifts callback-execution batch limits, which speeds up callback
+	invocation at the expense of degrading realtime response.
+</ol>
+
+<p>
+Again, these are default values when running at <tt>HZ=1000</tt>,
+and can be overridden.
+Again, these forward-progress measures are provided only for RCU,
+not for
+<a href="#Sleepable RCU">SRCU</a> or
+<a href="#Tasks RCU">Tasks RCU</a>.
+Even for RCU, callback-invocation forward progress for <tt>rcu_nocbs</tt>
+CPUs is much less well-developed, in part because workloads benefiting
+from <tt>rcu_nocbs</tt> CPUs tend to invoke <tt>call_rcu()</tt>
+relatively infrequently.
+If workloads emerge that need both <tt>rcu_nocbs</tt> CPUs and high
+<tt>call_rcu()</tt> invocation rates, then additional forward-progress
+work will be required.
+
 <h3><a name="Composability">Composability</a></h3>
 
 <p>
@@ -2272,7 +2373,7 @@
 Furthermore, NMI handlers can be interrupted by what appear to RCU to be
 normal interrupts.
 One way that this can happen is for code that directly invokes
-<tt>rcu_irq_enter()</tt> and </tt>rcu_irq_exit()</tt> to be called
+<tt>rcu_irq_enter()</tt> and <tt>rcu_irq_exit()</tt> to be called
 from an NMI handler.
 This astonishing fact of life prompted the current code structure,
 which has <tt>rcu_irq_enter()</tt> invoking <tt>rcu_nmi_enter()</tt>
@@ -2294,7 +2395,7 @@
 <p>
 Unfortunately, there is no way to cancel an RCU callback;
 once you invoke <tt>call_rcu()</tt>, the callback function is
-going to eventually be invoked, unless the system goes down first.
+eventually going to be invoked, unless the system goes down first.
 Because it is normally considered socially irresponsible to crash the system
 in response to a module unload request, we need some other way
 to deal with in-flight RCU callbacks.
@@ -3233,6 +3334,11 @@
 originating <tt>call_rcu()</tt> instance, though probably not
 in production kernels.
 
+<p>
+Additional work may be required to provide reasonable forward-progress
+guarantees under heavy load for grace periods and for callback
+invocation.
+
 <h2><a name="Summary">Summary</a></h2>
 
 <p>