Subject: net: Avoid livelock in net_tx_action() on RT
From: Steven Rostedt <srostedt@redhat.com>
Date: Thu, 06 Oct 2011 10:48:39 -0400

qdisc_lock is taken w/o disabling interrupts or bottom halves. So code
holding qdisc_lock can be interrupted, and on !RT softirqs can run on
return from interrupt.

The spin_trylock() in net_tx_action() makes sure that the softirq
does not deadlock. When the lock cannot be acquired, q is requeued and
the NET_TX softirq is raised. That causes the softirq to run over and
over.

That works in mainline as do_softirq() has a retry loop limit and
leaves the softirq processing in the interrupt return path and
schedules ksoftirqd. The task which holds qdisc_lock cannot be
preempted, so the lock is released and either ksoftirqd or the next
softirq in the return from interrupt path can proceed. Though it's a
bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it
decides to bail out even if it's clear in the first iteration :)

On RT all softirq processing is done in a FIFO thread and we don't
have a loop limit, so ksoftirqd preempts the lock holder forever and
dequeues and requeues until the reset button is hit.

Due to the forced threading of ksoftirqd on RT we actually cannot
deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to
replace the spin_trylock() with a spin_lock(). When contended,
ksoftirqd is scheduled out and the lock holder can proceed.

[ tglx: Massaged changelog and code comments ]

Solved-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 net/core/dev.c |   32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3128,6 +3128,36 @@ int netif_rx_ni(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_rx_ni);
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * RT runs ksoftirqd as a real time thread and the root_lock is a
+ * "sleeping spinlock". If the trylock fails then we can go into an
+ * infinite loop when ksoftirqd preempted the task which actually
+ * holds the lock, because we requeue q and raise NET_TX softirq
+ * causing ksoftirqd to loop forever.
+ *
+ * It's safe to use spin_lock on RT here as softirqs run in thread
+ * context and cannot deadlock against the thread which is holding
+ * root_lock.
+ *
+ * On !RT the trylock might fail, but there we bail out from the
+ * softirq loop after 10 attempts which we can't do on RT. And the
+ * task holding root_lock cannot be preempted, so the only downside of
+ * that trylock is that we need 10 loops to decide that we should have
+ * given up in the first one :)
+ */
+static inline int take_root_lock(spinlock_t *lock)
+{
+	spin_lock(lock);
+	return 1;
+}
+#else
+static inline int take_root_lock(spinlock_t *lock)
+{
+	return spin_trylock(lock);
+}
+#endif
+
 static void net_tx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = &__get_cpu_var(softnet_data);
@@ -3166,7 +3196,7 @@ static void net_tx_action(struct softirq
 			head = head->next_sched;
 
 			root_lock = qdisc_lock(q);
-			if (spin_trylock(root_lock)) {
+			if (take_root_lock(root_lock)) {
 				smp_mb__before_clear_bit();
 				clear_bit(__QDISC_STATE_SCHED,
 					  &q->state);