| From 852e4a8152b427c3f318bb0e1b5e938d64dcdc32 Mon Sep 17 00:00:00 2001 |
| From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
| Date: Tue, 25 Dec 2012 23:02:48 +0100 |
| Subject: tty: don't deadlock while flushing workqueue |
| |
| From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
| |
| commit 852e4a8152b427c3f318bb0e1b5e938d64dcdc32 upstream. |
| |
| Since commit 89c8d91e31f2 ("tty: localise the lock") I see a deadlock |
| in one of my dummy_hcd + g_nokia test cases. The first run was usually |
| okay, the second often resulted in a splat by lockdep and the third was |
| usually a deadlock. |
| Lockdep complained about tty->hangup_work and tty->legacy_mutex taken |
| both ways: |
| | ====================================================== |
| | [ INFO: possible circular locking dependency detected ] |
| | 3.7.0-rc6+ #204 Not tainted |
| | ------------------------------------------------------- |
| | kworker/2:1/35 is trying to acquire lock: |
| | (&tty->legacy_mutex){+.+.+.}, at: [<c14051e6>] tty_lock_nested+0x36/0x80 |
| | |
| | but task is already holding lock: |
| | ((&tty->hangup_work)){+.+...}, at: [<c104f6e4>] process_one_work+0x124/0x5e0 |
| | |
| | which lock already depends on the new lock. |
| | |
| | the existing dependency chain (in reverse order) is: |
| | |
| | -> #2 ((&tty->hangup_work)){+.+...}: |
| | [<c107fe74>] lock_acquire+0x84/0x190 |
| | [<c104d82d>] flush_work+0x3d/0x240 |
| | [<c12e6986>] tty_ldisc_flush_works+0x16/0x30 |
| | [<c12e7861>] tty_ldisc_release+0x21/0x70 |
| | [<c12e0dfc>] tty_release+0x35c/0x470 |
| | [<c1105e28>] __fput+0xd8/0x270 |
| | [<c1105fcd>] ____fput+0xd/0x10 |
| | [<c1051dd9>] task_work_run+0xb9/0xf0 |
| | [<c1002a51>] do_notify_resume+0x51/0x80 |
| | [<c140550a>] work_notifysig+0x35/0x3b |
| | |
| | -> #1 (&tty->legacy_mutex/1){+.+...}: |
| | [<c107fe74>] lock_acquire+0x84/0x190 |
| | [<c140276c>] mutex_lock_nested+0x6c/0x2f0 |
| | [<c14051e6>] tty_lock_nested+0x36/0x80 |
| | [<c1405279>] tty_lock_pair+0x29/0x70 |
| | [<c12e0bb8>] tty_release+0x118/0x470 |
| | [<c1105e28>] __fput+0xd8/0x270 |
| | [<c1105fcd>] ____fput+0xd/0x10 |
| | [<c1051dd9>] task_work_run+0xb9/0xf0 |
| | [<c1002a51>] do_notify_resume+0x51/0x80 |
| | [<c140550a>] work_notifysig+0x35/0x3b |
| | |
| | -> #0 (&tty->legacy_mutex){+.+.+.}: |
| | [<c107f3c9>] __lock_acquire+0x1189/0x16a0 |
| | [<c107fe74>] lock_acquire+0x84/0x190 |
| | [<c140276c>] mutex_lock_nested+0x6c/0x2f0 |
| | [<c14051e6>] tty_lock_nested+0x36/0x80 |
| | [<c140523f>] tty_lock+0xf/0x20 |
| | [<c12df8e4>] __tty_hangup+0x54/0x410 |
| | [<c12dfcb2>] do_tty_hangup+0x12/0x20 |
| | [<c104f763>] process_one_work+0x1a3/0x5e0 |
| | [<c104fec9>] worker_thread+0x119/0x3a0 |
| | [<c1055084>] kthread+0x94/0xa0 |
| | [<c140ca37>] ret_from_kernel_thread+0x1b/0x28 |
| | |
| |other info that might help us debug this: |
| | |
| |Chain exists of: |
| | &tty->legacy_mutex --> &tty->legacy_mutex/1 --> (&tty->hangup_work) |
| | |
| | Possible unsafe locking scenario: |
| | |
| | CPU0 CPU1 |
| | ---- ---- |
| | lock((&tty->hangup_work)); |
| | lock(&tty->legacy_mutex/1); |
| | lock((&tty->hangup_work)); |
| | lock(&tty->legacy_mutex); |
| | |
| | *** DEADLOCK *** |
| |
| Before the patch mentioned above, tty_ldisc_release() looked like this: |
| |
| | tty_ldisc_halt(tty); |
| | tty_ldisc_flush_works(tty); |
| | tty_lock(); |
| |
| As can be seen, it first flushed the workqueue and then grabbed the |
| tty_lock. Now we grab the lock first: |
| |
| | tty_lock_pair(tty, o_tty); |
| | tty_ldisc_halt(tty); |
| | tty_ldisc_flush_works(tty); |
| |
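| so lockdep's complaint seems valid: tty_ldisc_flush_works() now waits |
| for hangup_work with the tty lock held, while hangup_work itself takes |
| that lock in __tty_hangup(). |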
| |
| The earlier version of this patch took the ldisc_mutex since the other |
| user of tty_ldisc_flush_works() (tty_set_ldisc()) did this. |
| Peter Hurley then said that it should not be required. Since it |
| wasn't done earlier, I dropped this part. |
| The code under tty_ldisc_kill() was executed earlier with the tty lock |
| taken so it is taken again. |
| |
| I was able to reproduce the deadlock on v3.8-rc1; this patch fixes the |
| problem in my test case. I didn't notice any problems so far. |
| |
| Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
| Cc: Alan Cox <alan@linux.intel.com> |
| Cc: Peter Hurley <peter@hurleysoftware.com> |
| Cc: Bryan O'Donoghue <bryan.odonoghue.lkml@nexus-software.ie> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| drivers/tty/tty_ldisc.c | 10 +++++----- |
| 1 file changed, 5 insertions(+), 5 deletions(-) |
| |
| --- a/drivers/tty/tty_ldisc.c |
| +++ b/drivers/tty/tty_ldisc.c |
| @@ -934,17 +934,17 @@ void tty_ldisc_release(struct tty_struct |
| * race with the set_ldisc code path. |
| */ |
| |
| - tty_lock_pair(tty, o_tty); |
| tty_ldisc_halt(tty); |
| - tty_ldisc_flush_works(tty); |
| - if (o_tty) { |
| + if (o_tty) |
| tty_ldisc_halt(o_tty); |
| + |
| + tty_ldisc_flush_works(tty); |
| + if (o_tty) |
| tty_ldisc_flush_works(o_tty); |
| - } |
| |
| + tty_lock_pair(tty, o_tty); |
| /* This will need doing differently if we need to lock */ |
| tty_ldisc_kill(tty); |
| - |
| if (o_tty) |
| tty_ldisc_kill(o_tty); |
| |