| From: Mikulas Patocka <mpatocka@redhat.com> |
| Date: Mon, 13 Nov 2017 12:56:53 -0500 |
| Subject: [PATCH] locking/rt-mutex: fix deadlock in device mapper / block-IO |
| |
| When some block device driver creates a bio and submits it to another |
| block device driver, the bio is added to current->bio_list (in order to |
| avoid unbounded recursion). |
| |
| However, this queuing of bios can cause deadlocks, in order to avoid them, |
| device mapper registers a function flush_current_bio_list. This function |
| is called when device mapper driver blocks. It redirects bios queued on |
| current->bio_list to helper workqueues, so that these bios can proceed |
| even if the driver is blocked. |
| |
| The problem with CONFIG_PREEMPT_RT_FULL is that when the device mapper |
| driver blocks, it won't call flush_current_bio_list (because |
| tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in |
| block device stack can happen. |
| |
| Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked |
| returns true - that would cause |
| BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in |
| task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a |
| spinlock. |
| |
| So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock, |
| when fast acquire failed and when the task is about to block. |
| |
| CC: stable-rt@vger.kernel.org |
| [bigeasy: The deadlock is not device-mapper specific, it can also occur |
| in plain EXT4] |
| Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> |
| Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
| --- |
| kernel/locking/rtmutex.c | 13 +++++++++++++ |
| 1 file changed, 13 insertions(+) |
| |
| --- a/kernel/locking/rtmutex.c |
| +++ b/kernel/locking/rtmutex.c |
| @@ -24,6 +24,7 @@ |
| #include <linux/sched/debug.h> |
| #include <linux/timer.h> |
| #include <linux/ww_mutex.h> |
| +#include <linux/blkdev.h> |
| |
| #include "rtmutex_common.h" |
| |
| @@ -1894,6 +1895,15 @@ rt_mutex_fastlock(struct rt_mutex *lock, |
| if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) |
| return 0; |
| |
| + /* |
| + * If rt_mutex blocks, the function sched_submit_work will not call |
| + * blk_schedule_flush_plug (because tsk_is_pi_blocked would be true). |
| + * We must call blk_schedule_flush_plug here, if we don't call it, |
| + * a deadlock in I/O may happen. |
| + */ |
| + if (unlikely(blk_needs_flush_plug(current))) |
| + blk_schedule_flush_plug(current); |
| + |
| return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK, ww_ctx); |
| } |
| |
| @@ -1911,6 +1921,9 @@ rt_mutex_timed_fastlock(struct rt_mutex |
| likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) |
| return 0; |
| |
| + if (unlikely(blk_needs_flush_plug(current))) |
| + blk_schedule_flush_plug(current); |
| + |
| return slowfn(lock, state, timeout, chwalk, ww_ctx); |
| } |
| |