| From 404a8ef512587b2460107d3272c17a89aef75edf Mon Sep 17 00:00:00 2001 |
| From: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com> |
| Date: Tue, 13 Apr 2021 04:08:29 +0000 |
| Subject: md/bitmap: wait for external bitmap writes to complete during tear down |
| |
| From: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com> |
| |
| commit 404a8ef512587b2460107d3272c17a89aef75edf upstream. |
| |
| A NULL pointer dereference was observed in super_written() when it tries |
| to access the mddev structure. |
| |
| [The stack trace below is from an older kernel, but the problem described |
| in this patch also applies to the mainline kernel.] |
| |
| [ 1194.474861] task: ffff8fdd20858000 task.stack: ffffb99d40790000 |
| [ 1194.488000] RIP: 0010:super_written+0x29/0xe1 |
| [ 1194.499688] RSP: 0018:ffff8ffb7fcc3c78 EFLAGS: 00010046 |
| [ 1194.512477] RAX: 0000000000000000 RBX: ffff8ffb7bf4a000 RCX: ffff8ffb78991048 |
| [ 1194.527325] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ffb56b8a200 |
| [ 1194.542576] RBP: ffff8ffb7fcc3c90 R08: 000000000000000b R09: 0000000000000000 |
| [ 1194.558001] R10: ffff8ffb56b8a298 R11: 0000000000000000 R12: ffff8ffb56b8a200 |
| [ 1194.573070] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 |
| [ 1194.588117] FS: 0000000000000000(0000) GS:ffff8ffb7fcc0000(0000) knlGS:0000000000000000 |
| [ 1194.604264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 |
| [ 1194.617375] CR2: 00000000000002b8 CR3: 00000021e040a002 CR4: 00000000007606e0 |
| [ 1194.632327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 |
| [ 1194.647865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 |
| [ 1194.663316] PKRU: 55555554 |
| [ 1194.674090] Call Trace: |
| [ 1194.683735] <IRQ> |
| [ 1194.692948] bio_endio+0xae/0x135 |
| [ 1194.703580] blk_update_request+0xad/0x2fa |
| [ 1194.714990] blk_update_bidi_request+0x20/0x72 |
| [ 1194.726578] __blk_end_bidi_request+0x2c/0x4d |
| [ 1194.738373] __blk_end_request_all+0x31/0x49 |
| [ 1194.749344] blk_flush_complete_seq+0x377/0x383 |
| [ 1194.761550] flush_end_io+0x1dd/0x2a7 |
| [ 1194.772910] blk_finish_request+0x9f/0x13c |
| [ 1194.784544] scsi_end_request+0x180/0x25c |
| [ 1194.796149] scsi_io_completion+0xc8/0x610 |
| [ 1194.807503] scsi_finish_command+0xdc/0x125 |
| [ 1194.818897] scsi_softirq_done+0x81/0xde |
| [ 1194.830062] blk_done_softirq+0xa4/0xcc |
| [ 1194.841008] __do_softirq+0xd9/0x29f |
| [ 1194.851257] irq_exit+0xe6/0xeb |
| [ 1194.861290] do_IRQ+0x59/0xe3 |
| [ 1194.871060] common_interrupt+0x1c6/0x382 |
| [ 1194.881988] </IRQ> |
| [ 1194.890646] RIP: 0010:cpuidle_enter_state+0xdd/0x2a5 |
| [ 1194.902532] RSP: 0018:ffffb99d40793e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff43 |
| [ 1194.917317] RAX: ffff8ffb7fce27c0 RBX: ffff8ffb7fced800 RCX: 000000000000001f |
| [ 1194.932056] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000 |
| [ 1194.946428] RBP: ffffb99d40793ea0 R08: 0000000000000004 R09: 0000000000002ed2 |
| [ 1194.960508] R10: 0000000000002664 R11: 0000000000000018 R12: 0000000000000003 |
| [ 1194.974454] R13: 000000000000000b R14: ffffffff925715a0 R15: 0000011610120d5a |
| [ 1194.988607] ? cpuidle_enter_state+0xcc/0x2a5 |
| [ 1194.999077] cpuidle_enter+0x17/0x19 |
| [ 1195.008395] call_cpuidle+0x23/0x3a |
| [ 1195.017718] do_idle+0x172/0x1d5 |
| [ 1195.026358] cpu_startup_entry+0x73/0x75 |
| [ 1195.035769] start_secondary+0x1b9/0x20b |
| [ 1195.044894] secondary_startup_64+0xa5/0xa5 |
| [ 1195.084921] RIP: super_written+0x29/0xe1 RSP: ffff8ffb7fcc3c78 |
| [ 1195.096354] CR2: 00000000000002b8 |
| |
| The bio in the stack above is a bitmap write whose completion is invoked |
| after the teardown sequence has set the rdev's mddev pointer to NULL. |
| |
| During teardown, there is an attempt to flush the bitmap writes, but for |
| external bitmaps, there is no explicit wait for all the bitmap writes to |
| complete. For instance, md_bitmap_flush() is called to flush the bitmap |
| writes, but the last call to md_bitmap_daemon_work() in md_bitmap_flush() |
| can generate new bitmap writes, and nothing waits for those writes to |
| complete. The subsequent call to md_bitmap_update_sb() returns immediately |
| for external bitmaps, and the follow-up call to md_update_sb() is |
| conditional and may not be made for external bitmaps. This results in a |
| kernel panic when the completion routine, super_written(), runs and |
| dereferences mddev through an rdev whose mddev pointer has already been |
| set to NULL (in unbind_rdev_from_array() by the teardown sequence). |
| |
| The solution is to call md_super_wait() for external bitmaps after the |
| last call to md_bitmap_daemon_work() in md_bitmap_flush(), ensuring there |
| are no pending bitmap writes before proceeding with the teardown. |
| |
| Cc: stable@vger.kernel.org |
| Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com> |
| Reviewed-by: Zhao Heming <heming.zhao@suse.com> |
| Signed-off-by: Song Liu <song@kernel.org> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| drivers/md/md-bitmap.c | 2 ++ |
| 1 file changed, 2 insertions(+) |
| |
| --- a/drivers/md/md-bitmap.c |
| +++ b/drivers/md/md-bitmap.c |
| @@ -1722,6 +1722,8 @@ void md_bitmap_flush(struct mddev *mddev |
| md_bitmap_daemon_work(mddev); |
| bitmap->daemon_lastrun -= sleep; |
| md_bitmap_daemon_work(mddev); |
| + if (mddev->bitmap_info.external) |
| + md_super_wait(mddev); |
| md_bitmap_update_sb(bitmap); |
| } |
| |