| From foo@baz Mon Sep 17 12:33:31 CEST 2018 |
| From: BingJing Chang <bingjingc@synology.com> |
| Date: Wed, 1 Aug 2018 17:08:36 +0800 |
| Subject: md/raid5: fix data corruption of replacements after originals dropped |
| |
| From: BingJing Chang <bingjingc@synology.com> |
| |
| [ Upstream commit d63e2fc804c46e50eee825c5d3a7228e07048b47 ] |
| |
| During raid5 replacement, the stripes can be marked with R5_NeedReplace |
| flag. Data can be read from being-replaced devices and written to |
| replacing spares without reading all other devices. (It's 'replace' |
| mode. s.replacing = 1) If a being-replaced device is dropped, the |
| replacement progress will be interrupted and resumed with pure recovery |
| mode. However, existing stripes before being interrupted cannot read |
| from the dropped device anymore. It prints lots of WARN_ON messages. |
| And it results in data corruption because existing stripes write |
| problematic data into its replacement device and update the progress. |
| |
| \# Erase disks (1MB + 2GB) |
| dd if=/dev/zero of=/dev/sda bs=1MB count=2049 |
| dd if=/dev/zero of=/dev/sdb bs=1MB count=2049 |
| dd if=/dev/zero of=/dev/sdc bs=1MB count=2049 |
| dd if=/dev/zero of=/dev/sdd bs=1MB count=2049 |
| mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152 |
| \# Ensure array stores non-zero data |
| dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB |
| \# Start replacement |
| mdadm /dev/md0 -a /dev/sdd |
| mdadm /dev/md0 --replace /dev/sda |
| |
| Then, Hot-plug out /dev/sda during recovery, and wait for recovery done. |
| echo check > /sys/block/md0/md/sync_action |
| cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0. |
| |
| Soon after you hot-plug out /dev/sda, you will see many WARN_ON |
| messages. The replacement recovery will be interrupted shortly. After |
| the recovery finishes, it will result in data corruption. |
| |
| Actually, it's just an unhandled case of replacement. In commit |
| <f94c0b6658c7> (md/raid5: fix interaction of 'replace' and 'recovery'.), |
| if a NeedReplace device is not UPTODATE then that is an error, the |
| commit just simply print WARN_ON but also mark these corrupted stripes |
| with R5_WantReplace. (it means it's ready for writes.) |
| |
| To fix this case, we can leverage 'sync and replace' mode mentioned in |
| commit <9a3e1101b827> (md/raid5: detect and handle replacements during |
| recovery.). We can add logics to detect and use 'sync and replace' mode |
| for these stripes. |
| |
| Reported-by: Alex Chen <alexchen@synology.com> |
| Reviewed-by: Alex Wu <alexwu@synology.com> |
| Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com> |
| Signed-off-by: BingJing Chang <bingjingc@synology.com> |
| Signed-off-by: Shaohua Li <shli@fb.com> |
| Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| drivers/md/raid5.c | 6 ++++++ |
| 1 file changed, 6 insertions(+) |
| |
| --- a/drivers/md/raid5.c |
| +++ b/drivers/md/raid5.c |
| @@ -4516,6 +4516,12 @@ static void analyse_stripe(struct stripe |
| s->failed++; |
| if (rdev && !test_bit(Faulty, &rdev->flags)) |
| do_recovery = 1; |
| + else if (!rdev) { |
| + rdev = rcu_dereference( |
| + conf->disks[i].replacement); |
| + if (rdev && !test_bit(Faulty, &rdev->flags)) |
| + do_recovery = 1; |
| + } |
| } |
| |
| if (test_bit(R5_InJournal, &dev->flags)) |