releases/4.14.71/md-raid5-fix-data-corruption-of-replacements-after-originals-dropped.patch - pub/scm/linux/kernel/git/stable/stable-queue

 From foo@baz Mon Sep 17 12:33:31 CEST 2018
 From: BingJing Chang <bingjingc@synology.com>
 Date: Wed, 1 Aug 2018 17:08:36 +0800
 Subject: md/raid5: fix data corruption of replacements after originals dropped

 From: BingJing Chang <bingjingc@synology.com>

 [ Upstream commit d63e2fc804c46e50eee825c5d3a7228e07048b47 ]

 During raid5 replacement, stripes can be marked with the R5_NeedReplace
 flag. Data can then be read from the device being replaced and written
 to the replacing spare without reading all the other devices. (This is
 'replace' mode, s.replacing = 1.) If the device being replaced is
 dropped, the replacement is interrupted and resumed in pure recovery
 mode. However, stripes that already existed before the interruption can
 no longer read from the dropped device. This prints many WARN_ON
 messages and results in data corruption, because those stripes write
 problematic data to the replacement device and update the progress.

 # Erase disks (1MB + 2GB)
 dd if=/dev/zero of=/dev/sda bs=1MB count=2049
 dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
 dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
 dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
 mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
 # Ensure array stores non-zero data
 dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
 # Start replacement
 mdadm /dev/md0 -a /dev/sdd
 mdadm /dev/md0 --replace /dev/sda

 Then hot-plug out /dev/sda during recovery, and wait for the recovery to finish.
 echo check > /sys/block/md0/md/sync_action
 cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.

 Soon after you hot-plug out /dev/sda, you will see many WARN_ON
 messages, and the replacement recovery is interrupted shortly afterwards.
 Once the recovery finishes, the data on the array is corrupted.

 This is simply an unhandled case of replacement. In commit
 <f94c0b6658c7> (md/raid5: fix interaction of 'replace' and 'recovery'.),
 a NeedReplace device that is not UPTODATE is treated as an error, but
 the commit only prints a WARN_ON and still marks these corrupted stripes
 with R5_WantReplace (meaning they are ready for writes).

 To fix this case, we can leverage the 'sync and replace' mode mentioned
 in commit <9a3e1101b827> (md/raid5: detect and handle replacements during
 recovery.). We can add logic to detect these stripes and use 'sync and
 replace' mode for them.

 Reported-by: Alex Chen <alexchen@synology.com>
 Reviewed-by: Alex Wu <alexwu@synology.com>
 Reviewed-by: Chung-Chiang Cheng <cccheng@synology.com>
 Signed-off-by: BingJing Chang <bingjingc@synology.com>
 Signed-off-by: Shaohua Li <shli@fb.com>
 Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 ---
  drivers/md/raid5.c |    6 ++++++
  1 file changed, 6 insertions(+)

 --- a/drivers/md/raid5.c
 +++ b/drivers/md/raid5.c
 @@ -4516,6 +4516,12 @@ static void analyse_stripe(struct stripe
  			s->failed++;
  			if (rdev && !test_bit(Faulty, &rdev->flags))
  				do_recovery = 1;
 +			else if (!rdev) {
 +				rdev = rcu_dereference(
 +				    conf->disks[i].replacement);
 +				if (rdev && !test_bit(Faulty, &rdev->flags))
 +					do_recovery = 1;
 +			}
  		}

  		if (test_bit(R5_InJournal, &dev->flags))
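
The decision added by the hunk above can be sketched as a small user-space
model. This is only an illustration, not kernel code: `struct model_rdev` and
`model_do_recovery()` are hypothetical stand-ins, the `faulty` field stands in
for `test_bit(Faulty, &rdev->flags)`, and the `replacement` argument stands in
for `rcu_dereference(conf->disks[i].replacement)`. How `rdev` is chosen
earlier in analyse_stripe() is not part of this patch and is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct md_rdev. */
struct model_rdev {
	bool faulty;	/* models test_bit(Faulty, &rdev->flags) */
};

/*
 * Models the do_recovery decision for a slot that is not in sync.
 * "rdev" is the device pointer held when the patched branch is reached;
 * "replacement" models the slot's replacement device (NULL if none).
 */
static bool model_do_recovery(const struct model_rdev *rdev,
			      const struct model_rdev *replacement)
{
	/* Pre-existing check: a usable device can be recovered. */
	if (rdev && !rdev->faulty)
		return true;

	/* Added by the patch: no device left, so consider the replacement. */
	if (!rdev) {
		rdev = replacement;
		if (rdev && !rdev->faulty)
			return true;
	}

	return false;
}
```

In this model, a dropped original (`rdev == NULL`) with a healthy replacement
now requests recovery. That corresponds to the 'sync and replace' handling the
commit message describes.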