queue/md-avoid-endless-recovery-loop-when-waiting-for-fail.patch - pub/scm/linux/kernel/git/paulg/longterm-queue-2.6.34 - Git at Google

 From 2cb2936f7fd7e15ac810a52a20690f96727482c8 Mon Sep 17 00:00:00 2001
 From: NeilBrown <neilb@suse.de>
 Date: Tue, 28 Jun 2011 16:59:42 +1000
 Subject: [PATCH] md: avoid endless recovery loop when waiting for fail device
  to complete.

 commit 4274215d24633df7302069e51426659d4759c5ed upstream.

 If a device fails in a way that causes pending requests to take a while
 to complete, md will not be able to immediately remove it from the
 array in remove_and_add_spares.
 It will then incorrectly look like a spare device and md will try to
 recover it even though it is failed.
 This leads to a recovery process starting and instantly aborting over
 and over again.

 We should check if the device is faulty before considering it to be a
 spare.  This will avoid trying to start a recovery that cannot
 proceed.

 This bug was introduced in 2.6.26, so this patch is suitable for any
 kernel since then.

 Reported-by: Jim Paradis <james.paradis@stratus.com>
 Signed-off-by: NeilBrown <neilb@suse.de>
 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
 ---
  drivers/md/md.c |    1 +
  1 file changed, 1 insertion(+)

 diff --git a/drivers/md/md.c b/drivers/md/md.c
 index 1287b03..d26df7f 100644
 --- a/drivers/md/md.c
 +++ b/drivers/md/md.c
 @@ -6863,6 +6863,7 @@ static int remove_and_add_spares(mddev_t *mddev)
  		list_for_each_entry(rdev, &mddev->disks, same_set) {
  			if (rdev->raid_disk >= 0 &&
  			    !test_bit(In_sync, &rdev->flags) &&
 +			    !test_bit(Faulty, &rdev->flags) &&
  			    !test_bit(Blocked, &rdev->flags))
  				spares++;
  			if (rdev->raid_disk < 0
 --
 1.7.9.6
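
The effect of the added Faulty test can be sketched outside the kernel with a
simplified user-space model. Everything below is an illustrative assumption,
not kernel code: struct toy_rdev, the flag constants, and count_spares() are
hypothetical stand-ins for mdk_rdev_t, the rdev flag bits, and the first
condition in the loop of remove_and_add_spares(). The check_faulty parameter
switches between the behaviour before and after the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's rdev flag bits. */
enum toy_flag {
	IN_SYNC = 1 << 0,
	FAULTY  = 1 << 1,
	BLOCKED = 1 << 2
};

/* Hypothetical, heavily reduced model of a member device. */
struct toy_rdev {
	int raid_disk;      /* slot in the array, or -1 if unassigned */
	unsigned int flags; /* bitwise OR of toy_flag values */
};

/*
 * Mirrors the first condition in the patched loop. With check_faulty
 * set to false it models the pre-patch test, which lets a failed device
 * that still occupies a slot pass as a spare awaiting recovery.
 */
static bool counts_as_spare(const struct toy_rdev *rdev, bool check_faulty)
{
	if (rdev->raid_disk < 0)
		return false;
	if (rdev->flags & IN_SYNC)
		return false;
	if (check_faulty && (rdev->flags & FAULTY))
		return false;
	if (rdev->flags & BLOCKED)
		return false;
	return true;
}

/* Counts the devices that would be treated as recoverable spares. */
static int count_spares(const struct toy_rdev *devs, size_t n,
			bool check_faulty)
{
	int spares = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (counts_as_spare(&devs[i], check_faulty))
			spares++;
	return spares;
}
```

In this model, a device with raid_disk >= 0 and only the FAULTY flag set is
counted when check_faulty is false and skipped when it is true. That
corresponds to the failed device that has not yet been removed from the array
and would otherwise trigger a recovery that cannot proceed.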