| From d2b698644c97cb033261536a4f2010924a00eac9 Mon Sep 17 00:00:00 2001 |
| From: Jonathan Brassow <jbrassow@redhat.com> |
| Date: Fri, 4 Sep 2009 20:40:32 +0100 |
| Subject: dm raid1: do not allow log_failure variable to unset after being set |
| |
| From: Jonathan Brassow <jbrassow@redhat.com> |
| |
| commit d2b698644c97cb033261536a4f2010924a00eac9 upstream. |
| |
| This patch fixes a bug which was triggering a case where the primary leg |
| could not be changed on failure even when the mirror was in-sync. |
| |
| The case involves the failure of the primary device along with |
| the transient failure of the log device. The problem is that |
| bios can be put on the 'failures' list (due to log failure) |
| before 'fail_mirror' is called due to the primary device failure. |
| Normally, this is fine, but if the log device failure is transient, |
| a subsequent iteration of the work thread, 'do_mirror', will |
| reset 'log_failure'. The 'do_failures' function then resets |
| the 'in_sync' variable when processing bios on the failures list. |
| The 'in_sync' variable is what is used to determine if the |
| primary device can be switched in the event of a failure. Since |
| this has been reset, the primary device is incorrectly assumed |
| to be not switchable. |
| |
| The case has been seen in the cluster mirror context, where one |
| machine realizes the log device is dead before the other machines. |
| As the responsibilities of the server migrate from one node to |
| another (because the mirror is being reconfigured due to the failure), |
| the new server may think for a moment that the log device is fine - |
| thus resetting the 'log_failure' variable. |
| |
| In any case, it is inappropriate for us to reset the 'log_failure' |
| variable. The above bug simply illustrates that it can actually |
| hurt us. |
| |
| Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> |
| Signed-off-by: Alasdair G Kergon <agk@redhat.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| |
| --- |
| drivers/md/dm-raid1.c | 8 +++++++- |
| 1 file changed, 7 insertions(+), 1 deletion(-) |
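
Note (not part of the commit): the change makes 'log_failure' a sticky
flag. The userspace sketch below is only an illustration of that idea,
not the kernel code. 'struct mirror_state', 'record_flush_old',
'record_flush_new', 'replay' and 'check_transient_failure' are
hypothetical names, and the integer flush results stand in for the
return value of dm_rh_flush(), with non-zero assumed to mean failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mirror_set; only the flag matters here. */
struct mirror_state {
	int log_failure;
};

/* Pre-patch behaviour: the flag tracks only the most recent flush result. */
static void record_flush_old(struct mirror_state *ms, int flush_err)
{
	ms->log_failure = flush_err ? 1 : 0;
}

/* Patched behaviour: once set, the flag survives later successful flushes. */
static void record_flush_new(struct mirror_state *ms, int flush_err)
{
	ms->log_failure = flush_err ? 1 : ms->log_failure;
}

/* Replay a sequence of flush results and return the final flag value. */
static int replay(void (*record)(struct mirror_state *, int),
		  const int *results, size_t n)
{
	struct mirror_state ms = { .log_failure = 0 };
	size_t i;

	for (i = 0; i < n; i++)
		record(&ms, results[i]);
	return ms.log_failure;
}

/* A transient failure: one failed flush followed by a successful one. */
static void check_transient_failure(void)
{
	const int results[] = { -5, 0 };

	assert(replay(record_flush_old, results, 2) == 0); /* failure forgotten */
	assert(replay(record_flush_new, results, 2) == 1); /* failure remembered */
}
```

Calling check_transient_failure() from a test harness exercises the
failed-then-successful sequence described in the changelog. With the
old assignment the failure is forgotten on the second flush; with the
new one it stays recorded until something else clears it.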
| |
| --- a/drivers/md/dm-raid1.c |
| +++ b/drivers/md/dm-raid1.c |
| @@ -648,7 +648,13 @@ static void do_writes(struct mirror_set |
| */ |
| dm_rh_inc_pending(ms->rh, &sync); |
| dm_rh_inc_pending(ms->rh, &nosync); |
| - ms->log_failure = dm_rh_flush(ms->rh) ? 1 : 0; |
| + |
| + /* |
| + * If the flush fails on a previous call and succeeds here, |
| + * we must not reset the log_failure variable. We need |
| + * userspace interaction to do that. |
| + */ |
| + ms->log_failure = dm_rh_flush(ms->rh) ? 1 : ms->log_failure; |
| |
| /* |
| * Dispatch io. |