| From 1c01967116a678fed8e2c68a6ab82abc8effeddc Mon Sep 17 00:00:00 2001 |
| From: Changwei Ge <ge.changwei@h3c.com> |
| Date: Wed, 15 Nov 2017 17:31:33 -0800 |
| Subject: ocfs2: fix cluster hang after a node dies |
| |
| From: Changwei Ge <ge.changwei@h3c.com> |
| |
| commit 1c01967116a678fed8e2c68a6ab82abc8effeddc upstream. |
| |
| When a node dies, other live nodes have to choose a new master for an |
| existing lock resource mastered by the dead node. |
| |
| In the ocfs2/dlm implementation, this is done by |
| dlm_move_lockres_to_recovery_list(), which marks those lock resources as |
| DLM_LOCK_RES_RECOVERING and puts them on a list from which DLM later |
| changes each lock resource's master. |
| |
| Without invoking dlm_move_lockres_to_recovery_list(), no new master is |
| chosen when dlm recovery completes, since no lock resource can be |
| found through the ::resource list. |
| |
| Worse, if DLM_LOCK_RES_RECOVERING is not set on lock resources |
| mastered by a dead node, synchronization among nodes breaks. |
| |
| So invoke dlm_move_lockres_to_recovery_list again. |
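The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the toy_* types and functions are hypothetical stand-ins and not the kernel's structures, the flag value is assumed, and only the names DLM_LOCK_RES_RECOVERING and dlm_move_lockres_to_recovery_list come from this commit message.

```c
#include <stddef.h>

/* Toy model: simplified stand-ins, not the kernel's structures. */
#define DLM_LOCK_RES_RECOVERING 0x1u /* name from the text above; value assumed */
#define TOY_MAX_RES 4

struct toy_lockres {
	int owner;          /* node currently mastering this resource */
	unsigned int state; /* flag bits */
};

struct toy_dlm {
	struct toy_lockres *reco_list[TOY_MAX_RES]; /* awaiting a new master */
	size_t reco_count;
};

/* Stand-in for dlm_move_lockres_to_recovery_list(): mark and queue. */
static void toy_move_to_recovery_list(struct toy_dlm *dlm,
				      struct toy_lockres *res)
{
	if ((res->state & DLM_LOCK_RES_RECOVERING) ||
	    dlm->reco_count == TOY_MAX_RES)
		return; /* already queued, or the toy list is full */
	res->state |= DLM_LOCK_RES_RECOVERING;
	dlm->reco_list[dlm->reco_count++] = res;
}

/* Recovery completion: only queued resources get the new master. */
static void toy_finish_recovery(struct toy_dlm *dlm, int new_master)
{
	size_t i;

	for (i = 0; i < dlm->reco_count; i++) {
		dlm->reco_list[i]->owner = new_master;
		dlm->reco_list[i]->state &= ~DLM_LOCK_RES_RECOVERING;
	}
	dlm->reco_count = 0;
}

/*
 * Node 1 dies while mastering one resource; node 2 is the new master.
 * Returns the resource's owner once recovery has finished.
 */
int toy_owner_after_recovery(int queue_resource)
{
	struct toy_dlm dlm = { .reco_count = 0 };
	struct toy_lockres res = { .owner = 1, .state = 0 };

	if (queue_resource)
		toy_move_to_recovery_list(&dlm, &res);
	toy_finish_recovery(&dlm, 2);
	return res.owner;
}
```

In this model, toy_owner_after_recovery(1) returns 2, the new master. toy_owner_after_recovery(0) returns 1: the resource is still owned by the dead node because recovery never saw it, which mirrors the stuck state this patch avoids.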
| |
| Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down") |
| Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com |
| Signed-off-by: Changwei Ge <ge.changwei@h3c.com> |
| Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> |
| Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> |
| Cc: Mark Fasheh <mfasheh@versity.com> |
| Cc: Joel Becker <jlbec@evilplan.org> |
| Cc: Junxiao Bi <junxiao.bi@oracle.com> |
| Cc: Joseph Qi <jiangqi903@gmail.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| fs/ocfs2/dlm/dlmrecovery.c | 1 + |
| 1 file changed, 1 insertion(+) |
| |
| --- a/fs/ocfs2/dlm/dlmrecovery.c |
| +++ b/fs/ocfs2/dlm/dlmrecovery.c |
| @@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanu |
| dlm_lockres_put(res); |
| continue; |
| } |
| + dlm_move_lockres_to_recovery_list(dlm, res); |
| } else if (res->owner == dlm->node_num) { |
| dlm_free_dead_locks(dlm, res, dead_node); |
| __dlm_lockres_calc_usage(dlm, res); |