ocfs2/dlm: fix race between convert and recovery

There is a race window between dlmconvert_remote and
dlm_move_lockres_to_recovery_list that can leave a lock on the granted
list with OCFS2_LOCK_BUSY still set, which causes the system to hang.

        lock->convert_pending = 1;
        status = dlm_send_remote_convert_request();
        >>>>>> race window: the master has queued the AST and returned
               DLM_NORMAL, but goes down before sending the AST.
               This node detects that the master is down and calls
               dlm_move_lockres_to_recovery_list, which reverts the
               lock to the granted list.
               OCFS2_LOCK_BUSY is then never cleared, because the new
               master will not send the AST: it believes the convert
               has already been granted.
        lock->convert_pending = 0;
        if (status != DLM_NORMAL)

To handle this case, check whether res->state has the
DLM_LOCK_RES_RECOVERING bit set (res is still recovering) or the res
master has changed (the new master has finished recovery). If either is
true, reset the status to DLM_RECOVERING so that the convert is retried.
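
The check described above can be sketched in isolation as follows. This
is only an illustrative model, not the patch itself: struct lockres, the
enum values, the flag value and the helper name
convert_status_after_send are simplified, hypothetical stand-ins for the
kernel definitions, and old_owner is assumed to be the master recorded
before the convert request was sent.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel types and values. */
enum dlm_status { DLM_NORMAL, DLM_RECOVERING, DLM_DENIED };

#define DLM_LOCK_RES_RECOVERING 0x2u /* placeholder flag value (assumption) */

struct lockres {
	unsigned int state; /* bit flags such as DLM_LOCK_RES_RECOVERING */
	int owner;          /* node number of the current master */
};

/*
 * Decide the final status of a remote convert once the request returns.
 * old_owner is the master that was recorded before sending the request.
 */
static enum dlm_status convert_status_after_send(const struct lockres *res,
						 int old_owner,
						 enum dlm_status status)
{
	/* A failed request keeps its status; failure handling is the caller's job. */
	if (status != DLM_NORMAL)
		return status;

	/*
	 * The master replied DLM_NORMAL but may have died before sending
	 * the AST. If the resource is still recovering, or recovery has
	 * already installed a new master, report DLM_RECOVERING so the
	 * caller retries the convert instead of waiting for an AST that
	 * will never arrive.
	 */
	if ((res->state & DLM_LOCK_RES_RECOVERING) || res->owner != old_owner)
		return DLM_RECOVERING;

	return DLM_NORMAL;
}
```

In this sketch the status is only overridden when the request itself
succeeded, so an error returned by the master is passed through
unchanged.
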
Signed-off-by: Joseph Qi <firstname.lastname@example.org>
Reported-by: Yiwen Jiang <email@example.com>
Reviewed-by: Junxiao Bi <firstname.lastname@example.org>
Cc: Mark Fasheh <email@example.com>
Cc: Joel Becker <firstname.lastname@example.org>
Cc: Tariq Saeed <email@example.com>
Cc: Junxiao Bi <firstname.lastname@example.org>
Signed-off-by: Andrew Morton <email@example.com>
Signed-off-by: Linus Torvalds <firstname.lastname@example.org>
1 file changed