mm: swap: get rid of livelock in swapin readahead

In our testing, a task was found stuck in a livelock.  Sysrq output
showed the same stack every time, as follows:


The reason for the livelock is that swapcache_prepare() always returns
-EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so the
task cannot leave the retry loop.  We suspected that the task that
clears the SWAP_HAS_CACHE flag never gets a chance to run.  To check
this, we lowered the priority of the livelocked task so that the task
that clears SWAP_HAS_CACHE could run.  The system returned to normal
once the priority was lowered.

In our testing, multiple real-time tasks are bound to the same core, and
the task in the livelock is the highest priority task of the core, so
the livelocked task cannot be preempted.

Although __read_swap_cache_async() calls cond_resched(), that is an
empty function on a preemptible kernel and does not release the CPU.
A high-priority task cannot release the CPU unless it is preempted by a
higher-priority task.  But when this task is already the highest
priority task on its core, no other task on that core can be scheduled.
So replace cond_resched() with schedule_timeout_uninterruptible(1).
schedule_timeout_uninterruptible() first sets the task state to
TASK_UNINTERRUPTIBLE, so the task is removed from the runqueue.  It
therefore really gives up the CPU and no longer runs in kernel mode for
too long.
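
The scheduling argument above can be illustrated with a toy userspace
model.  This is only a sketch under stated assumptions, not kernel code:
it assumes one CPU, strict priority scheduling, and a low-priority owner
that clears the flag on the first tick it gets to run.  The names
simulate(), yield_kind, has_cache and reader_sleep are hypothetical and
exist only for this illustration.

```c
#include <stdbool.h>

/* Toy model only: how the spinning task "yields" on each failed retry. */
enum yield_kind {
	YIELD_NOOP,           /* models cond_resched() on a preemptible kernel */
	YIELD_SLEEP_ONE_TICK, /* models schedule_timeout_uninterruptible(1) */
};

/*
 * Returns the tick at which the high-priority task leaves its retry
 * loop, or -1 if it is still spinning after max_ticks.
 */
static int simulate(enum yield_kind kind, int max_ticks)
{
	bool has_cache = true; /* stands in for SWAP_HAS_CACHE */
	int reader_sleep = 0;  /* ticks the spinning task is off the runqueue */

	for (int tick = 0; tick < max_ticks; tick++) {
		if (reader_sleep > 0) {
			/*
			 * The high-priority task is not runnable, so the
			 * low-priority owner finally runs and clears the flag.
			 */
			reader_sleep--;
			has_cache = false;
			continue;
		}

		/* The high-priority task is runnable, so it always wins the CPU. */
		if (!has_cache)
			return tick; /* the retry no longer sees the flag */

		/* Analogue of -EEXIST: "yield" and retry. */
		if (kind == YIELD_SLEEP_ONE_TICK)
			reader_sleep = 1;
		/* YIELD_NOOP: the task stays runnable and nothing changes. */
	}

	return -1;
}
```

In this model, simulate(YIELD_NOOP, 1000) never leaves the loop and
returns -1, while simulate(YIELD_SLEEP_ONE_TICK, 1000) returns 2: one
tick to fail and sleep, one tick for the owner to run, and one tick for
the retry to succeed.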

(akpm: ugly hack becomes uglier.  But it fixes the issue in a
backportable-to-stable fashion while we hopefully work on something

Signed-off-by: Guo Ziliang <>
Reported-by: Zeal Robot <>
Reviewed-by: Ran Xiaokai <>
Reviewed-by: Jiang Xuexin <>
Reviewed-by: Yang Yang <>
Acked-by: Hugh Dickins <>
Cc: Naoya Horiguchi <>
Cc: Michal Hocko <>
Cc: Minchan Kim <>
Cc: Johannes Weiner <>
Cc: Roger Quadros <>
Cc: Ziliang Guo <>
Cc: <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8d41042..ee67164 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -478,7 +478,7 @@
 		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
 		 * in swap_map, but not yet added its page to swap cache.
-		cond_resched();
+		schedule_timeout_uninterruptible(1);