patches/old/migrate_pages-fix-deadlock-in-batched-migration.patch - pub/scm/linux/kernel/git/akpm/25-new - Git at Google

 From: Huang Ying <ying.huang@intel.com>
 Subject: migrate_pages: fix deadlock in batched migration
 Date: Fri, 3 Mar 2023 11:01:53 +0800

 Patch series "migrate_pages: fix deadlock in batched synchronous
 migration", v2.

 Two deadlock bugs were reported for the migrate_pages() batching series.
 Thanks Hugh and Pengfei.  Analysis shows that if we have locked some other
 folios except the one we are migrating, it's not safe in general to wait
 synchronously, for example, to wait the writeback to complete or wait to
 lock the buffer head.

 So 1/3 fixes the deadlock in a simple way, where the batching support for
 the synchronous migration is disabled.  The change is straightforward and
 easy to be understood.  While 3/3 re-introduce the batching for
 synchronous migration via trying to migrate asynchronously in batch
 optimistically, then fall back to migrate synchronously one by one for
 fail-to-migrate folios.  Test shows that this can restore the TLB flushing
 batching performance for synchronous migration effectively.


 This patch (of 3):

 Two deadlock bugs were reported for the migrate_pages() batching series.
 Thanks Hugh and Pengfei!  For example, in the following deadlock trace
 snippet,

  INFO: task kworker/u4:0:9 blocked for more than 147 seconds.
        Not tainted 6.2.0-rc4-kvm+ #1314
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/u4:0    state:D stack:0     pid:9     ppid:2      flags:0x00004000
  Workqueue: loop4 loop_rootcg_workfn
  Call Trace:
   <TASK>
   __schedule+0x43b/0xd00
   schedule+0x6a/0xf0
   io_schedule+0x4a/0x80
   folio_wait_bit_common+0x1b5/0x4e0
   ? __pfx_wake_page_function+0x10/0x10
   __filemap_get_folio+0x73d/0x770
   shmem_get_folio_gfp+0x1fd/0xc80
   shmem_write_begin+0x91/0x220
   generic_perform_write+0x10e/0x2e0
   __generic_file_write_iter+0x17e/0x290
   ? generic_write_checks+0x12b/0x1a0
   generic_file_write_iter+0x97/0x180
   ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
   do_iter_readv_writev+0x13c/0x210
   ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
   do_iter_write+0xf6/0x330
   vfs_iter_write+0x46/0x70
   loop_process_work+0x723/0xfe0
   loop_rootcg_workfn+0x28/0x40
   process_one_work+0x3cc/0x8d0
   worker_thread+0x66/0x630
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x153/0x190
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x29/0x50
   </TASK>

  INFO: task repro:1023 blocked for more than 147 seconds.
        Not tainted 6.2.0-rc4-kvm+ #1314
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:repro           state:D stack:0     pid:1023  ppid:360    flags:0x00004004
  Call Trace:
   <TASK>
   __schedule+0x43b/0xd00
   schedule+0x6a/0xf0
   io_schedule+0x4a/0x80
   folio_wait_bit_common+0x1b5/0x4e0
   ? compaction_alloc+0x77/0x1150
   ? __pfx_wake_page_function+0x10/0x10
   folio_wait_bit+0x30/0x40
   folio_wait_writeback+0x2e/0x1e0
   migrate_pages_batch+0x555/0x1ac0
   ? __pfx_compaction_alloc+0x10/0x10
   ? __pfx_compaction_free+0x10/0x10
   ? __this_cpu_preempt_check+0x17/0x20
   ? lock_is_held_type+0xe6/0x140
   migrate_pages+0x100e/0x1180
   ? __pfx_compaction_free+0x10/0x10
   ? __pfx_compaction_alloc+0x10/0x10
   compact_zone+0xe10/0x1b50
   ? lock_is_held_type+0xe6/0x140
   ? check_preemption_disabled+0x80/0xf0
   compact_node+0xa3/0x100
   ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
   ? _find_first_bit+0x7b/0x90
   sysctl_compaction_handler+0x5d/0xb0
   proc_sys_call_handler+0x29d/0x420
   proc_sys_write+0x2b/0x40
   vfs_write+0x3a3/0x780
   ksys_write+0xb7/0x180
   __x64_sys_write+0x26/0x30
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x72/0xdc
  RIP: 0033:0x7f3a2471f59d
  RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
  RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
  R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
  R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

 The page migration task has held the lock of the shmem folio A, and is
 waiting the writeback of the folio B of the file system on the loop block
 device to complete.  While the loop worker task which writes back the
 folio B is waiting to lock the shmem folio A, because the folio A backs
 the folio B in the loop device.  Thus deadlock is triggered.

 In general, if we have locked some other folios except the one we are
 migrating, it's not safe to wait synchronously, for example, to wait the
 writeback to complete or wait to lock the buffer head.

 To fix the deadlock, in this patch, we avoid to batch the page migration
 except for MIGRATE_ASYNC mode.  In MIGRATE_ASYNC mode, synchronous waiting
 is avoided.

 The fix can be improved further.  We will do that as soon as possible.

 Link: https://lkml.kernel.org/r/20230303030155.160983-1-ying.huang@intel.com
 Link: https://lore.kernel.org/linux-mm/87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com/
 Link: https://lore.kernel.org/linux-mm/874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com/
 Link: https://lore.kernel.org/linux-mm/20230227110614.dngdub2j3exr6dfp@quack3/
 Link: https://lkml.kernel.org/r/20230303030155.160983-2-ying.huang@intel.com
 Fixes: 5dfab109d519 ("migrate_pages: batch _unmap and _move")
 Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
 Reported-by: Hugh Dickins <hughd@google.com>
 Reported-by: "Xu, Pengfei" <pengfei.xu@intel.com>
 Cc: Jan Kara <jack@suse.cz>
 Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
 Cc: Christoph Hellwig <hch@lst.de>
 Cc: Stefan Roesch <shr@devkernel.io>
 Cc: Tejun Heo <tj@kernel.org>
 Cc: Xin Hao <xhao@linux.alibaba.com>
 Cc: Zi Yan <ziy@nvidia.com>
 Cc: Yang Shi <shy828301@gmail.com>
 Cc: Matthew Wilcox <willy@infradead.org>
 Cc: Mike Kravetz <mike.kravetz@oracle.com>
 Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
 ---


 --- a/mm/migrate.c~migrate_pages-fix-deadlock-in-batched-migration
 +++ a/mm/migrate.c
 @@ -1112,7 +1112,7 @@ static void migrate_folio_done(struct fo
  /* Obtain the lock on page, remove all ptes. */
  static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
  			       unsigned long private, struct folio *src,
 -			       struct folio **dstp, int force, bool avoid_force_lock,
 +			       struct folio **dstp, int force,
  			       enum migrate_mode mode, enum migrate_reason reason,
  			       struct list_head *ret)
  {
 @@ -1163,17 +1163,6 @@ static int migrate_folio_unmap(new_page_
  		if (current->flags & PF_MEMALLOC)
  			goto out;

 -		/*
 -		 * We have locked some folios and are going to wait to lock
 -		 * this folio.  To avoid a potential deadlock, let's bail
 -		 * out and not do that. The locked folios will be moved and
 -		 * unlocked, then we can wait to lock this folio.
 -		 */
 -		if (avoid_force_lock) {
 -			rc = -EDEADLOCK;
 -			goto out;
 -		}
 -
  		folio_lock(src);
  	}
  	locked = true;
 @@ -1253,7 +1242,7 @@ static int migrate_folio_unmap(new_page_
  		/* Establish migration ptes */
  		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
  			       !folio_test_ksm(src) && !anon_vma, src);
 -		try_to_migrate(src, TTU_BATCH_FLUSH);
 +		try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
  		page_was_mapped = 1;
  	}

 @@ -1267,7 +1256,7 @@ out:
  	 * A folio that has not been unmapped will be restored to
  	 * right list unless we want to retry.
  	 */
 -	if (rc == -EAGAIN || rc == -EDEADLOCK)
 +	if (rc == -EAGAIN)
  		ret = NULL;

  	migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
 @@ -1618,6 +1607,11 @@ static int migrate_hugetlbs(struct list_
  /*
   * migrate_pages_batch() first unmaps folios in the from list as many as
   * possible, then move the unmapped folios.
 + *
 + * We only batch migration if mode == MIGRATE_ASYNC to avoid to wait a
 + * lock or bit when we have locked more than one folio.  Which may cause
 + * deadlock (e.g., for loop device).  So, if mode != MIGRATE_ASYNC, the
 + * length of the from list must be <= 1.
   */
  static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
  		free_page_t put_new_page, unsigned long private,
 @@ -1640,11 +1634,11 @@ static int migrate_pages_batch(struct li
  	LIST_HEAD(dst_folios);
  	bool nosplit = (reason == MR_NUMA_MISPLACED);
  	bool no_split_folio_counting = false;
 -	bool avoid_force_lock;

 +	VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
 +			!list_empty(from) && !list_is_singular(from));
  retry:
  	rc_saved = 0;
 -	avoid_force_lock = false;
  	retry = 1;
  	for (pass = 0;
  	     pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
 @@ -1689,15 +1683,14 @@ retry:
  			}

  			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
 -						 folio, &dst, pass > 2, avoid_force_lock,
 -						 mode, reason, ret_folios);
 +						 folio, &dst, pass > 2, mode,
 +						 reason, ret_folios);
  			/*
  			 * The rules are:
  			 *	Success: folio will be freed
  			 *	Unmap: folio will be put on unmap_folios list,
  			 *	       dst folio put on dst_folios list
  			 *	-EAGAIN: stay on the from list
 -			 *	-EDEADLOCK: stay on the from list
  			 *	-ENOMEM: stay on the from list
  			 *	Other errno: put on ret_folios list
  			 */
 @@ -1749,14 +1742,6 @@ retry:
  					goto out;
  				else
  					goto move;
 -			case -EDEADLOCK:
 -				/*
 -				 * The folio cannot be locked for potential deadlock.
 -				 * Go move (and unlock) all locked folios.  Then we can
 -				 * try again.
 -				 */
 -				rc_saved = rc;
 -				goto move;
  			case -EAGAIN:
  				if (is_large) {
  					large_retry++;
 @@ -1771,11 +1756,6 @@ retry:
  				stats->nr_thp_succeeded += is_thp;
  				break;
  			case MIGRATEPAGE_UNMAP:
 -				/*
 -				 * We have locked some folios, don't force lock
 -				 * to avoid deadlock.
 -				 */
 -				avoid_force_lock = true;
  				list_move_tail(&folio->lru, &unmap_folios);
  				list_add_tail(&dst->lru, &dst_folios);
  				break;
 @@ -1900,17 +1880,15 @@ out:
  		 */
  		list_splice_init(from, ret_folios);
  		list_splice_init(&split_folios, from);
 +		/*
 +		 * Force async mode to avoid to wait lock or bit when we have
 +		 * locked more than one folios.
 +		 */
 +		mode = MIGRATE_ASYNC;
  		no_split_folio_counting = true;
  		goto retry;
  	}

 -	/*
 -	 * We have unlocked all locked folios, so we can force lock now, let's
 -	 * try again.
 -	 */
 -	if (rc == -EDEADLOCK)
 -		goto retry;
 -
  	return rc;
  }

 @@ -1945,7 +1923,7 @@ int migrate_pages(struct list_head *from
  		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
  {
  	int rc, rc_gather;
 -	int nr_pages;
 +	int nr_pages, batch;
  	struct folio *folio, *folio2;
  	LIST_HEAD(folios);
  	LIST_HEAD(ret_folios);
 @@ -1959,6 +1937,11 @@ int migrate_pages(struct list_head *from
  				     mode, reason, &stats, &ret_folios);
  	if (rc_gather < 0)
  		goto out;
 +
 +	if (mode == MIGRATE_ASYNC)
 +		batch = NR_MAX_BATCHED_MIGRATION;
 +	else
 +		batch = 1;
  again:
  	nr_pages = 0;
  	list_for_each_entry_safe(folio, folio2, from, lru) {
 @@ -1969,11 +1952,11 @@ again:
  		}

  		nr_pages += folio_nr_pages(folio);
 -		if (nr_pages > NR_MAX_BATCHED_MIGRATION)
 +		if (nr_pages >= batch)
  			break;
  	}
 -	if (nr_pages > NR_MAX_BATCHED_MIGRATION)
 -		list_cut_before(&folios, from, &folio->lru);
 +	if (nr_pages >= batch)
 +		list_cut_before(&folios, from, &folio2->lru);
  	else
  		list_splice_init(from, &folios);
  	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
 _
	From: Huang Ying <ying.huang@intel.com>
	Subject: migrate_pages: fix deadlock in batched migration
	Date: Fri, 3 Mar 2023 11:01:53 +0800

	Patch series "migrate_pages: fix deadlock in batched synchronous
	migration", v2.

	Two deadlock bugs were reported for the migrate_pages() batching series.
	Thanks Hugh and Pengfei. Analysis shows that if we have locked some other
	folios except the one we are migrating, it's not safe in general to wait
	synchronously, for example, to wait the writeback to complete or wait to
	lock the buffer head.

	So 1/3 fixes the deadlock in a simple way, where the batching support for
	the synchronous migration is disabled. The change is straightforward and
	easy to be understood. While 3/3 re-introduce the batching for
	synchronous migration via trying to migrate asynchronously in batch
	optimistically, then fall back to migrate synchronously one by one for
	fail-to-migrate folios. Test shows that this can restore the TLB flushing
	batching performance for synchronous migration effectively.


	This patch (of 3):

	Two deadlock bugs were reported for the migrate_pages() batching series.
	Thanks Hugh and Pengfei! For example, in the following deadlock trace
	snippet,

	INFO: task kworker/u4:0:9 blocked for more than 147 seconds.
	Not tainted 6.2.0-rc4-kvm+ #1314
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	task:kworker/u4:0 state:D stack:0 pid:9 ppid:2 flags:0x00004000
	Workqueue: loop4 loop_rootcg_workfn
	Call Trace:
	<TASK>
	__schedule+0x43b/0xd00
	schedule+0x6a/0xf0
	io_schedule+0x4a/0x80
	folio_wait_bit_common+0x1b5/0x4e0
	? __pfx_wake_page_function+0x10/0x10
	__filemap_get_folio+0x73d/0x770
	shmem_get_folio_gfp+0x1fd/0xc80
	shmem_write_begin+0x91/0x220
	generic_perform_write+0x10e/0x2e0
	__generic_file_write_iter+0x17e/0x290
	? generic_write_checks+0x12b/0x1a0
	generic_file_write_iter+0x97/0x180
	? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
	do_iter_readv_writev+0x13c/0x210
	? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
	do_iter_write+0xf6/0x330
	vfs_iter_write+0x46/0x70
	loop_process_work+0x723/0xfe0
	loop_rootcg_workfn+0x28/0x40
	process_one_work+0x3cc/0x8d0
	worker_thread+0x66/0x630
	? __pfx_worker_thread+0x10/0x10
	kthread+0x153/0x190
	? __pfx_kthread+0x10/0x10
	ret_from_fork+0x29/0x50
	</TASK>

	INFO: task repro:1023 blocked for more than 147 seconds.
	Not tainted 6.2.0-rc4-kvm+ #1314
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	task:repro state:D stack:0 pid:1023 ppid:360 flags:0x00004004
	Call Trace:
	<TASK>
	__schedule+0x43b/0xd00
	schedule+0x6a/0xf0
	io_schedule+0x4a/0x80
	folio_wait_bit_common+0x1b5/0x4e0
	? compaction_alloc+0x77/0x1150
	? __pfx_wake_page_function+0x10/0x10
	folio_wait_bit+0x30/0x40
	folio_wait_writeback+0x2e/0x1e0
	migrate_pages_batch+0x555/0x1ac0
	? __pfx_compaction_alloc+0x10/0x10
	? __pfx_compaction_free+0x10/0x10
	? __this_cpu_preempt_check+0x17/0x20
	? lock_is_held_type+0xe6/0x140
	migrate_pages+0x100e/0x1180
	? __pfx_compaction_free+0x10/0x10
	? __pfx_compaction_alloc+0x10/0x10
	compact_zone+0xe10/0x1b50
	? lock_is_held_type+0xe6/0x140
	? check_preemption_disabled+0x80/0xf0
	compact_node+0xa3/0x100
	? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
	? _find_first_bit+0x7b/0x90
	sysctl_compaction_handler+0x5d/0xb0
	proc_sys_call_handler+0x29d/0x420
	proc_sys_write+0x2b/0x40
	vfs_write+0x3a3/0x780
	ksys_write+0xb7/0x180
	__x64_sys_write+0x26/0x30
	do_syscall_64+0x3b/0x90
	entry_SYSCALL_64_after_hwframe+0x72/0xdc
	RIP: 0033:0x7f3a2471f59d
	RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
	RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
	RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
	RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
	R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
	R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
	</TASK>

	The page migration task has held the lock of the shmem folio A, and is
	waiting the writeback of the folio B of the file system on the loop block
	device to complete. While the loop worker task which writes back the
	folio B is waiting to lock the shmem folio A, because the folio A backs
	the folio B in the loop device. Thus deadlock is triggered.

	In general, if we have locked some other folios except the one we are
	migrating, it's not safe to wait synchronously, for example, to wait the
	writeback to complete or wait to lock the buffer head.

	To fix the deadlock, in this patch, we avoid to batch the page migration
	except for MIGRATE_ASYNC mode. In MIGRATE_ASYNC mode, synchronous waiting
	is avoided.

	The fix can be improved further. We will do that as soon as possible.

	Link: https://lkml.kernel.org/r/20230303030155.160983-1-ying.huang@intel.com
	Link: https://lore.kernel.org/linux-mm/87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com/
	Link: https://lore.kernel.org/linux-mm/874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com/
	Link: https://lore.kernel.org/linux-mm/20230227110614.dngdub2j3exr6dfp@quack3/
	Link: https://lkml.kernel.org/r/20230303030155.160983-2-ying.huang@intel.com
	Fixes: 5dfab109d519 ("migrate_pages: batch _unmap and _move")
	Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
	Reported-by: Hugh Dickins <hughd@google.com>
	Reported-by: "Xu, Pengfei" <pengfei.xu@intel.com>
	Cc: Jan Kara <jack@suse.cz>
	Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
	Cc: Christoph Hellwig <hch@lst.de>
	Cc: Stefan Roesch <shr@devkernel.io>
	Cc: Tejun Heo <tj@kernel.org>
	Cc: Xin Hao <xhao@linux.alibaba.com>
	Cc: Zi Yan <ziy@nvidia.com>
	Cc: Yang Shi <shy828301@gmail.com>
	Cc: Matthew Wilcox <willy@infradead.org>
	Cc: Mike Kravetz <mike.kravetz@oracle.com>
	Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
	---


	--- a/mm/migrate.c~migrate_pages-fix-deadlock-in-batched-migration
	+++ a/mm/migrate.c
	@@ -1112,7 +1112,7 @@ static void migrate_folio_done(struct fo
	/* Obtain the lock on page, remove all ptes. */
	static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
	unsigned long private, struct folio *src,
	- struct folio **dstp, int force, bool avoid_force_lock,
	+ struct folio **dstp, int force,
	enum migrate_mode mode, enum migrate_reason reason,
	struct list_head *ret)
	{
	@@ -1163,17 +1163,6 @@ static int migrate_folio_unmap(new_page_
	if (current->flags & PF_MEMALLOC)
	goto out;

	- /*
	- * We have locked some folios and are going to wait to lock
	- * this folio. To avoid a potential deadlock, let's bail
	- * out and not do that. The locked folios will be moved and
	- * unlocked, then we can wait to lock this folio.
	- */
	- if (avoid_force_lock) {
	- rc = -EDEADLOCK;
	- goto out;
	- }
	-
	folio_lock(src);
	}
	locked = true;
	@@ -1253,7 +1242,7 @@ static int migrate_folio_unmap(new_page_
	/* Establish migration ptes */
	VM_BUG_ON_FOLIO(folio_test_anon(src) &&
	!folio_test_ksm(src) && !anon_vma, src);
	- try_to_migrate(src, TTU_BATCH_FLUSH);
	+ try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0);
	page_was_mapped = 1;
	}

	@@ -1267,7 +1256,7 @@ out:
	* A folio that has not been unmapped will be restored to
	* right list unless we want to retry.
	*/
	- if (rc == -EAGAIN \|\| rc == -EDEADLOCK)
	+ if (rc == -EAGAIN)
	ret = NULL;

	migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret);
	@@ -1618,6 +1607,11 @@ static int migrate_hugetlbs(struct list_
	/*
	* migrate_pages_batch() first unmaps folios in the from list as many as
	* possible, then move the unmapped folios.
	+ *
	+ * We only batch migration if mode == MIGRATE_ASYNC to avoid to wait a
	+ * lock or bit when we have locked more than one folio. Which may cause
	+ * deadlock (e.g., for loop device). So, if mode != MIGRATE_ASYNC, the
	+ * length of the from list must be <= 1.
	*/
	static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
	free_page_t put_new_page, unsigned long private,
	@@ -1640,11 +1634,11 @@ static int migrate_pages_batch(struct li
	LIST_HEAD(dst_folios);
	bool nosplit = (reason == MR_NUMA_MISPLACED);
	bool no_split_folio_counting = false;
	- bool avoid_force_lock;

	+ VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
	+ !list_empty(from) && !list_is_singular(from));
	retry:
	rc_saved = 0;
	- avoid_force_lock = false;
	retry = 1;
	for (pass = 0;
	pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry \|\| large_retry);
	@@ -1689,15 +1683,14 @@ retry:
	}

	rc = migrate_folio_unmap(get_new_page, put_new_page, private,
	- folio, &dst, pass > 2, avoid_force_lock,
	- mode, reason, ret_folios);
	+ folio, &dst, pass > 2, mode,
	+ reason, ret_folios);
	/*
	* The rules are:
	* Success: folio will be freed
	* Unmap: folio will be put on unmap_folios list,
	* dst folio put on dst_folios list
	* -EAGAIN: stay on the from list
	- * -EDEADLOCK: stay on the from list
	* -ENOMEM: stay on the from list
	* Other errno: put on ret_folios list
	*/
	@@ -1749,14 +1742,6 @@ retry:
	goto out;
	else
	goto move;
	- case -EDEADLOCK:
	- /*
	- * The folio cannot be locked for potential deadlock.
	- * Go move (and unlock) all locked folios. Then we can
	- * try again.
	- */
	- rc_saved = rc;
	- goto move;
	case -EAGAIN:
	if (is_large) {
	large_retry++;
	@@ -1771,11 +1756,6 @@ retry:
	stats->nr_thp_succeeded += is_thp;
	break;
	case MIGRATEPAGE_UNMAP:
	- /*
	- * We have locked some folios, don't force lock
	- * to avoid deadlock.
	- */
	- avoid_force_lock = true;
	list_move_tail(&folio->lru, &unmap_folios);
	list_add_tail(&dst->lru, &dst_folios);
	break;
	@@ -1900,17 +1880,15 @@ out:
	*/
	list_splice_init(from, ret_folios);
	list_splice_init(&split_folios, from);
	+ /*
	+ * Force async mode to avoid to wait lock or bit when we have
	+ * locked more than one folios.
	+ */
	+ mode = MIGRATE_ASYNC;
	no_split_folio_counting = true;
	goto retry;
	}

	- /*
	- * We have unlocked all locked folios, so we can force lock now, let's
	- * try again.
	- */
	- if (rc == -EDEADLOCK)
	- goto retry;
	-
	return rc;
	}

	@@ -1945,7 +1923,7 @@ int migrate_pages(struct list_head *from
	enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
	{
	int rc, rc_gather;
	- int nr_pages;
	+ int nr_pages, batch;
	struct folio folio, folio2;
	LIST_HEAD(folios);
	LIST_HEAD(ret_folios);
	@@ -1959,6 +1937,11 @@ int migrate_pages(struct list_head *from
	mode, reason, &stats, &ret_folios);
	if (rc_gather < 0)
	goto out;
	+
	+ if (mode == MIGRATE_ASYNC)
	+ batch = NR_MAX_BATCHED_MIGRATION;
	+ else
	+ batch = 1;
	again:
	nr_pages = 0;
	list_for_each_entry_safe(folio, folio2, from, lru) {
	@@ -1969,11 +1952,11 @@ again:
	}

	nr_pages += folio_nr_pages(folio);
	- if (nr_pages > NR_MAX_BATCHED_MIGRATION)
	+ if (nr_pages >= batch)
	break;
	}
	- if (nr_pages > NR_MAX_BATCHED_MIGRATION)
	- list_cut_before(&folios, from, &folio->lru);
	+ if (nr_pages >= batch)
	+ list_cut_before(&folios, from, &folio2->lru);
	else
	list_splice_init(from, &folios);
	rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
	_