From: Koichiro Den <koichiro.den@canonical.com>
Subject: hugetlb: prioritize surplus allocation from current node
Date: Thu, 5 Dec 2024 01:55:03 +0900

Previously, surplus allocations triggered by mmap were typically made from
the node where the process was running. On a page fault, the area was
reliably dequeued from the hugepage_freelists for that node. However,
since commit 003af997c8a9 ("hugetlb: force allocating surplus hugepages on
mempolicy allowed nodes"), dequeue_hugetlb_folio_vma() may fall back to
other nodes unnecessarily even if there is no MPOL_BIND policy, causing
folios to be dequeued from nodes other than the current one.
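
For reference, the surplus path in question can be exercised from
userspace roughly as follows. This is only an illustrative sketch, not
part of the patch: it assumes the default 2MB hugepage size, no free
persistent hugepages, and vm.nr_overcommit_hugepages raised above zero,
so that the mmap reservation has to go through gather_surplus_pages():

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	/* One hugepage; the 2MB default size is an assumption. */
	size_t len = 2UL << 20;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");	/* reservation (surplus allocation) failed */
		return 1;
	}
	memset(p, 0, len);	/* first touch: fault dequeues from the freelist */
	munmap(p, len);
	return 0;
}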

Also, allocating from the node where the current process is running is
likely to result in a performance win, as mmap-ing processes often touch
the mapped area soon after allocation. This change minimizes surprises
for users relying on the previous behavior while maintaining the benefit
introduced by the commit.

So, prioritize the node the current process is running on when possible.
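
The resulting selection order can be modelled in plain C as follows.
This is a simplified standalone sketch, not the kernel code:
try_alloc_on_node() is a hypothetical stand-in for
alloc_surplus_hugetlb_folio(), allowed[] plays the role of
alloc_nodemask, and the actual change to gather_surplus_pages() is in
the diff below:

#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES 8

/* Pretend allocation always fails so every candidate node is printed. */
static bool try_alloc_on_node(int node)
{
	printf("trying node %d\n", node);
	return false;
}

static bool alloc_surplus(const bool allowed[MAX_NODES], int current_node)
{
	/* Prioritize the node the task is running on, if the mask allows it. */
	if (allowed[current_node] && try_alloc_on_node(current_node))
		return true;

	/* Otherwise fall back to the remaining allowed nodes. */
	for (int node = 0; node < MAX_NODES; node++) {
		if (node == current_node || !allowed[node])
			continue;
		if (try_alloc_on_node(node))
			return true;
	}
	return false;
}

int main(void)
{
	bool allowed[MAX_NODES] = { [0] = true, [1] = true, [2] = true };

	alloc_surplus(allowed, 1);	/* tries node 1 first, then 0, then 2 */
	return 0;
}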

Link: https://lkml.kernel.org/r/20241204165503.628784-1-koichiro.den@canonical.com
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Aristeu Rozanski <aris@ruivo.org>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c |   20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

--- a/mm/hugetlb.c~hugetlb-prioritize-surplus-allocation-from-current-node
+++ a/mm/hugetlb.c
@@ -2463,7 +2463,13 @@ static int gather_surplus_pages(struct h
 	long needed, allocated;
 	bool alloc_ok = true;
 	int node;
-	nodemask_t *mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
+	nodemask_t *mbind_nodemask, alloc_nodemask;
+
+	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
+	if (mbind_nodemask)
+		nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_mems_allowed);
+	else
+		alloc_nodemask = cpuset_current_mems_allowed;
 
 	lockdep_assert_held(&hugetlb_lock);
 	needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
@@ -2479,8 +2485,16 @@ retry:
 	spin_unlock_irq(&hugetlb_lock);
 	for (i = 0; i < needed; i++) {
 		folio = NULL;
-		for_each_node_mask(node, cpuset_current_mems_allowed) {
-			if (!mbind_nodemask || node_isset(node, *mbind_nodemask)) {
+
+		/* Prioritize current node */
+		if (node_isset(numa_mem_id(), alloc_nodemask))
+			folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+					numa_mem_id(), NULL);
+
+		if (!folio) {
+			for_each_node_mask(node, alloc_nodemask) {
+				if (node == numa_mem_id())
+					continue;
 				folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
 						node, NULL);
 				if (folio)
_