| From: yangge <yangge1116@126.com> |
| Subject: mm: compaction: skip memory compaction when there are not enough migratable pages |
| Date: Wed, 8 Jan 2025 19:30:54 +0800 |
| |
| There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of |
| memory. I have configured 16GB of CMA memory on each NUMA node, and |
| starting a 32GB virtual machine with device passthrough is extremely slow, |
| taking almost an hour. |
| |
| During the startup of the virtual machine, it calls |
| pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory. |
| Long-term GUP cannot allocate memory from the CMA area, so at most |
| 16GB of non-CMA memory on a NUMA node can be used as virtual machine |
| memory. There is 16GB of free CMA memory on a NUMA node, which is |
| sufficient to pass the order-0 watermark check, causing |
| __compaction_suitable() to consistently return true. However, if |
| there aren't enough migratable pages available, performing memory |
| compaction is meaningless. Besides checking whether the order-0 |
| watermark is met, __compaction_suitable() also needs to determine |
| whether there are sufficient migratable pages available for memory |
| compaction. |
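The check described here can be sketched in userspace Python. This is an illustration of the arithmetic only, not kernel code; the dictionary keys mirror the node stat items involved, and all counter values below are hypothetical.

```python
def compaction_has_migratable_pages(node_stats, order):
    """Return True if the node plausibly has enough migratable pages
    for an order-`order` compaction attempt (userspace sketch)."""
    # Pages on the node's LRU lists -- the pool compaction could,
    # at best, migrate.
    lru_sum = (node_stats["nr_inactive_file"] +
               node_stats["nr_inactive_anon"] +
               node_stats["nr_active_file"] +
               node_stats["nr_active_anon"] +
               node_stats["nr_unevictable"])
    # Pages currently pinned via FOLL_PIN (e.g. FOLL_LONGTERM GUP)
    # are not migratable, so subtract them.
    nr_pinned = (node_stats["nr_foll_pin_acquired"] -
                 node_stats["nr_foll_pin_released"])
    return (lru_sum - nr_pinned) >= (1 << order)

# Hypothetical node where nearly all LRU pages are long-term pinned:
# 4003500 LRU pages, 4003200 of them pinned, leaving 300 migratable.
stats = {
    "nr_inactive_file": 1000, "nr_inactive_anon": 4000000,
    "nr_active_file": 500, "nr_active_anon": 2000,
    "nr_unevictable": 0,
    "nr_foll_pin_acquired": 4103200, "nr_foll_pin_released": 100000,
}
print(compaction_has_migratable_pages(stats, 9))  # order-9: False (300 < 512)
print(compaction_has_migratable_pages(stats, 8))  # order-8: True (300 >= 256)
```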
| |
| For costly allocations, because __compaction_suitable() always returns |
| true, __alloc_pages_slowpath() can't exit at the appropriate place, |
| resulting in excessively long virtual machine startup times. |
| |
| Call trace: |
| __alloc_pages_slowpath |
| if (compact_result == COMPACT_SKIPPED || |
| compact_result == COMPACT_DEFERRED) |
| goto nopage; // should exit __alloc_pages_slowpath() from here |
| |
| When the 16GB of non-CMA memory on a single node is exhausted, we |
| fall back to allocating memory on other nodes. To fall back to |
| remote nodes quickly, we should skip memory compaction when |
| migratable pages are insufficient. After this fix, starting a 32GB |
| virtual machine with device passthrough takes only a few tens of |
| seconds. |
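The two pin counters the patch subtracts are exported per node (as nr_foll_pin_acquired and nr_foll_pin_released, e.g. in /sys/devices/system/node/node<N>/vmstat). A rough userspace estimate of the currently pinned pages from such output can be sketched as follows; the sample text and numbers are hypothetical.

```python
# Parse vmstat-style "name value" lines and estimate pinned pages.
# The sample is hypothetical; on a real system you would read e.g.
# /sys/devices/system/node/node0/vmstat (or /proc/vmstat) instead.
sample = """\
nr_foll_pin_acquired 5242880
nr_foll_pin_released 1048576
"""

counters = dict(line.split() for line in sample.splitlines())
pinned = (int(counters["nr_foll_pin_acquired"]) -
          int(counters["nr_foll_pin_released"]))
page_size = 4096  # assumes 4KiB base pages
print(f"pinned pages: {pinned} ({pinned * page_size / 2**30:.1f} GiB)")
```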
| |
| Link: https://lkml.kernel.org/r/1736335854-548-1-git-send-email-yangge1116@126.com |
| Signed-off-by: yangge <yangge1116@126.com> |
| Cc: Baolin Wang <baolin.wang@linux.alibaba.com> |
| Cc: David Hildenbrand <david@redhat.com> |
| Cc: Johannes Weiner <hannes@cmpxchg.org> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/compaction.c | 20 ++++++++++++++++++++ |
| 1 file changed, 20 insertions(+) |
| |
| --- a/mm/compaction.c~mm-compaction-skip-memory-compaction-when-there-are-not-enough-migratable-pages |
| +++ a/mm/compaction.c |
| @@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct |
| int highest_zoneidx, |
| unsigned long wmark_target) |
| { |
| + pg_data_t __maybe_unused *pgdat = zone->zone_pgdat; |
| + unsigned long sum, nr_pinned; |
| unsigned long watermark; |
| + |
| + sum = node_page_state(pgdat, NR_INACTIVE_FILE) + |
| + node_page_state(pgdat, NR_INACTIVE_ANON) + |
| + node_page_state(pgdat, NR_ACTIVE_FILE) + |
| + node_page_state(pgdat, NR_ACTIVE_ANON) + |
| + node_page_state(pgdat, NR_UNEVICTABLE); |
| + |
| + nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) - |
| + node_page_state(pgdat, NR_FOLL_PIN_RELEASED); |
| + |
| + /* |
| + * Gup-pinned pages are non-migratable. After subtracting these pages, |
| + * we need to check if the remaining pages are sufficient for memory |
| + * compaction. |
| + */ |
| + if ((sum - nr_pinned) < (1 << order)) |
| + return false; |
| + |
| /* |
| * Watermarks for order-0 must be met for compaction to be able to |
| * isolate free pages for migration targets. This means that the |
| _ |