| From: Nico Pache <npache@redhat.com> |
| Subject: Documentation: mm: update the admin guide for mTHP collapse |
| Date: Tue, 19 Aug 2025 08:17:42 -0600 |
| |
| Now that we can collapse to mTHPs, let's update the admin guide to reflect |
| these changes and provide proper guidance on how to utilize it. |
| |
| Link: https://lkml.kernel.org/r/20250819141742.626517-1-npache@redhat.com |
| Signed-off-by: Nico Pache <npache@redhat.com> |
| Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> |
| Cc: Andrea Arcangeli <aarcange@redhat.com> |
| Cc: Anshuman Khandual <anshuman.khandual@arm.com> |
| Cc: Baolin Wang <baolin.wang@linux.alibaba.com> |
| Cc: Barry Song <baohua@kernel.org> |
| Cc: Catalin Marinas <catalin.marinas@arm.com> |
| Cc: Christoph Lameter (Ampere) <cl@gentwo.org> |
| Cc: David Hildenbrand <david@redhat.com> |
| Cc: David Rientjes <rientjes@google.com> |
| Cc: Dev Jain <dev.jain@arm.com> |
| Cc: Hugh Dickins <hughd@google.com> |
| Cc: Jan Kara <jack@suse.cz> |
| Cc: Johannes Weiner <hannes@cmpxchg.org> |
| Cc: Jonathan Corbet <corbet@lwn.net> |
| Cc: Kefeng Wang <wangkefeng.wang@huawei.com> |
| Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
| Cc: Liam Howlett <liam.howlett@oracle.com> |
| Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> |
| Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> |
| Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
| Cc: Matthew Wilcox (Oracle) <willy@infradead.org> |
| Cc: Michal Hocko <mhocko@suse.com> |
| Cc: Nanyong Sun <sunnanyong@huawei.com> |
| Cc: Peter Xu <peterx@redhat.com> |
| Cc: Rafael Aquini <raquini@redhat.com> |
| Cc: Randy Dunlap <rdunlap@infradead.org> |
| Cc: Reported-by:Takashi Iwai <tiwai@suse.de> |
| Cc: Ryan Roberts <ryan.roberts@arm.com> |
| Cc: Steven Rostedt <rostedt@goodmis.org> |
| Cc: Suren Baghdasaryan <surenb@google.com> |
| Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> |
| Cc: Usama Arif <usamaarif642@gmail.com> |
| Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> |
| Cc: Will Deacon <will@kernel.org> |
| Cc: Yang Shi <yang@os.amperecomputing.com> |
| Cc: Zach O'Keefe <zokeefe@google.com> |
| Cc: Zi Yan <ziy@nvidia.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| Documentation/admin-guide/mm/transhuge.rst | 19 +++++++++++++------ |
| 1 file changed, 13 insertions(+), 6 deletions(-) |
| |
| --- a/Documentation/admin-guide/mm/transhuge.rst~documentation-mm-update-the-admin-guide-for-mthp-collapse |
| +++ a/Documentation/admin-guide/mm/transhuge.rst |
| @@ -63,7 +63,7 @@ often. |
| THP can be enabled system wide or restricted to certain tasks or even |
| memory ranges inside task's address space. Unless THP is completely |
| disabled, there is ``khugepaged`` daemon that scans memory and |
| -collapses sequences of basic pages into PMD-sized huge pages. |
| +collapses sequences of basic pages into huge pages. |
| |
| The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>` |
| interface and using madvise(2) and prctl(2) system calls. |
| @@ -149,6 +149,18 @@ hugepage sizes have enabled="never". If |
| sizes, the kernel will select the most appropriate enabled size for a |
| given allocation. |
| |
| +khugepaged uses max_ptes_none scaled to the order of the enabled mTHP size |
| +to determine collapses. When using mTHPs it's recommended to set |
| +max_ptes_none low, ideally less than HPAGE_PMD_NR / 2 (255 on 4k page |
| +size). This will prevent undesired "creep" behavior that leads to |
| +continuously collapsing to the largest mTHP size; when we collapse, we are |
| +bringing in new non-zero pages that will, on a subsequent scan, cause the |
| +max_ptes_none check of the +1 order to always be satisfied. By limiting |
| +max_ptes_none to less than half the PTEs of each order, we avoid this |
| +feedback loop. max_ptes_shared and max_ptes_swap have no effect when |
| +collapsing to an mTHP, and mTHP collapse will fail on shared or swapped out |
| +pages. |
| + |
| It's also possible to limit defrag efforts in the VM to generate |
| anonymous hugepages in case they're not immediately free to madvise |
| regions or to never try to defrag memory and simply fallback to regular |
| @@ -264,11 +276,6 @@ support the following arguments:: |
| Khugepaged controls |
| ------------------- |
| |
| -.. note:: |
| - khugepaged currently only searches for opportunities to collapse to |
| - PMD-sized THP and no attempt is made to collapse to other THP |
| - sizes. |
| - |
| khugepaged runs usually at low frequency so while one may not want to |
| invoke defrag algorithms synchronously during the page faults, it |
| should be worth invoking defrag at least in khugepaged. However it's |
| _ |
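
The "creep" guidance in the added transhuge.rst paragraph can be checked with
a little arithmetic. The sketch below is illustrative only and rests on
assumptions that are not stated in the patch: 4k base pages (so
HPAGE_PMD_ORDER is 9), scaling done by a right shift of max_ptes_none, and a
collapse being allowed when the number of empty PTEs is less than or equal to
the scaled limit. The helper names are hypothetical.

```python
# Assumption: 4k base pages with 2M PMD-sized THP, so HPAGE_PMD_NR == 512.
HPAGE_PMD_ORDER = 9
HPAGE_PMD_NR = 1 << HPAGE_PMD_ORDER


def scaled_max_ptes_none(max_ptes_none, order):
    """Hypothetical scaling rule: shift the PMD-level limit down to `order`."""
    if not 0 <= order <= HPAGE_PMD_ORDER:
        raise ValueError("order out of range")
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def may_creep(max_ptes_none, order):
    """Return True if a collapse at `order` can by itself satisfy order + 1.

    After collapsing at `order`, one half of the order + 1 region is fully
    populated; in the worst case the other half (2**order PTEs) is empty.
    """
    empty_ptes = 1 << order
    return empty_ptes <= scaled_max_ptes_none(max_ptes_none, order + 1)


if __name__ == "__main__":
    for limit in (511, 256, 255):
        creeping = [o for o in range(2, HPAGE_PMD_ORDER) if may_creep(limit, o)]
        print(f"max_ptes_none={limit}: orders that can creep upward: {creeping}")
```

Under these assumptions, a limit of HPAGE_PMD_NR / 2 or more lets every order
feed the next one, while 255 breaks the loop at every order, which matches the
recommendation in the documentation text.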