| From: David Hildenbrand <david@redhat.com> |
| Subject: mm: stop making SPARSEMEM_VMEMMAP user-selectable |
| Date: Mon, 1 Sep 2025 17:03:22 +0200 |
| |
| Patch series "mm: remove nth_page()", v2. |
| |
| As discussed recently with Linus, nth_page() is just nasty and we would |
| like to remove it. |
| |
| To recap, the reason we currently need nth_page() within a folio is |
| because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the |
| memmap is allocated per memory section. |
| |
| While buddy allocations cannot cross memory section boundaries, hugetlb |
| and dax folios can. |
| |
| So crossing a memory section means that "page++" could do the wrong thing. |
| Instead, nth_page() on these problematic configs always goes from |
| page->pfn, to then go from (++pfn)->page, which is rather nasty. |
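| |
| As a rough userspace illustration of the above (not kernel code): the |
| sketch below models two memory sections whose memmaps are allocated |
| separately. All names (model_pfn_to_page(), model_nth_page(), the tiny |
| PAGES_PER_SECTION value, the pfn field in struct page) are assumptions |
| made up for this example; the real kernel derives the PFN differently. |
| Calling model_demo() from a main() runs the checks. |

```c
#include <assert.h>

/* Illustration-only values; real sections are far larger. */
#define PAGES_PER_SECTION 4UL
#define NR_SECTIONS       2UL
#define INVALID_PFN       (~0UL)

/* Hypothetical stand-in for struct page; it simply remembers its PFN. */
struct page {
	unsigned long pfn;
};

/*
 * One backing array with a hole between the two per-section memmaps, so
 * that stepping past the end of section 0 stays within valid memory but
 * does not land on the memmap of section 1.
 */
static struct page backing[3 * PAGES_PER_SECTION];
static struct page *section_memmap[NR_SECTIONS];

static struct page *model_pfn_to_page(unsigned long pfn)
{
	return &section_memmap[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

static unsigned long model_page_to_pfn(const struct page *page)
{
	return page->pfn;
}

/* Go from page->pfn, then from (pfn + n)->page. */
static struct page *model_nth_page(struct page *page, unsigned long n)
{
	return model_pfn_to_page(model_page_to_pfn(page) + n);
}

static void model_init(void)
{
	unsigned long i, pfn;

	for (i = 0; i < 3 * PAGES_PER_SECTION; i++)
		backing[i].pfn = INVALID_PFN;

	section_memmap[0] = &backing[0];
	section_memmap[1] = &backing[2 * PAGES_PER_SECTION];

	for (pfn = 0; pfn < NR_SECTIONS * PAGES_PER_SECTION; pfn++)
		model_pfn_to_page(pfn)->pfn = pfn;
}

void model_demo(void)
{
	struct page *last, *next;

	model_init();
	last = model_pfn_to_page(PAGES_PER_SECTION - 1); /* last page of section 0 */
	next = model_pfn_to_page(PAGES_PER_SECTION);     /* first page of section 1 */

	/* Plain pointer arithmetic ends up in the hole, not on the next page. */
	assert(last + 1 != next);
	assert((last + 1)->pfn == INVALID_PFN);

	/* The PFN round trip crosses the section boundary correctly. */
	assert(model_nth_page(last, 1) == next);
}
```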
| |
| Likely, many people have no idea when nth_page() is required and when it |
| might be dropped. |
| |
| We refer to such problematic PFN ranges as "non-contiguous pages". If we |
| only deal with "contiguous pages", there is no need for nth_page(). |
| |
| Besides that "obvious" folio case, we might end up using nth_page() within |
| CMA allocations (again, could span memory sections), and in one corner |
| case (kfence) when processing memblock allocations (again, could span |
| memory sections). |
| |
| So let's handle all that, add sanity checks, and remove nth_page(). |
| |
| Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups |
| Patch #6 -> #13 : disallow folios to have non-contiguous pages |
| Patch #14 -> #20 : remove nth_page() usage within folios |
| Patch #22 : disallow CMA allocations of non-contiguous pages |
| Patch #23 -> #33 : sanity-check + remove nth_page() usage within SG entry |
| Patch #34 : sanity-check + remove nth_page() usage in |
| unpin_user_page_range_dirty_lock() |
| Patch #35 : remove nth_page() in kfence |
| Patch #36 : adjust stale comment regarding nth_page |
| Patch #37 : mm: remove nth_page() |
| |
| A lot of this is inspired by the discussion at [1] between Linus, Jason |
| and me, so kudos to them. |
| |
| |
| This patch (of 37): |
| |
| In an ideal world, we wouldn't have to deal with SPARSEMEM without |
| SPARSEMEM_VMEMMAP, but in particular for 32bit, SPARSEMEM_VMEMMAP is |
| considered too costly and consequently not supported. |
| |
| However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, |
| let's forbid the user from disabling VMEMMAP: just like we already do for |
| arm64, s390 and x86. |
| |
| So if SPARSEMEM_VMEMMAP is supported, don't allow using SPARSEMEM without |
| SPARSEMEM_VMEMMAP. |
| |
| This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone |
| for loongarch, powerpc, riscv and sparc. All architectures only enable |
| SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big |
| downside to using the VMEMMAP (quite the contrary). |
| |
| This is a preparation for not supporting |
| |
| (1) folio sizes that exceed a single memory section |
| |
| (2) CMA allocations of non-contiguous page ranges |
| |
| in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit |
| possible impact as much as possible (e.g., gigantic hugetlb page |
| allocations suddenly failing). |
| |
| Link: https://lkml.kernel.org/r/20250901150359.867252-1-david@redhat.com |
| Link: https://lkml.kernel.org/r/20250901150359.867252-2-david@redhat.com |
| Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] |
| Signed-off-by: David Hildenbrand <david@redhat.com> |
| Acked-by: Zi Yan <ziy@nvidia.com> |
| Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> |
| Acked-by: SeongJae Park <sj@kernel.org> |
| Reviewed-by: Wei Yang <richard.weiyang@gmail.com> |
| Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> |
| Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> |
| Cc: Huacai Chen <chenhuacai@kernel.org> |
| Cc: WANG Xuerui <kernel@xen0n.name> |
| Cc: Madhavan Srinivasan <maddy@linux.ibm.com> |
| Cc: Michael Ellerman <mpe@ellerman.id.au> |
| Cc: Nicholas Piggin <npiggin@gmail.com> |
| Cc: Christophe Leroy <christophe.leroy@csgroup.eu> |
| Cc: Paul Walmsley <paul.walmsley@sifive.com> |
| Cc: Palmer Dabbelt <palmer@dabbelt.com> |
| Cc: Albert Ou <aou@eecs.berkeley.edu> |
| Cc: Alexandre Ghiti <alex@ghiti.fr> |
| Cc: "David S. Miller" <davem@davemloft.net> |
| Cc: Andreas Larsson <andreas@gaisler.com> |
| Cc: Alexander Gordeev <agordeev@linux.ibm.com> |
| Cc: Alexander Potapenko <glider@google.com> |
| Cc: Alexandru Elisei <alexandru.elisei@arm.com> |
| Cc: Alex Dubov <oakad@yahoo.com> |
| Cc: Alex Williamson <alex.williamson@redhat.com> |
| Cc: Bart van Assche <bvanassche@acm.org> |
| Cc: Borislav Petkov <bp@alien8.de> |
| Cc: Brendan Jackman <jackmanb@google.com> |
| Cc: Brett Creeley <brett.creeley@amd.com> |
| Cc: Catalin Marinas <catalin.marinas@arm.com> |
| Cc: Christian Borntraeger <borntraeger@linux.ibm.com> |
| Cc: Christoph Lameter (Ampere) <cl@gentwo.org> |
| Cc: Damien Le Moal <dlemoal@kernel.org> |
| Cc: Dave Airlie <airlied@gmail.com> |
| Cc: Dennis Zhou <dennis@kernel.org> |
| Cc: Dmitriy Vyukov <dvyukov@google.com> |
| Cc: Doug Gilbert <dgilbert@interlog.com> |
| Cc: Heiko Carstens <hca@linux.ibm.com> |
| Cc: Herbert Xu <herbert@gondor.apana.org.au> |
| Cc: Ingo Molnar <mingo@redhat.com> |
| Cc: Inki Dae <m.szyprowski@samsung.com> |
| Cc: James Bottomley <james.bottomley@HansenPartnership.com> |
| Cc: Jani Nikula <jani.nikula@linux.intel.com> |
| Cc: Jason A. Donenfeld <jason@zx2c4.com> |
| Cc: Jason Gunthorpe <jgg@nvidia.com> |
| Cc: Jason Gunthorpe <jgg@ziepe.ca> |
| Cc: Jens Axboe <axboe@kernel.dk> |
| Cc: Jesper Nilsson <jesper.nilsson@axis.com> |
| Cc: Johannes Weiner <hannes@cmpxchg.org> |
| Cc: John Hubbard <jhubbard@nvidia.com> |
| Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
| Cc: Kevin Tian <kevin.tian@intel.com> |
| Cc: Lars Persson <lars.persson@axis.com> |
| Cc: Linus Torvalds <torvalds@linux-foundation.org> |
| Cc: Marco Elver <elver@google.com> |
| Cc: "Martin K. Petersen" <martin.petersen@oracle.com> |
| Cc: Maxim Levitsky <maximlevitsky@gmail.com> |
| Cc: Michal Hocko <mhocko@suse.com> |
| Cc: Muchun Song <muchun.song@linux.dev> |
| Cc: Niklas Cassel <cassel@kernel.org> |
| Cc: Oscar Salvador <osalvador@suse.de> |
| Cc: Pavel Begunkov <asml.silence@gmail.com> |
| Cc: Peter Xu <peterx@redhat.com> |
| Cc: Robin Murphy <robin.murphy@arm.com> |
| Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> |
| Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> |
| Cc: Shuah Khan <shuah@kernel.org> |
| Cc: Suren Baghdasaryan <surenb@google.com> |
| Cc: Sven Schnelle <svens@linux.ibm.com> |
| Cc: Tejun Heo <tj@kernel.org> |
| Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> |
| Cc: Thomas Gleixner <tglx@linutronix.de> |
| Cc: Tvrtko Ursulin <tursulin@ursulin.net> |
| Cc: Ulf Hansson <ulf.hansson@linaro.org> |
| Cc: Vasily Gorbik <gor@linux.ibm.com> |
| Cc: Vlastimil Babka <vbabka@suse.cz> |
| Cc: Will Deacon <will@kernel.org> |
| Cc: Yishai Hadas <yishaih@nvidia.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/Kconfig | 3 +-- |
| 1 file changed, 1 insertion(+), 2 deletions(-) |
| |
| --- a/mm/Kconfig~mm-stop-making-sparsemem_vmemmap-user-selectable |
| +++ a/mm/Kconfig |
| @@ -412,9 +412,8 @@ config SPARSEMEM_VMEMMAP_ENABLE |
| bool |
| |
| config SPARSEMEM_VMEMMAP |
| - bool "Sparse Memory virtual memmap" |
| + def_bool y |
| depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE |
| - default y |
| help |
| SPARSEMEM_VMEMMAP uses a virtually mapped memmap to optimise |
| pfn_to_page and page_to_pfn operations. This is the most |
| _ |