| From: David Hildenbrand <david@redhat.com> |
| Subject: mm/rmap: keep mapcount untouched for device-exclusive entries |
| Date: Mon, 10 Feb 2025 20:37:58 +0100 |
| |
| Now that conversion to device-exclusive no longer performs an rmap |
| walk and all page_vma_mapped_walk() users were taught to properly handle |
| device-exclusive entries, let's treat device-exclusive entries just as if |
| they were present, similar to how we handle device-private entries |
| already. |
| |
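| This fixes swapout/migration/split/hwpoison of folios with |
| device-exclusive entries: previously, such entries held a folio reference |
| but were not accounted in the mapcount, so rmap walks did not behave as |
| expected. |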
| |
| We only had to take care of page_vma_mapped_walk() users, because these |
| traditionally assume pte_present(). Other page table walkers already have |
| to handle !pte_present(), and some of them might simply skip them (e.g., |
| MADV_PAGEOUT) if they are not specialized for them. This change doesn't |
| modify the latter. |
| |
| Note that while folios with device-exclusive PTEs can now get migrated, |
| khugepaged will not collapse a THP if there is a device-exclusive PTE. |
| Doing so might also not be desired if the device frequently performs |
| atomics to the same page. Similarly, KSM will never merge order-0 folios |
| that are device-exclusive. |
| |
| Link: https://lkml.kernel.org/r/20250210193801.781278-17-david@redhat.com |
| Fixes: b756a3b5e7ea ("mm: device exclusive memory access") |
| Signed-off-by: David Hildenbrand <david@redhat.com> |
| Tested-by: Alistair Popple <apopple@nvidia.com> |
| Cc: Alex Shi <alexs@kernel.org> |
| Cc: Danilo Krummrich <dakr@kernel.org> |
| Cc: Dave Airlie <airlied@gmail.com> |
| Cc: Jann Horn <jannh@google.com> |
| Cc: Jason Gunthorpe <jgg@nvidia.com> |
| Cc: Jerome Glisse <jglisse@redhat.com> |
| Cc: John Hubbard <jhubbard@nvidia.com> |
| Cc: Jonathan Corbet <corbet@lwn.net> |
| Cc: Karol Herbst <kherbst@redhat.com> |
| Cc: Liam Howlett <liam.howlett@oracle.com> |
| Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> |
| Cc: Lyude <lyude@redhat.com> |
| Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> |
| Cc: Oleg Nesterov <oleg@redhat.com> |
| Cc: Pasha Tatashin <pasha.tatashin@soleen.com> |
| Cc: Peter Xu <peterx@redhat.com> |
| Cc: Peter Zijlstra (Intel) <peterz@infradead.org> |
| Cc: SeongJae Park <sj@kernel.org> |
| Cc: Simona Vetter <simona.vetter@ffwll.ch> |
| Cc: Vlastimil Babka <vbabka@suse.cz> |
| Cc: Yanteng Si <si.yanteng@linux.dev> |
| Cc: Barry Song <v-songbaohua@oppo.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/memory.c | 17 +---------------- |
| mm/rmap.c | 7 ------- |
| 2 files changed, 1 insertion(+), 23 deletions(-) |
| |
| --- a/mm/memory.c~mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries |
| +++ a/mm/memory.c |
| @@ -741,20 +741,6 @@ static void restore_exclusive_pte(struct |
| |
| VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) && |
| PageAnonExclusive(page)), folio); |
| - |
| - /* |
| - * No need to take a page reference as one was already |
| - * created when the swap entry was made. |
| - */ |
| - if (folio_test_anon(folio)) |
| - folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE); |
| - else |
| - /* |
| - * Currently device exclusive access only supports anonymous |
| - * memory so the entry shouldn't point to a filebacked page. |
| - */ |
| - WARN_ON_ONCE(1); |
| - |
| set_pte_at(vma->vm_mm, address, ptep, pte); |
| |
| /* |
| @@ -1626,8 +1612,7 @@ static inline int zap_nonpresent_ptes(st |
| */ |
| WARN_ON_ONCE(!vma_is_anonymous(vma)); |
| rss[mm_counter(folio)]--; |
| - if (is_device_private_entry(entry)) |
| - folio_remove_rmap_pte(folio, page, vma); |
| + folio_remove_rmap_pte(folio, page, vma); |
| folio_put(folio); |
| } else if (!non_swap_entry(entry)) { |
| /* Genuine swap entries, hence a private anon pages */ |
| --- a/mm/rmap.c~mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries |
| +++ a/mm/rmap.c |
| @@ -2511,13 +2511,6 @@ struct page *make_device_exclusive(struc |
| /* The pte is writable, uffd-wp does not apply. */ |
| set_pte_at(mm, addr, fw.ptep, swp_pte); |
| |
| - /* |
| - * TODO: The device-exclusive PFN swap PTE holds a folio reference but |
| - * does not count as a mapping (mapcount), which is wrong and must be |
| - * fixed, otherwise RMAP walks don't behave as expected. |
| - */ |
| - folio_remove_rmap_pte(folio, page, vma); |
| - |
| folio_walk_end(&fw, vma); |
| mmu_notifier_invalidate_range_end(&range); |
| *foliop = folio; |
| _ |