| From: Peter Xu <peterx@redhat.com> |
| Subject: mm/hugetlb: fix pgtable lock on pmd sharing |
| Date: Mon, 12 Jun 2023 12:04:20 -0400 |
| |
| Huge pmd sharing operates on PUD not PMD, huge_pte_lock() is not suitable |
| in this case because it should only work for last level pte changes, while |
| pmd sharing is always one level higher. |
| |
| Meanwhile, here we're locking over the spte pgtable lock which is even not |
| a lock for current mm but someone else's. |
| |
| It seems even racy on operating on the lock, as after put_page() of the |
| spte pgtable page logically the page can be released, so at least the |
| spin_unlock() needs to be done after the put_page(). |
| |
| No report I am aware, I'm not even sure whether it'll just work on taking |
| the spte pmd lock, because while we're holding i_mmap read lock it probably |
| means the vma interval tree is frozen, all pte allocators over this pud |
| entry could always find the specific svma and spte page, so maybe they'll |
| serialize on this spte page lock? Even so, doesn't seem to be expected. |
| It just seems to be an accident of cb900f412154. |
| |
| Fix it with the proper pud lock (which is the mm's page_table_lock). |
| |
| Link: https://lkml.kernel.org/r/20230612160420.809818-1-peterx@redhat.com |
| Fixes: cb900f412154 ("mm, hugetlb: convert hugetlbfs to use split pmd lock") |
| Signed-off-by: Peter Xu <peterx@redhat.com> |
| Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> |
| Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/hugetlb.c | 5 ++--- |
| 1 file changed, 2 insertions(+), 3 deletions(-) |
| |
| --- a/mm/hugetlb.c~mm-hugetlb-fix-pgtable-lock-on-pmd-sharing |
| +++ a/mm/hugetlb.c |
| @@ -7130,7 +7130,6 @@ pte_t *huge_pmd_share(struct mm_struct * |
| unsigned long saddr; |
| pte_t *spte = NULL; |
| pte_t *pte; |
| - spinlock_t *ptl; |
| |
| i_mmap_lock_read(mapping); |
| vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { |
| @@ -7151,7 +7150,7 @@ pte_t *huge_pmd_share(struct mm_struct * |
| if (!spte) |
| goto out; |
| |
| - ptl = huge_pte_lock(hstate_vma(vma), mm, spte); |
| + spin_lock(&mm->page_table_lock); |
| if (pud_none(*pud)) { |
| pud_populate(mm, pud, |
| (pmd_t *)((unsigned long)spte & PAGE_MASK)); |
| @@ -7159,7 +7158,7 @@ pte_t *huge_pmd_share(struct mm_struct * |
| } else { |
| put_page(virt_to_page(spte)); |
| } |
| - spin_unlock(ptl); |
| + spin_unlock(&mm->page_table_lock); |
| out: |
| pte = (pte_t *)pmd_alloc(mm, pud, addr); |
| i_mmap_unlock_read(mapping); |
| _ |