| From 75ba5fda5962bf3a76e81b133634091d8a620d14 Mon Sep 17 00:00:00 2001 |
| From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> |
| Date: Thu, 13 Apr 2017 14:56:26 -0700 |
| Subject: [PATCH] thp: fix MADV_DONTNEED vs. MADV_FREE race |
| |
| commit 58ceeb6bec86d9140f9d91d71a710e963523d063 upstream. |
| |
| Both MADV_DONTNEED and MADV_FREE are handled with down_read(mmap_sem). |
| |
| It's critical to not clear pmd intermittently while handling MADV_FREE |
| to avoid race with MADV_DONTNEED: |
| |
|         CPU0:                           CPU1: |
|                                 madvise_free_huge_pmd() |
|                                  pmdp_huge_get_and_clear_full() |
| madvise_dontneed() |
|  zap_pmd_range() |
|   pmd_trans_huge(*pmd) == 0 (without ptl) |
|   // skip the pmd |
|                                  set_pmd_at(); |
|                                  // pmd is re-established |
| |
| It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. |
| It violates the MADV_DONTNEED interface and can result in userspace |
| misbehaviour. |
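| |
| For illustration only (not part of this patch): a minimal userspace |
| sketch of the guarantee at stake, assuming Linux and a private anonymous |
| mapping. After a successful madvise(MADV_DONTNEED) the range must read |
| back as zeroes. The helper name dontneed_zeroes_range() is hypothetical, |
| and the sketch does not reproduce the race; it only shows the behaviour |
| that a skipped pmd would break. |

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: dirty a private anonymous mapping, drop it with
 * MADV_DONTNEED, then check that every byte reads back as zero.
 *
 * Returns 0 if the range is zero-filled, 1 if stale data survived,
 * and -1 if mmap() or madvise() failed.
 */
static int dontneed_zeroes_range(size_t len)
{
	unsigned char *buf;
	size_t i;
	int stale = 0;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch every page so there is something to discard. */
	memset(buf, 0xaa, len);

	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* Later accesses must see zero-fill-on-demand pages. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0) {
			stale = 1;
			break;
		}
	}

	munmap(buf, len);
	return stale;
}
```

| A caller would expect dontneed_zeroes_range(len) to return 0 for any |
| page-aligned len; a return of 1 would correspond to the stale pmd |
| described above. |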
| |
| Basically it's the same race as with numa balancing in |
| change_huge_pmd(), but a bit simpler to mitigate: we don't need to |
| preserve dirty/young flags here due to MADV_FREE functionality. |
| |
| [kirill.shutemov@linux.intel.com: Urgh... Power is special again] |
| Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com |
| Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com |
| Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
| Acked-by: Minchan Kim <minchan@kernel.org> |
| Cc: Minchan Kim <minchan@kernel.org> |
| Cc: Andrea Arcangeli <aarcange@redhat.com> |
| Cc: Hillf Danton <hillf.zj@alibaba-inc.com> |
| Cc: <stable@vger.kernel.org> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> |
| |
| diff --git a/mm/huge_memory.c b/mm/huge_memory.c |
| index 9036d6067df4..14c9374c6466 100644 |
| --- a/mm/huge_memory.c |
| +++ b/mm/huge_memory.c |
| @@ -1320,8 +1320,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, |
|  		deactivate_page(page); |
|  |
|  	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) { |
| -		orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd, |
| -			tlb->fullmm); |
| +		pmdp_invalidate(vma, addr, pmd); |
|  		orig_pmd = pmd_mkold(orig_pmd); |
|  		orig_pmd = pmd_mkclean(orig_pmd); |
| |
| -- |
| 2.12.0 |
| |