| From: Mel Gorman <mgorman@techsingularity.net> |
| Date: Thu, 1 Oct 2015 15:36:57 -0700 |
| Subject: mm: hugetlbfs: skip shared VMAs when unmapping private pages to |
| satisfy a fault |
| |
| commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream. |
| |
| SunDong reported the following on |
| |
| https://bugzilla.kernel.org/show_bug.cgi?id=103841 |
| |
| I think I find a linux bug, I have the test cases is constructed. I |
| can stable recurring problems in fedora22(4.0.4) kernel version, |
| arch for x86_64. I construct transparent huge page, when the parent |
| and child process with MAP_SHARE, MAP_PRIVATE way to access the same |
| huge page area, it has the opportunity to lead to huge page copy on |
| write failure, and then it will munmap the child corresponding mmap |
| area, but then the child mmap area with VM_MAYSHARE attributes, child |
| process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags |
| functions (vma - > vm_flags & VM_MAYSHARE). |
| |
| There were a number of problems with the report (e.g. it's hugetlbfs that |
| triggers this, not transparent huge pages) but it was fundamentally |
| correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that |
| looks like this |
| |
| vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000 |
| next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800 |
| prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0 |
| pgoff 0 file ffff88106bdb9800 private_data (null) |
| flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb) |
| ------------ |
| kernel BUG at mm/hugetlb.c:462! |
| SMP |
| Modules linked in: xt_pkttype xt_LOG xt_limit [..] |
| CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1 |
| Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012 |
| set_vma_resv_flags+0x2d/0x30 |
| |
| The VM_BUG_ON is correct because private and shared mappings have |
| different reservation accounting but the warning clearly shows that the |
| VMA is shared. |
| |
| When a private COW fails to allocate a new page then only the process |
| that created the VMA gets the page -- all the children unmap the page. |
| If the children access that data in the future then they get killed. |
| |
| The problem is that the same file is mapped shared and private. During |
| the COW, the allocation fails, the VMAs are traversed to unmap the other |
| private pages but a shared VMA is found and the bug is triggered. This |
| patch identifies such VMAs and skips them. |
| |
| Signed-off-by: Mel Gorman <mgorman@techsingularity.net> |
| Reported-by: SunDong <sund_sky@126.com> |
| Reviewed-by: Michal Hocko <mhocko@suse.com> |
| Cc: Andrea Arcangeli <aarcange@redhat.com> |
| Cc: Hugh Dickins <hughd@google.com> |
| Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> |
| Cc: David Rientjes <rientjes@google.com> |
| Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| Signed-off-by: Ben Hutchings <ben@decadent.org.uk> |
| --- |
| mm/hugetlb.c | 8 ++++++++ |
| 1 file changed, 8 insertions(+) |
| |
| --- a/mm/hugetlb.c |
| +++ b/mm/hugetlb.c |
| @@ -2502,6 +2502,14 @@ static int unmap_ref_private(struct mm_s |
| continue; |
| |
| /* |
| + * Shared VMAs have their own reserves and do not affect |
| + * MAP_PRIVATE accounting but it is possible that a shared |
| + * VMA is using the same page so check and skip such VMAs. |
| + */ |
| + if (iter_vma->vm_flags & VM_MAYSHARE) |
| + continue; |
| + |
| + /* |
| * Unmap the page from other VMAs without their own reserves. |
| * They get marked to be SIGKILLed if they fault in these |
| * areas. This is because a future no-page fault on this VMA |