From 486cf46f3f9be5f2a966016c1a8fe01e32cde09e Mon Sep 17 00:00:00 2001
From: Hugh Dickins <hughd@google.com>
Date: Wed, 19 Oct 2011 12:50:35 -0700
Subject: mm: fix race between mremap and removing migration entry
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Hugh Dickins <hughd@google.com>

commit 486cf46f3f9be5f2a966016c1a8fe01e32cde09e upstream.

I don't usually pay much attention to the stale "? " addresses in
stack backtraces, but this lucky report from Pawel Sikora hints that
mremap's move_ptes() has inadequate locking against page migration.

3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
kernel BUG at include/linux/swapops.h:105!
RIP: 0010:[<ffffffff81127b76>] [<ffffffff81127b76>]
migration_entry_wait+0x156/0x160
[<ffffffff811016a1>] handle_pte_fault+0xae1/0xaf0
[<ffffffff810feee2>] ? __pte_alloc+0x42/0x120
[<ffffffff8112c26b>] ? do_huge_pmd_anonymous_page+0xab/0x310
[<ffffffff81102a31>] handle_mm_fault+0x181/0x310
[<ffffffff81106097>] ? vma_adjust+0x537/0x570
[<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[<ffffffff81109a05>] ? do_mremap+0x2d5/0x570
[<ffffffff81421d5f>] page_fault+0x1f/0x30

mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
and pagetable locks, were good enough before page migration (with its
requirement that every migration entry be found) came in, and enough
while migration always held mmap_sem; but not enough nowadays, when
there's memory hotremove and compaction.

The danger is that move_ptes() lets a migration entry dodge around
behind remove_migration_pte()'s back, so it's in the old location when
looking at the new, then in the new location when looking at the old.
Either mremap's move_ptes() must additionally take anon_vma lock, or
migration's remove_migration_pte() must stop peeking for is_swap_pte()
before it takes pagetable lock.

Consensus chooses the latter: we prefer to add overhead to migration
rather than to mremapping, which is used by JVMs and by exec stack setup.

Reported-and-tested-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/migrate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -120,10 +120,10 @@ static int remove_migration_pte(struct p

 		ptep = pte_offset_map(pmd, addr);

-		if (!is_swap_pte(*ptep)) {
-			pte_unmap(ptep);
-			goto out;
-		}
+		/*
+		 * Peek to check is_swap_pte() before taking ptlock?  No, we
+		 * can race mremap's move_ptes(), which skips anon_vma lock.
+		 */

 		ptl = pte_lockptr(mm, pmd);
 	}