| From 09789e5de18e4e442870b2d700831f5cb802eb05 Mon Sep 17 00:00:00 2001 |
| From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> |
| Date: Tue, 5 May 2015 16:23:35 -0700 |
| Subject: mm/memory-failure: call shake_page() when error hits thp tail page |
| |
| From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> |
| |
| commit 09789e5de18e4e442870b2d700831f5cb802eb05 upstream. |
| |
| Currently memory_failure() calls shake_page() to sweep pages out from |
| pcplists only when the victim page is a 4kB LRU page or a thp head page. |
| But we should do this for a thp tail page too. |
| |
| Consider a memory error that hits a thp tail page whose head page is on |
| a pcplist when memory_failure() runs. The current kernel then skips the |
| shake_page() call, so hwpoison_user_mappings() returns without calling |
| split_huge_page() or try_to_unmap(), because PageLRU of the thp head is |
| still clear after shake_page() was skipped. |
| |
| As a result, me_huge_page() runs for the thp, which is broken behavior. |
| |
| One effect is a leak of the thp. Another is a failure to isolate the |
| memory error, so a later access to the error address causes another MCE, |
| which kills the processes that used the thp. |
| |
| This patch fixes the problem by calling shake_page() in the thp tail case. |
| |
| Fixes: 385de35722c9 ("thp: allow a hwpoisoned head page to be put back to LRU") |
| Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> |
| Reviewed-by: Andi Kleen <ak@linux.intel.com> |
| Acked-by: Dean Nelson <dnelson@redhat.com> |
| Cc: Andrea Arcangeli <aarcange@redhat.com> |
| Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> |
| Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| mm/memory-failure.c | 8 ++++---- |
| 1 file changed, 4 insertions(+), 4 deletions(-) |
| |
| --- a/mm/memory-failure.c |
| +++ b/mm/memory-failure.c |
| @@ -1141,10 +1141,10 @@ int memory_failure(unsigned long pfn, in |
| * The check (unnecessarily) ignores LRU pages being isolated and |
| * walked by the page reclaim code, however that's not a big loss. |
| */ |
| - if (!PageHuge(p) && !PageTransTail(p)) { |
| - if (!PageLRU(p)) |
| - shake_page(p, 0); |
| - if (!PageLRU(p)) { |
| + if (!PageHuge(p)) { |
| + if (!PageLRU(hpage)) |
| + shake_page(hpage, 0); |
| + if (!PageLRU(hpage)) { |
| /* |
| * shake_page could have turned it free. |
| */ |