From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: drop oom code from exit_mmap
Date: Tue, 31 May 2022 15:30:59 -0700

The primary reason to invoke the oom reaper from the exit_mmap path used
to be to prevent excessive oom killing if the oom victim's exit races
with the oom reaper (see [1] for more details). The invocation has moved
around since then because of the interaction with the munlock logic, but
the underlying reason has remained the same (see [2]).

The munlock code is no longer a problem since [3], and there should not
be any blocking operation before the memory is unmapped by exit_mmap, so
the oom reaper invocation can be dropped. The unmapping part can be done
under the non-exclusive mmap_lock; the exclusive lock is only required
when page tables are freed.

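The resulting ordering can be sketched as follows (simplified pseudocode
of the post-patch flow, not the literal kernel source -- see the diff
below for the exact change):

```
void exit_mmap(struct mm_struct *mm)
{
	mmu_notifier_release(mm);

	mmap_read_lock(mm);		/* shared lock is enough to unmap */
	arch_exit_mmap(mm);
	unmap_vmas(...);
	mmap_read_unlock(mm);

	/* memory is gone; hide this mm from the oom killer/reaper */
	set_bit(MMF_OOM_SKIP, &mm->flags);

	mmap_write_lock(mm);		/* exclusive lock to free page tables */
	free_pgtables(...);
	mmap_write_unlock(mm);
}
```
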
Remove the oom reaper invocation from exit_mmap, which makes the code
easier to read. This is unlikely to make any observable difference,
although some microbenchmarks could benefit from one less branch that
needs to be evaluated even though it almost never is true.

[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")

[akpm@linux-foundation.org: restore Suren's mmap_read_lock() optimization]
Link: https://lkml.kernel.org/r/20220531223100.510392-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/oom.h |    2 --
 mm/mmap.c           |   30 +++++++++++-------------------
 mm/oom_kill.c       |    2 +-
 3 files changed, 12 insertions(+), 22 deletions(-)

--- a/include/linux/oom.h~mm-drop-oom-code-from-exit_mmap
+++ a/include/linux/oom.h
@@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_ad
 	return 0;
 }
 
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
 long oom_badness(struct task_struct *p,
 		unsigned long totalpages);
 
--- a/mm/mmap.c~mm-drop-oom-code-from-exit_mmap
+++ a/mm/mmap.c
@@ -3085,30 +3085,13 @@ void exit_mmap(struct mm_struct *mm)
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
-	if (unlikely(mm_is_oom_victim(mm))) {
-		/*
-		 * Manually reap the mm to free as much memory as possible.
-		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
-		 * this mm from further consideration. Taking mm->mmap_lock for
-		 * write after setting MMF_OOM_SKIP will guarantee that the oom
-		 * reaper will not run on this mm again after mmap_lock is
-		 * dropped.
-		 *
-		 * Nothing can be holding mm->mmap_lock here and the above call
-		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
-		 * __oom_reap_task_mm() will not block.
-		 */
-		(void)__oom_reap_task_mm(mm);
-		set_bit(MMF_OOM_SKIP, &mm->flags);
-	}
-
-	mmap_write_lock(mm);
+	mmap_read_lock(mm);
 	arch_exit_mmap(mm);
 
 	vma = mas_find(&mas, ULONG_MAX);
 	if (!vma) {
 		/* Can happen if dup_mmap() received an OOM */
-		mmap_write_unlock(mm);
+		mmap_read_unlock(mm);
 		return;
 	}
 
@@ -3118,6 +3101,15 @@ void exit_mmap(struct mm_struct *mm)
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
+	mmap_read_unlock(mm);
+
+	/*
+	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
+	 * because the memory has been already freed. Do not bother checking
+	 * mm_is_oom_victim because setting a bit unconditionally is cheaper.
+	 */
+	set_bit(MMF_OOM_SKIP, &mm->flags);
+	mmap_write_lock(mm);
 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
 		      USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb);
--- a/mm/oom_kill.c~mm-drop-oom-code-from-exit_mmap
+++ a/mm/oom_kill.c
@@ -509,7 +509,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
 	bool ret = true;
_