| From a6f23d14ec7d7d02220ad8bb2774be3322b9aeec Mon Sep 17 00:00:00 2001 |
| From: =?UTF-8?q?Michal=20Koutn=C3=BD?= <mkoutny@suse.com> |
| Date: Thu, 6 Aug 2020 23:22:18 -0700 |
| Subject: [PATCH] mm/page_counter.c: fix protection usage propagation |
| MIME-Version: 1.0 |
| Content-Type: text/plain; charset=UTF-8 |
| Content-Transfer-Encoding: 8bit |
| |
| commit a6f23d14ec7d7d02220ad8bb2774be3322b9aeec upstream. |
| |
| When a workload runs in cgroups that aren't directly below the root cgroup |
| and their parent specifies reclaim protection, the protection may end up |
| ineffective. |
| |
| The reason is that propagate_protected_usage() is not called for every |
| level up the hierarchy. Instead, all the protected usage is incorrectly |
| accumulated in the workload's parent. This means that siblings_low_usage |
| is overestimated and the effective protection is underestimated. Even |
| though it is a transitional phenomenon (the uncharge path does correct |
| propagation and fixes the wrong children_low_usage), it can undermine the |
| intended protection unexpectedly. |
| |
| We noticed this problem when we saw swap-out in a descendant of a |
| protected memcg (an intermediate node) while the parent was comfortably |
| under its protection limit and the memory pressure was external to that |
| hierarchy. Michal has pinned this down to the wrong siblings_low_usage, |
| which led to the unwanted reclaim. |
| |
| The fix is simply to update children_low_usage in the respective ancestors |
| in the charging path as well. |
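
The effect of passing the leaf counter instead of the loop variable can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct counter, charge() and simulate() are hypothetical stand-ins, the model tracks memory.low-style state only, and it uses plain longs where the kernel uses atomics. The scenario (a parent with two children, 40 and 30 pages charged) is made up for illustration.

```c
#include <stddef.h>

/* Toy user-space model; field names mirror the patch, but this is not the
 * kernel's struct page_counter (no atomics, only memory.low-style state). */
struct counter {
	long usage;              /* pages charged to this level */
	long low;                /* protection configured at this level */
	long low_usage;          /* protected usage last reported to the parent */
	long children_low_usage; /* sum of the children's low_usage */
	struct counter *parent;
};

/* Report min(usage, c->low) to c's parent as a delta against the last report. */
static void propagate_protected_usage(struct counter *c, long usage)
{
	long protected_usage, delta;

	if (!c->parent)
		return;
	protected_usage = usage < c->low ? usage : c->low;
	delta = protected_usage - c->low_usage;
	c->low_usage = protected_usage;
	c->parent->children_low_usage += delta;
}

/* Charge nr_pages at every level; 'fixed' selects which counter is propagated. */
static void charge(struct counter *counter, long nr_pages, int fixed)
{
	struct counter *c;

	for (c = counter; c; c = c->parent) {
		c->usage += nr_pages;
		/* Before the fix, the leaf 'counter' was passed at every level. */
		propagate_protected_usage(fixed ? c : counter, c->usage);
	}
}

/* Hypothetical scenario: root <- parent <- {workload, sibling};
 * parent, workload and sibling each have low = 100. */
static void simulate(int fixed, long *parent_children_low, long *root_children_low)
{
	struct counter root = { .low = 0 };
	struct counter parent = { .low = 100, .parent = &root };
	struct counter workload = { .low = 100, .parent = &parent };
	struct counter sibling = { .low = 100, .parent = &parent };

	charge(&sibling, 40, fixed);
	charge(&workload, 30, fixed);
	*parent_children_low = parent.children_low_usage;
	*root_children_low = root.children_low_usage;
}
```

In this model, simulate(0, ...) leaves the parent's children_low_usage at 110 and the root's at 0, although only 70 pages were charged: the workload's low_usage is overwritten with the parent's usage and nothing is reported above the parent. simulate(1, ...) yields 70 at both levels.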
| |
| Fixes: 230671533d64 ("mm: memory.low hierarchical behavior") |
| Signed-off-by: Michal Koutný <mkoutny@suse.com> |
| Signed-off-by: Michal Hocko <mhocko@suse.com> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| Acked-by: Michal Hocko <mhocko@suse.com> |
| Acked-by: Roman Gushchin <guro@fb.com> |
| Cc: Johannes Weiner <hannes@cmpxchg.org> |
| Cc: Tejun Heo <tj@kernel.org> |
| Cc: <stable@vger.kernel.org> [4.18+] |
| Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.org |
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
| |
| diff --git a/mm/page_counter.c b/mm/page_counter.c |
| index c56db2d5e159..b4663844c9b3 100644 |
| --- a/mm/page_counter.c |
| +++ b/mm/page_counter.c |
| @@ -72,7 +72,7 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages) |
| long new; |
| |
| new = atomic_long_add_return(nr_pages, &c->usage); |
| - propagate_protected_usage(counter, new); |
| + propagate_protected_usage(c, new); |
| /* |
| * This is indeed racy, but we can live with some |
| * inaccuracy in the watermark. |
| @@ -116,7 +116,7 @@ bool page_counter_try_charge(struct page_counter *counter, |
| new = atomic_long_add_return(nr_pages, &c->usage); |
| if (new > c->max) { |
| atomic_long_sub(nr_pages, &c->usage); |
| - propagate_protected_usage(counter, new); |
| + propagate_protected_usage(c, new); |
| /* |
| * This is racy, but we can live with some |
| * inaccuracy in the failcnt. |
| @@ -125,7 +125,7 @@ bool page_counter_try_charge(struct page_counter *counter, |
| *fail = c; |
| goto failed; |
| } |
| - propagate_protected_usage(counter, new); |
| + propagate_protected_usage(c, new); |
| /* |
| * Just like with failcnt, we can live with some |
| * inaccuracy in the watermark. |
| -- |
| 2.27.0 |
| |