From: Shakeel Butt <shakeelb@google.com>
Subject: Revert "memcg: cleanup racy sum avoidance code"
Date: Wed, 17 Aug 2022 17:21:39 +0000

This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f.

Recently we started running the kernel with the rstat infrastructure on
production traffic and began to see negative memcg stat values.  In
particular, the 'sock' stat is the one we observed going negative:

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 18446744073708724224

Re-run after a couple of seconds:

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 53248

For now we are only seeing this issue on large machines (256 CPUs) and
only with the 'sock' stat.  I think the networking stack increases the
stat on one CPU and decreases it on another CPU much more often, so this
negative sock value is due to the rstat flusher flushing the stats on the
CPU that has seen the decrement of sock but missing the CPU that has the
increments.  A typical race condition.

For an easy stable backport, a revert is the simplest solution.  For a
long-term solution, I am thinking of two directions.  The first is to
reduce the race window by optimizing the rstat flusher.  The second is,
if the reader sees a negative stat value, to force a flush and restart
the stat collection: basically a retry, but bounded.

Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org> [5.15]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

--- a/include/linux/memcontrol.h~revert-memcg-cleanup-racy-sum-avoidance-code
+++ a/include/linux/memcontrol.h
@@ -987,19 +987,30 @@ static inline void mod_memcg_page_state(
 
 static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
 {
-	return READ_ONCE(memcg->vmstats.state[idx]);
+	long x = READ_ONCE(memcg->vmstats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
 	struct mem_cgroup_per_node *pn;
+	long x;
 
 	if (mem_cgroup_disabled())
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	return READ_ONCE(pn->lruvec_stats.state[idx]);
+	x = READ_ONCE(pn->lruvec_stats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
_