From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: <linux-cve-announce@vger.kernel.org>
Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
Subject: CVE-2024-43892: memcg: protect concurrent access to mem_cgroup_idr
Description
===========
In the Linux kernel, the following vulnerability has been resolved:

memcg: protect concurrent access to mem_cgroup_idr

Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced an IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
calls happen within css callbacks and are thus protected from concurrent
modifications by cgroup_mutex. However, idr_remove() for mem_cgroup_idr
was not protected against concurrency and can run concurrently for
different memcgs when their refcounts drop to zero. Fix that.

We have been seeing list_lru-based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different parts of the
list_lru code, including list_lru_add(), list_lru_del() and the
reparenting code. Upon further inspection, it looked like for a given
object (dentry or inode), the super_block's list_lru didn't have a
list_lru_one for the memcg of that object. The initial suspicions were
that either the object was not allocated through kmem_cache_alloc_lru(),
or memcg_list_lru_alloc() somehow failed to allocate a list_lru_one for a
memcg but returned success. No evidence was found for either case.

Looking more deeply, we started seeing situations where a valid memcg's
ID is not present in mem_cgroup_idr, and in some cases multiple valid
memcgs have the same ID and mem_cgroup_idr points to one of them. The
most reasonable explanation is that these situations are caused by races
between multiple idr_remove() calls, or between idr_alloc()/idr_replace()
and idr_remove(). These races allow multiple memcgs to acquire the same
ID, and offlining one of them then cleans up the list_lrus on the system
for all of them. Later accesses to those list_lrus from the other memcgs
crash due to the missing list_lru_one.

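To make the "same ID" symptom concrete, the following C sketch replays one interleaving by hand. It assumes a deliberately simple LIFO free-ID stack, which is not how the kernel's IDR is implemented internally; it only illustrates how an allocation racing with an unsynchronized removal (each a read-modify-write of shared state) can leave an in-use ID marked free. The functions `racy_interleaving()` and `serialized_order()`, the memcg labels X, Y, Z, W, and the IDs 1 and 2 are hypothetical.

```c
#include <assert.h>

#define MAX_IDS 8

/* Toy free-ID stack: the free IDs are stack[0..top-1]. */
struct free_stack {
	int stack[MAX_IDS];
	int top;
};

/* Pop a free ID as one uninterrupted step; -1 if none are left. */
static int pop_id(struct free_stack *fs)
{
	if (fs->top == 0)
		return -1;
	fs->top--;
	return fs->stack[fs->top];
}

/* Push a freed ID as one uninterrupted step. */
static void push_id(struct free_stack *fs, int id)
{
	assert(fs->top < MAX_IDS);
	fs->stack[fs->top] = id;
	fs->top++;
}

/*
 * Memcg X frees ID 2 while memcg Y allocates, with the two operations'
 * reads and writes interleaved. Memcgs Z and W then allocate normally.
 */
static void racy_interleaving(int *y_id, int *z_id, int *w_id)
{
	struct free_stack fs = { .stack = { 1 }, .top = 1 };

	int r_top = fs.top;		/* remove(2) reads top == 1 */
	int a_top = fs.top;		/* alloc() reads top == 1 */
	*y_id = fs.stack[a_top - 1];	/* alloc() hands ID 1 to Y */
	fs.top = a_top - 1;		/* alloc() writes top = 0 */
	fs.stack[r_top] = 2;		/* remove(2) writes stack[1] = 2 */
	fs.top = r_top + 1;		/* stale write: top = 2 marks ID 1 free again */

	*z_id = pop_id(&fs);		/* Z receives ID 2 */
	*w_id = pop_id(&fs);		/* W receives ID 1, already owned by Y */
}

/* Same operations, but the removal completes before the allocation. */
static void serialized_order(int *y_id, int *z_id, int *w_id)
{
	struct free_stack fs = { .stack = { 1 }, .top = 1 };

	push_id(&fs, 2);		/* X's removal of ID 2 finishes first */
	*y_id = pop_id(&fs);		/* Y receives ID 2 */
	*z_id = pop_id(&fs);		/* Z receives ID 1 */
	*w_id = pop_id(&fs);		/* W receives -1: nothing left, no duplicate */
}
```

In the racy replay, Y receives ID 1, the removal's stale write of `top` puts ID 1 back on the free stack, and W later receives ID 1 as well, so Y and W share an ID. In the serialized order, Y, Z and W receive 2, 1 and -1 respectively, and nothing is duplicated.
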
The Linux kernel CVE team has assigned CVE-2024-43892 to this issue.
Affected and fixed versions
===========================
Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 5.10.226 with commit 912736a0435ef40e6a4ae78197ccb5553cb80b05
Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 5.15.167 with commit e6cc9ff2ac0b5df9f25eb790934c3104f6710278
Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.1.110 with commit 56fd70f4aa8b82199dbe7e99366b1fd7a04d86fb
Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.6.46 with commit 37a060b64ae83b76600d187d76591ce488ab836b
Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.10.5 with commit 51c0b1bb7541f8893ec1accba59eb04361a70946
Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.11 with commit 9972605a238339b85bd16b084eed5f18414d22db
Issue introduced in 4.4.18 with commit 8627c7750a66a46d56d3564e1e881aa53764497c
Issue introduced in 4.6.6 with commit db70cd18d3da727a3a59694de428a9e41c620de7

Please see https://www.kernel.org for a full list of kernel versions
currently supported by the kernel community.
Unaffected versions might change over time as fixes are backported to
older supported kernel versions. The official CVE entry at
https://cve.org/CVERecord/?id=CVE-2024-43892
will be updated if fixes are backported; please check it for the most
up-to-date information about this issue.

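For a quick check of an upstream version string against the version list above, the following C sketch encodes the listed fixes. It is an illustrative helper, not an official tool, and `has_listed_fix()` is a hypothetical name. It only knows the branches listed in this notice (which may gain entries as further backports land); a return value of 0 means only "not confirmed fixed by this list" and covers affected, end-of-life, and older unaffected kernels alike; 6.11 release candidates are conservatively treated as unconfirmed; and distribution kernels that backport fixes without changing the upstream version number are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Upstream stable branches with a listed fix, copied from the list above. */
struct fixed_branch {
	int major, minor, patch;
};

static const struct fixed_branch listed_fixes[] = {
	{ 5, 10, 226 },
	{ 5, 15, 167 },
	{ 6, 1, 110 },
	{ 6, 6, 46 },
	{ 6, 10, 5 },
};

/*
 * Hypothetical helper: returns 1 if an upstream version string such as
 * "6.6.46" is at or past a fix listed in this notice, 0 otherwise.
 * 0 means "not confirmed fixed by this list", not "affected".
 */
static int has_listed_fix(const char *version)
{
	int major = 0, minor = 0, patch = 0;

	assert(version != NULL);
	if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) < 2)
		return 0;	/* unparseable: never claim a fix */

	if (major == 6 && minor == 11 && strstr(version, "-rc") != NULL)
		return 0;	/* 6.11 release candidates: treated as unconfirmed */
	if (major > 6 || (major == 6 && minor >= 11))
		return 1;	/* fixed in mainline 6.11 and carried into later releases */

	for (size_t i = 0; i < sizeof(listed_fixes) / sizeof(listed_fixes[0]); i++)
		if (listed_fixes[i].major == major && listed_fixes[i].minor == minor)
			return patch >= listed_fixes[i].patch;
	return 0;	/* branch not listed */
}
```

For example, "6.6.46" and "6.12.3" report 1, while "6.6.45" and "6.8.2" report 0; the official CVE entry remains the authoritative source.
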
Affected files
==============
The file(s) affected by this issue are:
mm/memcontrol.c
Mitigation
==========
The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this and many other bugfixes. Individual
changes are never tested alone, but rather as part of a larger kernel
release. Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all. If, however, updating to
the latest release is impossible, the individual changes that resolve
this issue can be found in these commits:
https://git.kernel.org/stable/c/912736a0435ef40e6a4ae78197ccb5553cb80b05
https://git.kernel.org/stable/c/e6cc9ff2ac0b5df9f25eb790934c3104f6710278
https://git.kernel.org/stable/c/56fd70f4aa8b82199dbe7e99366b1fd7a04d86fb
https://git.kernel.org/stable/c/37a060b64ae83b76600d187d76591ce488ab836b
https://git.kernel.org/stable/c/51c0b1bb7541f8893ec1accba59eb04361a70946
https://git.kernel.org/stable/c/9972605a238339b85bd16b084eed5f18414d22db