| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-43892: memcg: protect concurrent access to mem_cgroup_idr |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| memcg: protect concurrent access to mem_cgroup_idr |
| |
| Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after |
| many small jobs") decoupled the memcg IDs from the CSS ID space to fix the |
| cgroup creation failures. It introduced IDR to maintain the memcg ID |
| space. The IDR depends on external synchronization mechanisms for |
| modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace() |
| happen within css callback and thus are protected through cgroup_mutex |
| from concurrent modifications. However, idr_remove() for mem_cgroup_idr |
| was not protected against concurrency and can be run concurrently for |
| different memcgs when their refcnt drops to zero. Fix that. |
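
The locking rule described above can be sketched in userspace. This is a minimal analogue, not the kernel patch: `id_owner`, `id_used`, `id_lock`, `id_alloc()`, `id_find()`, `id_remove()`, `run_stress()`, `ID_MAX` and `MAX_WORKERS` are all hypothetical names, a plain array stands in for the IDR, and a pthread mutex stands in for whatever lock the kernel fix uses. The point it illustrates is only that removal takes the same lock as allocation, so two threads can never hold the same ID at once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ID_MAX 64       /* hypothetical size of the ID space */
#define MAX_WORKERS 16  /* hypothetical cap on stress threads */

/* Userspace stand-in for mem_cgroup_idr: slot i holds the owner of ID i. */
static void *id_owner[ID_MAX + 1];
static int id_used[ID_MAX + 1];
/* One lock guards every modification, including removal. */
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analogous to idr_alloc(): reserve the lowest free ID >= 1, or return -1. */
static int id_alloc(void *owner)
{
	int ret = -1;

	pthread_mutex_lock(&id_lock);
	for (int id = 1; id <= ID_MAX; id++) {
		if (!id_used[id]) {
			id_used[id] = 1;
			id_owner[id] = owner;
			ret = id;
			break;
		}
	}
	pthread_mutex_unlock(&id_lock);
	return ret;
}

/* Lookup helper; locked here only to keep the sketch simple. */
static void *id_find(int id)
{
	void *owner = NULL;

	pthread_mutex_lock(&id_lock);
	if (id >= 1 && id <= ID_MAX && id_used[id])
		owner = id_owner[id];
	pthread_mutex_unlock(&id_lock);
	return owner;
}

/* Analogous to idr_remove(): the path that lacked protection before the fix. */
static void id_remove(int id)
{
	pthread_mutex_lock(&id_lock);
	if (id >= 1 && id <= ID_MAX) {
		id_used[id] = 0;
		id_owner[id] = NULL;
	}
	pthread_mutex_unlock(&id_lock);
}

struct worker {
	pthread_t thread;
	int iters;
	int errors;
};

/* Allocate, verify exclusive ownership, release; count any violation. */
static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < w->iters; i++) {
		int id = id_alloc(w);

		if (id < 0 || id_find(id) != w)
			w->errors++;
		if (id > 0)
			id_remove(id);
	}
	return NULL;
}

/* Returns the number of ownership violations seen, or -1 on thread failure. */
static int run_stress(int nthreads, int iters)
{
	struct worker w[MAX_WORKERS];
	int started = 0, errors = 0;

	assert(nthreads >= 1 && nthreads <= MAX_WORKERS);
	for (; started < nthreads; started++) {
		w[started].iters = iters;
		w[started].errors = 0;
		if (pthread_create(&w[started].thread, NULL, worker_fn, &w[started]))
			break;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(w[i].thread, NULL);
		errors += w[i].errors;
	}
	return started == nthreads ? errors : -1;
}
```

Calling `run_stress(8, 2000)` from a test harness should report zero violations, because every ID is exclusively owned between `id_alloc()` and `id_remove()`. Dropping the lock from `id_remove()` alone would reintroduce the kind of unsynchronized modification the commit message describes.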
| |
| We have been seeing list_lru based kernel crashes at a low frequency in |
| our fleet for a long time. These crashes were in different parts of the |
| list_lru code including list_lru_add(), list_lru_del() and reparenting |
| code. Upon further inspection, it looked like for a given object (dentry |
| and inode), the super_block's list_lru didn't have list_lru_one for the |
| memcg of that object. The initial suspicions were that either the object |
| was not allocated through kmem_cache_alloc_lru() or that somehow |
| memcg_list_lru_alloc() failed to allocate a list_lru_one for a memcg but |
| returned success. No evidence was found for these cases. |
| |
| Looking more deeply, we started seeing situations where a valid memcg's ID |
| is not present in mem_cgroup_idr, and in some cases multiple valid memcgs |
| have the same ID and mem_cgroup_idr points to only one of them. So, the |
| most reasonable explanation is that these situations happen due to a race |
| between multiple idr_remove() calls or a race between |
| idr_alloc()/idr_replace() and idr_remove(). These races cause multiple |
| memcgs to acquire the same ID, and then offlining one of them cleans up |
| the list_lrus on the system for all of them. Later accesses from the other |
| memcgs to the list_lru cause crashes due to the missing list_lru_one. |
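
The failure chain in the paragraph above can be replayed deterministically in a few lines. This is a simplified model under stated assumptions, not kernel code: `struct lru`, `struct lru_one`, `struct group`, `lru_alloc_for()`, `group_offline()`, `lru_add()` and `demo_duplicate_id()` are hypothetical stand-ins for a list_lru, a list_lru_one, a memcg and their helpers, and offlining is modeled as simply freeing the per-ID entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_IDS 8	/* hypothetical size of the ID space */

struct lru_one { long nr_items; };		/* stand-in for list_lru_one */
struct lru { struct lru_one *per_id[NR_IDS]; };	/* stand-in for a list_lru */
struct group { int id; };			/* stand-in for a memcg */

/* Make sure the lru has an entry for this group's ID. */
static int lru_alloc_for(struct lru *lru, const struct group *g)
{
	if (lru->per_id[g->id])
		return 0;	/* an entry for this ID already exists */
	lru->per_id[g->id] = calloc(1, sizeof(struct lru_one));
	return lru->per_id[g->id] ? 0 : -1;
}

/* Offlining tears down the per-ID entry, whoever else shares that ID. */
static void group_offline(struct lru *lru, const struct group *g)
{
	free(lru->per_id[g->id]);
	lru->per_id[g->id] = NULL;
}

/* Returns -1 where the kernel would hit the missing list_lru_one. */
static int lru_add(struct lru *lru, const struct group *g)
{
	struct lru_one *one = lru->per_id[g->id];

	if (!one)
		return -1;
	one->nr_items++;
	return 0;
}

/* Two live groups end up with the same ID, then one goes offline. */
static int demo_duplicate_id(void)
{
	struct lru lru = { { NULL } };
	struct group a = { .id = 5 }, b = { .id = 5 };	/* outcome of the race */

	if (lru_alloc_for(&lru, &a) || lru_alloc_for(&lru, &b))
		return 1;	/* allocation failure, unrelated to the bug */
	group_offline(&lru, &a);	/* also removes b's entry */
	return lru_add(&lru, &b);	/* b is still alive but has no entry */
}
```

In this model `demo_duplicate_id()` returns -1: group `b` is still valid, yet its per-ID entry vanished when `a` went offline, which mirrors the missing list_lru_one seen in the crashes.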
| |
| The Linux kernel CVE team has assigned CVE-2024-43892 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 5.10.226 with commit 912736a0435ef40e6a4ae78197ccb5553cb80b05 |
| Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 5.15.167 with commit e6cc9ff2ac0b5df9f25eb790934c3104f6710278 |
| Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.1.110 with commit 56fd70f4aa8b82199dbe7e99366b1fd7a04d86fb |
| Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.6.46 with commit 37a060b64ae83b76600d187d76591ce488ab836b |
| Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.10.5 with commit 51c0b1bb7541f8893ec1accba59eb04361a70946 |
| Issue introduced in 4.7 with commit 73f576c04b9410ed19660f74f97521bee6e1c546 and fixed in 6.11 with commit 9972605a238339b85bd16b084eed5f18414d22db |
| Issue introduced in 4.4.18 with commit 8627c7750a66a46d56d3564e1e881aa53764497c |
| Issue introduced in 4.6.6 with commit db70cd18d3da727a3a59694de428a9e41c620de7 |
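
As a convenience, the fixed versions listed above can be encoded in a small lookup. This is only a sketch built from the snapshot in this announcement: `struct fix`, `fixes[]` and `has_listed_fix()` are hypothetical names, and a result of 0 means "no fix listed here for that version", not proof that the version is vulnerable (for example, releases that predate the introducing commits are not affected at all).

```c
#include <stddef.h>

/* First release on each stable branch that carries the fix, per the list above. */
struct fix { int major, minor, patch; };

static const struct fix fixes[] = {
	{ 5, 10, 226 },
	{ 5, 15, 167 },
	{ 6, 1, 110 },
	{ 6, 6, 46 },
	{ 6, 10, 5 },
};

/* Returns 1 if major.minor.patch is at or past a listed fix, 0 otherwise. */
static int has_listed_fix(int major, int minor, int patch)
{
	/* The fix is in mainline 6.11, so later releases include it. */
	if (major > 6 || (major == 6 && minor >= 11))
		return 1;

	for (size_t i = 0; i < sizeof(fixes) / sizeof(fixes[0]); i++) {
		if (fixes[i].major == major && fixes[i].minor == minor)
			return patch >= fixes[i].patch;
	}
	return 0;	/* branch not listed in this announcement */
}
```

For instance, `has_listed_fix(6, 6, 46)` yields 1 while `has_listed_fix(6, 6, 45)` yields 0. The table would need updating if fixes are later backported to other branches.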
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-43892 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| mm/memcontrol.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/912736a0435ef40e6a4ae78197ccb5553cb80b05 |
| https://git.kernel.org/stable/c/e6cc9ff2ac0b5df9f25eb790934c3104f6710278 |
| https://git.kernel.org/stable/c/56fd70f4aa8b82199dbe7e99366b1fd7a04d86fb |
| https://git.kernel.org/stable/c/37a060b64ae83b76600d187d76591ce488ab836b |
| https://git.kernel.org/stable/c/51c0b1bb7541f8893ec1accba59eb04361a70946 |
| https://git.kernel.org/stable/c/9972605a238339b85bd16b084eed5f18414d22db |