| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-45003: vfs: Don't evict inode under the inode lru traversing context |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| vfs: Don't evict inode under the inode lru traversing context |
| |
| The inode reclaiming process (see function prune_icache_sb()) first collects |
| all reclaimable inodes and marks them with the I_FREEING flag. From that |
| point on, other processes that try to get these inodes will block (see |
| function find_inode_fast()) until the reclaiming process destroys the |
| inodes in dispose_list(). Some filesystems (e.g. ext4 with the ea_inode |
| feature, ubifs with xattrs) may do an inode lookup in the inode eviction |
| callback. If that lookup happens in the inode LRU traversal context, a |
| deadlock may occur. |
| |
| Case 1: In ext4_evict_inode(), an ea inode lookup can happen if the |
| ea_inode feature is enabled. The lookup gets stuck in the |
| evicting context like this: |
| |
| 1. File A has inode i_reg and an ea inode i_ea |
| 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea |
| 3. Then the following two processes run like this: |
| |
|     PA                                PB |
|  echo 2 > /proc/sys/vm/drop_caches |
|   shrink_slab |
|    prune_dcache_sb |
|    // i_reg is added into lru, lru->i_ea->i_reg |
|    prune_icache_sb |
|     list_lru_walk_one |
|      inode_lru_isolate |
|       i_ea->i_state |= I_FREEING // set inode state |
|      inode_lru_isolate |
|       __iget(i_reg) |
|       spin_unlock(&i_reg->i_lock) |
|       spin_unlock(lru_lock) |
|                                       rm file A |
|                                        i_reg->nlink = 0 |
|       iput(i_reg) // i_reg->nlink is 0, do evict |
|        ext4_evict_inode |
|         ext4_xattr_delete_inode |
|          ext4_xattr_inode_dec_ref_all |
|           ext4_xattr_inode_iget |
|            ext4_iget(i_ea->i_ino) |
|             iget_locked |
|              find_inode_fast |
|               __wait_on_freeing_inode(i_ea) ----→ AA deadlock |
|     dispose_list // cannot be executed by prune_icache_sb |
|      wake_up_bit(&i_ea->i_state) |
| |
| Case 2: In ubifs_jnl_write_inode(), which writes a deleted inode, the |
| file deleting process holds BASEHD's wbuf->io_mutex while getting |
| the xattr inode. This can race with the inode reclaiming process, |
| which may try to lock BASEHD's wbuf->io_mutex in the inode |
| eviction function, and an ABBA deadlock then happens as |
| follows: |
| |
| 1. File A has inode ia and an xattr (with inode ixa); regular file B has |
| inode ib and an xattr. |
| 2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa |
| 3. Then the following three processes run like this: |
| |
|  PA                      PB                           PC |
|                          echo 2 > /proc/sys/vm/drop_caches |
|                           shrink_slab |
|                            prune_dcache_sb |
|                            // ib and ia are added into lru, lru->ixa->ib->ia |
|                            prune_icache_sb |
|                             list_lru_walk_one |
|                              inode_lru_isolate |
|                               ixa->i_state |= I_FREEING // set inode state |
|                              inode_lru_isolate |
|                               __iget(ib) |
|                               spin_unlock(&ib->i_lock) |
|                               spin_unlock(lru_lock) |
|                                                       rm file B |
|                                                        ib->nlink = 0 |
|  rm file A |
|   iput(ia) |
|    ubifs_evict_inode(ia) |
|     ubifs_jnl_delete_inode(ia) |
|      ubifs_jnl_write_inode(ia) |
|       make_reservation(BASEHD) // Lock wbuf->io_mutex |
|       ubifs_iget(ixa->i_ino) |
|        iget_locked |
|         find_inode_fast |
|          __wait_on_freeing_inode(ixa) |
|           |                   iput(ib) // ib->nlink is 0, do evict |
|           |                    ubifs_evict_inode |
|           |                     ubifs_jnl_delete_inode(ib) |
|           ↓                      ubifs_jnl_write_inode |
|     ABBA deadlock ←---------------make_reservation(BASEHD) |
|                             dispose_list // cannot be executed by prune_icache_sb |
|                              wake_up_bit(&ixa->i_state) |
| |
| Fix the possible deadlock by using a new inode state flag, I_LRU_ISOLATING, |
| to pin the inode in memory while inode_lru_isolate() reclaims its pages, |
| instead of using an ordinary inode reference. This way inode deletion |
| cannot be triggered from inode_lru_isolate(), which avoids the deadlock. |
| evict() is made to wait for I_LRU_ISOLATING to be cleared before |
| proceeding with inode cleanup. |
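
The difference between pinning with a reference and pinning with a state flag can be illustrated with a small user-space model. This is only a sketch, not the kernel patch: the struct, the `model_*` helpers, and the `ctx` enum are hypothetical names invented for illustration, and the races are replayed sequentially in a single thread. It records which context ends up running eviction under each scheme.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; names echo the kernel's, but this is not kernel code. */
#define I_LRU_ISOLATING (1u << 0)

enum ctx { CTX_NONE, CTX_LRU_WALK, CTX_UNLINK };

struct model_inode {
    unsigned int state;   /* flag bits, e.g. I_LRU_ISOLATING */
    int count;            /* reference count */
    int nlink;            /* link count; 0 means the file was removed */
    bool evict_pending;   /* eviction is waiting for the flag to clear */
    enum ctx evicted_in;  /* context responsible for eviction */
};

/* Drop a reference; the final put of an unlinked inode evicts it
 * in the caller's context. */
void model_iput(struct model_inode *in, enum ctx who)
{
    assert(in->count > 0);
    if (--in->count == 0 && in->nlink == 0) {
        in->evicted_in = who;
        /* Models evict() waiting for I_LRU_ISOLATING to be cleared. */
        in->evict_pending = (in->state & I_LRU_ISOLATING) != 0;
    }
}

/* A concurrent "rm": look the inode up, unlink it, drop the reference. */
void model_unlink(struct model_inode *in)
{
    in->count++;
    in->nlink = 0;
    model_iput(in, CTX_UNLINK);
}

/* Old scheme: the LRU walker pins the inode with a reference. */
enum ctx isolate_with_reference(struct model_inode *in)
{
    in->count++;                   /* __iget() in the walker */
    model_unlink(in);              /* rm races in; its put is not the last */
    model_iput(in, CTX_LRU_WALK);  /* walker's put is last: evicts here */
    return in->evicted_in;
}

/* New scheme: the walker pins the inode with a state flag instead. */
enum ctx isolate_with_flag(struct model_inode *in)
{
    in->state |= I_LRU_ISOLATING;  /* pin without taking a reference */
    model_unlink(in);              /* rm's put is last: eviction is its job */
    in->state &= ~I_LRU_ISOLATING; /* walker done; waiting eviction proceeds */
    in->evict_pending = false;
    return in->evicted_in;
}
```

Starting from a zero-initialized `struct model_inode`, `isolate_with_reference()` reports eviction in the LRU walk context, which is the situation that leads to the deadlocks described above, while `isolate_with_flag()` reports eviction in the unlinking context.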
| |
| The Linux kernel CVE team has assigned CVE-2024-45003 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 4.13 with commit e50e5129f384ae282adebfb561189cdb19b81cee and fixed in 5.4.283 with commit 3525ad25240dfdd8c78f3470911ed10aa727aa72 |
| Issue introduced in 4.13 with commit e50e5129f384ae282adebfb561189cdb19b81cee and fixed in 5.10.225 with commit 03880af02a78bc9a98b5a581f529cf709c88a9b8 |
| Issue introduced in 4.13 with commit e50e5129f384ae282adebfb561189cdb19b81cee and fixed in 5.15.166 with commit cda54ec82c0f9d05393242b20b13f69b083f7e88 |
| Issue introduced in 4.13 with commit e50e5129f384ae282adebfb561189cdb19b81cee and fixed in 6.1.107 with commit 437741eba63bf4e437e2beb5583f8633556a2b98 |
| Issue introduced in 4.13 with commit e50e5129f384ae282adebfb561189cdb19b81cee and fixed in 6.6.48 with commit b9bda5f6012dd00372f3a06a82ed8971a4c57c32 |
| Issue introduced in 4.13 with commit e50e5129f384ae282adebfb561189cdb19b81cee and fixed in 6.10.7 with commit 9063ab49c11e9518a3f2352434bb276cc8134c5f |
| Issue introduced in 4.13 with commit e50e5129f384ae282adebfb561189cdb19b81cee and fixed in 6.11 with commit 2a0629834cd82f05d424bbc193374f9a43d1f87d |
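
The list above can be turned into a simple version check. The sketch below is only an illustration: the function name and enum are hypothetical, it considers upstream version numbers only, and it assumes that any branch between 4.13 and 6.11 without a fix listed in this announcement is still affected. Distribution kernels often backport fixes without changing the upstream version number, so a result from this check is a hint and not proof.

```c
#include <stddef.h>

enum cve_status { NOT_AFFECTED, AFFECTED, FIXED };

struct stable_fix { int major, minor, patch; };

/* First fixed release of each stable branch, copied from the list above. */
static const struct stable_fix fixes[] = {
    {5, 4, 283}, {5, 10, 225}, {5, 15, 166},
    {6, 1, 107}, {6, 6, 48},   {6, 10, 7},
};

/* Hypothetical helper: classify an upstream version major.minor.patch. */
enum cve_status cve_2024_45003_status(int major, int minor, int patch)
{
    /* Releases before 4.13 predate the commit that introduced the issue. */
    if (major < 4 || (major == 4 && minor < 13))
        return NOT_AFFECTED;

    /* 6.11 and later releases contain the fix. */
    if (major > 6 || (major == 6 && minor >= 11))
        return FIXED;

    for (size_t i = 0; i < sizeof(fixes) / sizeof(fixes[0]); i++) {
        if (fixes[i].major == major && fixes[i].minor == minor)
            return patch >= fixes[i].patch ? FIXED : AFFECTED;
    }

    /* Assumption: branches with no fix listed here are still affected. */
    return AFFECTED;
}
```

For example, `cve_2024_45003_status(6, 6, 47)` yields `AFFECTED` and `cve_2024_45003_status(6, 6, 48)` yields `FIXED`.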
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-45003 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| fs/inode.c |
| include/linux/fs.h |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/3525ad25240dfdd8c78f3470911ed10aa727aa72 |
| https://git.kernel.org/stable/c/03880af02a78bc9a98b5a581f529cf709c88a9b8 |
| https://git.kernel.org/stable/c/cda54ec82c0f9d05393242b20b13f69b083f7e88 |
| https://git.kernel.org/stable/c/437741eba63bf4e437e2beb5583f8633556a2b98 |
| https://git.kernel.org/stable/c/b9bda5f6012dd00372f3a06a82ed8971a4c57c32 |
| https://git.kernel.org/stable/c/9063ab49c11e9518a3f2352434bb276cc8134c5f |
| https://git.kernel.org/stable/c/2a0629834cd82f05d424bbc193374f9a43d1f87d |