| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-26658: bcachefs: grab s_umount only if snapshotting |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| bcachefs: grab s_umount only if snapshotting |
| |
| While testing MongoDB on bcachefs with compression, I hit a lockdep |
| warning when snapshotting the MongoDB data volume. |
| |
| $ cat test.sh |
| prog=bcachefs |
| |
| $prog subvolume create /mnt/data |
| $prog subvolume create /mnt/data/snapshots |
| |
| while true; do |
|     $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) |
|     sleep 1s |
| done |
| |
| $ cat /etc/mongodb.conf |
| systemLog: |
|   destination: file |
|   logAppend: true |
|   path: /mnt/data/mongod.log |
| |
| storage: |
|   dbPath: /mnt/data/ |
| |
| lockdep reports: |
| [ 3437.452330] ====================================================== |
| [ 3437.452750] WARNING: possible circular locking dependency detected |
| [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E |
| [ 3437.453562] ------------------------------------------------------ |
| [ 3437.453981] bcachefs/35533 is trying to acquire lock: |
| [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 |
| [ 3437.454875] |
| but task is already holding lock: |
| [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] |
| [ 3437.456009] |
| which lock already depends on the new lock. |
| |
| [ 3437.456553] |
| the existing dependency chain (in reverse order) is: |
| [ 3437.457054] |
| -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: |
| [ 3437.457507] down_read+0x3e/0x170 |
| [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] |
| [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 |
| [ 3437.458498] do_syscall_64+0x42/0xf0 |
| [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 |
| [ 3437.459155] |
| -> #2 (&c->snapshot_create_lock){++++}-{3:3}: |
| [ 3437.459615] down_read+0x3e/0x170 |
| [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] |
| [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] |
| [ 3437.460686] notify_change+0x1f1/0x4a0 |
| [ 3437.461283] do_truncate+0x7f/0xd0 |
| [ 3437.461555] path_openat+0xa57/0xce0 |
| [ 3437.461836] do_filp_open+0xb4/0x160 |
| [ 3437.462116] do_sys_openat2+0x91/0xc0 |
| [ 3437.462402] __x64_sys_openat+0x53/0xa0 |
| [ 3437.462701] do_syscall_64+0x42/0xf0 |
| [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 |
| [ 3437.463359] |
| -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: |
| [ 3437.463843] down_write+0x3b/0xc0 |
| [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] |
| [ 3437.464493] vfs_write+0x21b/0x4c0 |
| [ 3437.464653] ksys_write+0x69/0xf0 |
| [ 3437.464839] do_syscall_64+0x42/0xf0 |
| [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 |
| [ 3437.465231] |
| -> #0 (sb_writers#10){.+.+}-{0:0}: |
| [ 3437.465471] __lock_acquire+0x1455/0x21b0 |
| [ 3437.465656] lock_acquire+0xc6/0x2b0 |
| [ 3437.465822] mnt_want_write+0x46/0x1a0 |
| [ 3437.465996] filename_create+0x62/0x190 |
| [ 3437.466175] user_path_create+0x2d/0x50 |
| [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] |
| [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 |
| [ 3437.466791] do_syscall_64+0x42/0xf0 |
| [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 |
| [ 3437.467180] |
| other info that might help us debug this: |
| |
| [ 3437.467507] Chain exists of: |
| sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 |
| |
| [ 3437.467979] Possible unsafe locking scenario: |
| |
| [ 3437.468223] CPU0 CPU1 |
| [ 3437.468405] ---- ---- |
| [ 3437.468585] rlock(&type->s_umount_key#48); |
| [ 3437.468758] lock(&c->snapshot_create_lock); |
| [ 3437.469030] lock(&type->s_umount_key#48); |
| [ 3437.469291] rlock(sb_writers#10); |
| [ 3437.469434] |
| *** DEADLOCK *** |
| |
| [ 3437.469670] 2 locks held by bcachefs/35533: |
| [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] |
| [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] |
| [ 3437.470744] |
| stack backtrace: |
| [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 |
| [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 |
| [ 3437.471694] Call Trace: |
| [ 3437.471795] <TASK> |
| [ 3437.471884] dump_stack_lvl+0x57/0x90 |
| [ 3437.472035] check_noncircular+0x132/0x150 |
| [ 3437.472202] __lock_acquire+0x1455/0x21b0 |
| [ 3437.472369] lock_acquire+0xc6/0x2b0 |
| [ 3437.472518] ? filename_create+0x62/0x190 |
| [ 3437.472683] ? lock_is_held_type+0x97/0x110 |
| [ 3437.472856] mnt_want_write+0x46/0x1a0 |
| [ 3437.473025] ? filename_create+0x62/0x190 |
| [ 3437.473204] filename_create+0x62/0x190 |
| [ 3437.473380] user_path_create+0x2d/0x50 |
| [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] |
| [ 3437.473819] ? lock_acquire+0xc6/0x2b0 |
| [ 3437.474002] ? __fget_files+0x2a/0x190 |
| [ 3437.474195] ? __fget_files+0xbc/0x190 |
| [ 3437.474380] ? lock_release+0xc5/0x270 |
| [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 |
| [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] |
| [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 |
| [ 3437.475277] do_syscall_64+0x42/0xf0 |
| [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 |
| [ 3437.475691] RIP: 0033:0x7f2743c313af |
| ====================================================== |
| |
| In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally |
| and unlock it at the end of the function. A "why do we need this lock?" |
| comment on the lock dates from |
| commit 42d237320e98 ("bcachefs: Snapshot creation, deletion"). |
| The reason is that __bch2_ioctl_subvolume_create() calls |
| sync_inodes_sb(), which requires s_umount to be held, to write back all |
| dirty inodes before doing the snapshot work. |
| |
| Fix it by taking s_umount for read only when snapshotting and |
| releasing it right after sync_inodes_sb(). |
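The dependency chain in the lockdep report can be read as a small directed
graph, where an edge "A -> B" means "B was acquired while A was held". The
sketch below is only an illustration of that reading, not lockdep's own
algorithm: the lock names are shortened by hand from the report, and
find_cycle() is a hypothetical helper. It models just the four edges listed
in the report and says nothing about any other lock ordering in bcachefs.

```python
from collections import defaultdict

# Edges read off the lockdep report: (held, acquired) means "acquired was
# taken while held was already held". Names are hand-shortened assumptions.
REPORTED_EDGES = [
    ("sb_writers", "i_mutex"),             # -> #1: bch2_write_iter via vfs_write
    ("i_mutex", "snapshot_create_lock"),   # -> #2: bch2_truncate
    ("snapshot_create_lock", "s_umount"),  # -> #3: bch2_fs_file_ioctl
    ("s_umount", "sb_writers"),            # -> #0: filename_create in the ioctl
]


def find_cycle(edges):
    """Return one cycle as a list of lock names, or None if there is none."""
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # nxt is already on the current path: the cycle is closed.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    # All four reported edges together form a cycle.
    print(find_cycle(REPORTED_EDGES))
    # Without the "s_umount held while taking sb_writers" edge, the
    # remaining three reported edges are acyclic.
    print(find_cycle(REPORTED_EDGES[:-1]))
```

Dropping the last edge corresponds to s_umount no longer being held when
filename_create() takes sb_writers, which is the effect the fix has on this
particular chain.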
| |
| The Linux kernel CVE team has assigned CVE-2024-26658 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.7 with commit 1c6fdbd8f2465ddfb73a01ec620cbf3d14044e1a and fixed in 6.7.5 with commit 5b41d3fd04c6757b9c2a60a0c5b2609cae9999df |
| Issue introduced in 6.7 with commit 1c6fdbd8f2465ddfb73a01ec620cbf3d14044e1a and fixed in 6.8 with commit 2acc59dd88d27ad69b66ded80df16c042b04eeec |
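As a rough aid for reading the two lines above, the following sketch checks
a plain upstream version string against the stated range (introduced in 6.7,
fixed in 6.7.5 and in 6.8). The helper names parse_version() and
is_affected() are hypothetical. The check assumes an unmodified mainline or
stable version number; distribution kernels may carry backported fixes, so a
version string alone is not conclusive. Strings with suffixes such as
"-rc7" are rejected rather than guessed at.

```python
def parse_version(text):
    """Turn '6.7.4' into (6, 7, 4); a missing patch level counts as 0.

    Raises ValueError for anything that is not purely dotted numbers,
    for example '6.7.0-rc7-custom+'.
    """
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_affected(version):
    """True if the version lies in the range named by the announcement."""
    v = parse_version(version)
    # Introduced in 6.7, fixed in 6.7.5; 6.8 and later carry the fix.
    return (6, 7, 0) <= v < (6, 7, 5)


if __name__ == "__main__":
    for candidate in ("6.6.30", "6.7", "6.7.4", "6.7.5", "6.8"):
        print(candidate, is_affected(candidate))
```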
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-26658 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| fs/bcachefs/fs-ioctl.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/5b41d3fd04c6757b9c2a60a0c5b2609cae9999df |
| https://git.kernel.org/stable/c/2acc59dd88d27ad69b66ded80df16c042b04eeec |