cve/published/2024/CVE-2024-42266.mbox - pub/scm/linux/security/vulns - Git at Google

 From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2024-42266: btrfs: make cow_file_range_inline() honor locked_page on error

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 btrfs: make cow_file_range_inline() honor locked_page on error

 The btrfs buffered write path runs through __extent_writepage() which
 has some tricky return value handling for writepage_delalloc().
 Specifically, when that returns 1, we exit, but for other return values
 we continue and end up calling btrfs_folio_end_all_writers(). If the
 folio has been unlocked (note that we check the PageLocked bit at the
 start of __extent_writepage()), this results in an assert panic like
 this one from syzbot:

   BTRFS: error (device loop0 state EAL) in free_log_tree:3267: errno=-5 IO failure
   BTRFS warning (device loop0 state EAL): Skipping commit of aborted transaction.
   BTRFS: error (device loop0 state EAL) in cleanup_transaction:2018: errno=-5 IO failure
   assertion failed: folio_test_locked(folio), in fs/btrfs/subpage.c:871
   ------------[ cut here ]------------
   kernel BUG at fs/btrfs/subpage.c:871!
   Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
   CPU: 1 PID: 5090 Comm: syz-executor225 Not tainted
   6.10.0-syzkaller-05505-gb1bc554e009e #0
   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
   Google 06/27/2024
   RIP: 0010:btrfs_folio_end_all_writers+0x55b/0x610 fs/btrfs/subpage.c:871
   Code: e9 d3 fb ff ff e8 25 22 c2 fd 48 c7 c7 c0 3c 0e 8c 48 c7 c6 80 3d
   0e 8c 48 c7 c2 60 3c 0e 8c b9 67 03 00 00 e8 66 47 ad 07 90 <0f> 0b e8
   6e 45 b0 07 4c 89 ff be 08 00 00 00 e8 21 12 25 fe 4c 89
   RSP: 0018:ffffc900033d72e0 EFLAGS: 00010246
   RAX: 0000000000000045 RBX: 00fff0000000402c RCX: 663b7a08c50a0a00
   RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
   RBP: ffffc900033d73b0 R08: ffffffff8176b98c R09: 1ffff9200067adfc
   R10: dffffc0000000000 R11: fffff5200067adfd R12: 0000000000000001
   R13: dffffc0000000000 R14: 0000000000000000 R15: ffffea0001cbee80
   FS:  0000000000000000(0000) GS:ffff8880b9500000(0000)
   knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007f5f076012f8 CR3: 000000000e134000 CR4: 00000000003506f0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Call Trace:
   <TASK>
   __extent_writepage fs/btrfs/extent_io.c:1597 [inline]
   extent_write_cache_pages fs/btrfs/extent_io.c:2251 [inline]
   btrfs_writepages+0x14d7/0x2760 fs/btrfs/extent_io.c:2373
   do_writepages+0x359/0x870 mm/page-writeback.c:2656
   filemap_fdatawrite_wbc+0x125/0x180 mm/filemap.c:397
   __filemap_fdatawrite_range mm/filemap.c:430 [inline]
   __filemap_fdatawrite mm/filemap.c:436 [inline]
   filemap_flush+0xdf/0x130 mm/filemap.c:463
   btrfs_release_file+0x117/0x130 fs/btrfs/file.c:1547
   __fput+0x24a/0x8a0 fs/file_table.c:422
   task_work_run+0x24f/0x310 kernel/task_work.c:222
   exit_task_work include/linux/task_work.h:40 [inline]
   do_exit+0xa2f/0x27f0 kernel/exit.c:877
   do_group_exit+0x207/0x2c0 kernel/exit.c:1026
   __do_sys_exit_group kernel/exit.c:1037 [inline]
   __se_sys_exit_group kernel/exit.c:1035 [inline]
   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1035
   x64_sys_call+0x2634/0x2640
   arch/x86/include/generated/asm/syscalls_64.h:232
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   RIP: 0033:0x7f5f075b70c9
   Code: Unable to access opcode bytes at
   0x7f5f075b709f.

 I was hitting the same issue by doing hundreds of accelerated runs of
 generic/475, which also hits IO errors by design.

 I instrumented that reproducer with bpftrace and found that the
 undesirable folio_unlock was coming from the following callstack:

   folio_unlock+5
   __process_pages_contig+475
   cow_file_range_inline.constprop.0+230
   cow_file_range+803
   btrfs_run_delalloc_range+566
   writepage_delalloc+332
   __extent_writepage # inlined in my stacktrace, but I added it here
   extent_write_cache_pages+622

 Looking at the bisected-to patch in the syzbot report, Josef realized
 that the logic of the cow_file_range_inline error path subtly changing.
 In the past, on error, it jumped to out_unlock in cow_file_range(),
 which honors the locked_page, so when we ultimately call
 folio_end_all_writers(), the folio of interest is still locked. After
 the change, we always unlocked ignoring the locked_page, on both success
 and error. On the success path, this all results in returning 1 to
 __extent_writepage(), which skips the folio_end_all_writers() call,
 which makes it OK to have unlocked.

 Fix the bug by wiring the locked_page into cow_file_range_inline() and
 only setting locked_page to NULL on success.

 The Linux kernel CVE team has assigned CVE-2024-42266 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 6.10 with commit 0586d0a89e77d717da14df42648ace4a9fd67981 and fixed in 6.10.4 with commit 061e41581606000a83ce0f0f01d6ad338f3704e9
 	Issue introduced in 6.10 with commit 0586d0a89e77d717da14df42648ace4a9fd67981 and fixed in 6.11 with commit 478574370bef7951fbd9ef5155537d6cbed49472

 Please see https://www.kernel.org for a full list of currently supported
 kernel versions by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2024-42266
 will be updated if fixes are backported, please check that for the most
 up to date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	fs/btrfs/inode.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this, and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/061e41581606000a83ce0f0f01d6ad338f3704e9
 	https://git.kernel.org/stable/c/478574370bef7951fbd9ef5155537d6cbed49472
	From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
	From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	To: <linux-cve-announce@vger.kernel.org>
	Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
	Subject: CVE-2024-42266: btrfs: make cow_file_range_inline() honor locked_page on error

	Description
	===========

	In the Linux kernel, the following vulnerability has been resolved:

	btrfs: make cow_file_range_inline() honor locked_page on error

	The btrfs buffered write path runs through __extent_writepage() which
	has some tricky return value handling for writepage_delalloc().
	Specifically, when that returns 1, we exit, but for other return values
	we continue and end up calling btrfs_folio_end_all_writers(). If the
	folio has been unlocked (note that we check the PageLocked bit at the
	start of __extent_writepage()), this results in an assert panic like
	this one from syzbot:

	BTRFS: error (device loop0 state EAL) in free_log_tree:3267: errno=-5 IO failure
	BTRFS warning (device loop0 state EAL): Skipping commit of aborted transaction.
	BTRFS: error (device loop0 state EAL) in cleanup_transaction:2018: errno=-5 IO failure
	assertion failed: folio_test_locked(folio), in fs/btrfs/subpage.c:871
	------------[ cut here ]------------
	kernel BUG at fs/btrfs/subpage.c:871!
	Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
	CPU: 1 PID: 5090 Comm: syz-executor225 Not tainted
	6.10.0-syzkaller-05505-gb1bc554e009e #0
	Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
	Google 06/27/2024
	RIP: 0010:btrfs_folio_end_all_writers+0x55b/0x610 fs/btrfs/subpage.c:871
	Code: e9 d3 fb ff ff e8 25 22 c2 fd 48 c7 c7 c0 3c 0e 8c 48 c7 c6 80 3d
	0e 8c 48 c7 c2 60 3c 0e 8c b9 67 03 00 00 e8 66 47 ad 07 90 <0f> 0b e8
	6e 45 b0 07 4c 89 ff be 08 00 00 00 e8 21 12 25 fe 4c 89
	RSP: 0018:ffffc900033d72e0 EFLAGS: 00010246
	RAX: 0000000000000045 RBX: 00fff0000000402c RCX: 663b7a08c50a0a00
	RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
	RBP: ffffc900033d73b0 R08: ffffffff8176b98c R09: 1ffff9200067adfc
	R10: dffffc0000000000 R11: fffff5200067adfd R12: 0000000000000001
	R13: dffffc0000000000 R14: 0000000000000000 R15: ffffea0001cbee80
	FS: 0000000000000000(0000) GS:ffff8880b9500000(0000)
	knlGS:0000000000000000
	CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007f5f076012f8 CR3: 000000000e134000 CR4: 00000000003506f0
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Call Trace:
	<TASK>
	__extent_writepage fs/btrfs/extent_io.c:1597 [inline]
	extent_write_cache_pages fs/btrfs/extent_io.c:2251 [inline]
	btrfs_writepages+0x14d7/0x2760 fs/btrfs/extent_io.c:2373
	do_writepages+0x359/0x870 mm/page-writeback.c:2656
	filemap_fdatawrite_wbc+0x125/0x180 mm/filemap.c:397
	__filemap_fdatawrite_range mm/filemap.c:430 [inline]
	__filemap_fdatawrite mm/filemap.c:436 [inline]
	filemap_flush+0xdf/0x130 mm/filemap.c:463
	btrfs_release_file+0x117/0x130 fs/btrfs/file.c:1547
	__fput+0x24a/0x8a0 fs/file_table.c:422
	task_work_run+0x24f/0x310 kernel/task_work.c:222
	exit_task_work include/linux/task_work.h:40 [inline]
	do_exit+0xa2f/0x27f0 kernel/exit.c:877
	do_group_exit+0x207/0x2c0 kernel/exit.c:1026
	__do_sys_exit_group kernel/exit.c:1037 [inline]
	__se_sys_exit_group kernel/exit.c:1035 [inline]
	__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1035
	x64_sys_call+0x2634/0x2640
	arch/x86/include/generated/asm/syscalls_64.h:232
	do_syscall_x64 arch/x86/entry/common.c:52 [inline]
	do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
	entry_SYSCALL_64_after_hwframe+0x77/0x7f
	RIP: 0033:0x7f5f075b70c9
	Code: Unable to access opcode bytes at
	0x7f5f075b709f.

	I was hitting the same issue by doing hundreds of accelerated runs of
	generic/475, which also hits IO errors by design.

	I instrumented that reproducer with bpftrace and found that the
	undesirable folio_unlock was coming from the following callstack:

	folio_unlock+5
	__process_pages_contig+475
	cow_file_range_inline.constprop.0+230
	cow_file_range+803
	btrfs_run_delalloc_range+566
	writepage_delalloc+332
	__extent_writepage # inlined in my stacktrace, but I added it here
	extent_write_cache_pages+622

	Looking at the bisected-to patch in the syzbot report, Josef realized
	that the logic of the cow_file_range_inline error path subtly changing.
	In the past, on error, it jumped to out_unlock in cow_file_range(),
	which honors the locked_page, so when we ultimately call
	folio_end_all_writers(), the folio of interest is still locked. After
	the change, we always unlocked ignoring the locked_page, on both success
	and error. On the success path, this all results in returning 1 to
	__extent_writepage(), which skips the folio_end_all_writers() call,
	which makes it OK to have unlocked.

	Fix the bug by wiring the locked_page into cow_file_range_inline() and
	only setting locked_page to NULL on success.

	The Linux kernel CVE team has assigned CVE-2024-42266 to this issue.


	Affected and fixed versions
	===========================

	Issue introduced in 6.10 with commit 0586d0a89e77d717da14df42648ace4a9fd67981 and fixed in 6.10.4 with commit 061e41581606000a83ce0f0f01d6ad338f3704e9
	Issue introduced in 6.10 with commit 0586d0a89e77d717da14df42648ace4a9fd67981 and fixed in 6.11 with commit 478574370bef7951fbd9ef5155537d6cbed49472

	Please see https://www.kernel.org for a full list of currently supported
	kernel versions by the kernel community.

	Unaffected versions might change over time as fixes are backported to
	older supported kernel versions. The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2024-42266
	will be updated if fixes are backported, please check that for the most
	up to date information about this issue.


	Affected files
	==============

	The file(s) affected by this issue are:
	fs/btrfs/inode.c


	Mitigation
	==========

	The Linux kernel CVE team recommends that you update to the latest
	stable kernel version for this, and many other bugfixes. Individual
	changes are never tested alone, but rather are part of a larger kernel
	release. Cherry-picking individual commits is not recommended or
	supported by the Linux kernel community at all. If however, updating to
	the latest release is impossible, the individual changes to resolve this
	issue can be found at these commits:
	https://git.kernel.org/stable/c/061e41581606000a83ce0f0f01d6ad338f3704e9
	https://git.kernel.org/stable/c/478574370bef7951fbd9ef5155537d6cbed49472