
 From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2024-56559: mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation

 When compiling the kernel source with 'make -j $(nproc)' on a running
 KASAN-enabled kernel on a 256-core machine, the following soft lockup is
 reported:

 watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760]
 CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95
 Workqueue: events drain_vmap_area_work
 RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0
 Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75
 RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202
 RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949
 RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50
 RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800
 R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39
 R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003
 FS:  0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0
 Call Trace:
  <IRQ>
  ? watchdog_timer_fn+0x2cd/0x390
  ? __pfx_watchdog_timer_fn+0x10/0x10
  ? __hrtimer_run_queues+0x300/0x6d0
  ? sched_clock_cpu+0x69/0x4e0
  ? __pfx___hrtimer_run_queues+0x10/0x10
  ? srso_return_thunk+0x5/0x5f
  ? ktime_get_update_offsets_now+0x7f/0x2a0
  ? srso_return_thunk+0x5/0x5f
  ? srso_return_thunk+0x5/0x5f
  ? hrtimer_interrupt+0x2ca/0x760
  ? __sysvec_apic_timer_interrupt+0x8c/0x2b0
  ? sysvec_apic_timer_interrupt+0x6a/0x90
  </IRQ>
  <TASK>
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? smp_call_function_many_cond+0x1d8/0xbb0
  ? __pfx_do_kernel_range_flush+0x10/0x10
  on_each_cpu_cond_mask+0x20/0x40
  flush_tlb_kernel_range+0x19b/0x250
  ? srso_return_thunk+0x5/0x5f
  ? kasan_release_vmalloc+0xa7/0xc0
  purge_vmap_node+0x357/0x820
  ? __pfx_purge_vmap_node+0x10/0x10
  __purge_vmap_area_lazy+0x5b8/0xa10
  drain_vmap_area_work+0x21/0x30
  process_one_work+0x661/0x10b0
  worker_thread+0x844/0x10e0
  ? srso_return_thunk+0x5/0x5f
  ? __kthread_parkme+0x82/0x140
  ? __pfx_worker_thread+0x10/0x10
  kthread+0x2a5/0x370
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x30/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

 Debugging Analysis:

   1. The following ftrace log shows that the locked-up CPU spends too much
      time iterating over vmap_nodes and flushing the TLB when purging
      vm_area structures (some output is trimmed).

      kworker: funcgraph_entry:              |  drain_vmap_area_work() {
      kworker: funcgraph_entry:              |   mutex_lock() {
      kworker: funcgraph_entry:  1.092 us    |     __cond_resched();
      kworker: funcgraph_exit:   3.306 us    |   }
      ...                                        ...
      kworker: funcgraph_entry:              |    flush_tlb_kernel_range() {
      ...                                          ...
      kworker: funcgraph_exit: # 7533.649 us |    }
      ...                                         ...
      kworker: funcgraph_entry:  2.344 us    |   mutex_unlock();
      kworker: funcgraph_exit: $ 23871554 us | }

      drain_vmap_area_work() runs for over 23 seconds.

      There are 2805 flush_tlb_kernel_range() calls in the ftrace log.
        * One is called in __purge_vmap_area_lazy().
        * The other 2804 are called by purge_vmap_node()->kasan_release_vmalloc():
          purge_vmap_node() iteratively releases KASAN vmalloc
          allocations and flushes the TLB for each vmap_area.
            - [Rough calculation] Each flush_tlb_kernel_range() call takes
              about 7.5 ms.
                -- 2804 * 7.5 ms = 21.03 seconds, which exceeds the
                   default soft lockup threshold of 20 seconds
                   (2 * watchdog_thresh).
                -- That's why a soft lockup is triggered.

   2. Raising the soft lockup threshold works around the issue (for example,
      '# echo 60 > /proc/sys/kernel/watchdog_thresh'). This confirms the
      hypothesis above: drain_vmap_area_work() simply runs for too long.

 Combining all TLB flush operations for the KASAN shadow virtual
 addresses into a single operation in the call path
 'purge_vmap_node()->kasan_release_vmalloc()' greatly reduces the running
 time of drain_vmap_area_work(), and the soft lockup is no longer
 triggered. The idea comes from the existing single
 flush_tlb_kernel_range() call in __purge_vmap_area_lazy().

 Here is the test result based on 6.10:

 [6.10 wo/ the patch]
   1. Enable ftrace latency profiling (record a trace only if the latency
      exceeds 20 s; tracing_thresh is in microseconds).
      echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh
      echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function
      echo function_graph > /sys/kernel/debug/tracing/current_tracer
      echo 1 > /sys/kernel/debug/tracing/tracing_on

   2. Run `make -j $(nproc)` to compile the kernel source

   3. Once the soft lockup is reproduced, check the ftrace log:
      cat /sys/kernel/debug/tracing/trace
         # tracer: function_graph
         #
         # CPU  DURATION                  FUNCTION CALLS
         # |     |   |                     |   |   |   |
           76) $ 50412985 us |    } /* __purge_vmap_area_lazy */
           76) $ 50412997 us |  } /* drain_vmap_area_work */
           76) $ 29165911 us |    } /* __purge_vmap_area_lazy */
           76) $ 29165926 us |  } /* drain_vmap_area_work */
           91) $ 53629423 us |    } /* __purge_vmap_area_lazy */
           91) $ 53629434 us |  } /* drain_vmap_area_work */
           91) $ 28121014 us |    } /* __purge_vmap_area_lazy */
           91) $ 28121026 us |  } /* drain_vmap_area_work */

 [6.10 w/ the patch]
   1. Repeat steps 1-2 in "[6.10 wo/ the patch]".

   2. The soft lockup is not triggered and the ftrace log is empty:
      cat /sys/kernel/debug/tracing/trace
      # tracer: function_graph
      #
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |

   3. Setting 'tracing_thresh' to 10 or 5 seconds does not produce any
      ftrace output.

   4. Setting 'tracing_thresh' to 1 second produces the following ftrace output:
      cat /sys/kernel/debug/tracing/trace
      # tracer: function_graph
      #
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
        23) $ 1074942 us  |    } /* __purge_vmap_area_lazy */
        23) $ 1074950 us  |  } /* drain_vmap_area_work */

   The worst-case execution time of drain_vmap_area_work() is about 1 second.

 The Linux kernel CVE team has assigned CVE-2024-56559 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 6.9 with commit 282631cb2447318e2a55b41a665dbe8571c46d70 and fixed in 6.12.4 with commit f9a18889aad9b4c19c6c4550c67ad4f9ed2a354f
 	Issue introduced in 6.9 with commit 282631cb2447318e2a55b41a665dbe8571c46d70 and fixed in 6.13 with commit 9e9e085effe9b7e342138fde3cf8577d22509932

 Please see https://www.kernel.org for a full list of currently supported
 kernel versions by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2024-56559
 will be updated if fixes are backported; please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	include/linux/kasan.h
 	mm/kasan/shadow.c
 	mm/vmalloc.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this, and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/f9a18889aad9b4c19c6c4550c67ad4f9ed2a354f
 	https://git.kernel.org/stable/c/9e9e085effe9b7e342138fde3cf8577d22509932