 From bippy-1.1.0 Mon Sep 17 00:00:00 2001
 From: Greg Kroah-Hartman <gregkh@kernel.org>
 To: <linux-cve-announce@vger.kernel.org>
 Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org>
 Subject: CVE-2022-49781: perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and throttling

 Description
 ===========

 In the Linux kernel, the following vulnerability has been resolved:

 perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and throttling

 amd_pmu_enable_all() does:

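       for (idx = 0; idx < x86_pmu.num_counters; idx++) {
               if (!test_bit(idx, cpuc->active_mask))
                       continue;

               /* an NMI taken here can clear cpuc->events[idx] */
               amd_pmu_enable_event(cpuc->events[idx]);
       }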

 A perf NMI for another event can arrive between these two steps. The perf
 NMI handler internally disables and re-enables _all_ events, including
 the one that the interrupted amd_pmu_enable_all() was in the process of
 enabling. If that unintentionally enabled event has a very low sampling
 period, it can trigger an immediate successive NMI that throttles it, and
 x86_pmu_stop() then clears cpuc->events[idx] and the idx bit in
 cpuc->active_mask. When amd_pmu_enable_all() resumes after the NMIs have
 been handled, it calls amd_pmu_enable_event() with event=NULL, which
 crashes the kernel:

   BUG: kernel NULL pointer dereference, address: 0000000000000198
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   [...]
   Call Trace:
    <TASK>
    amd_pmu_enable_all+0x68/0xb0
    ctx_resched+0xd9/0x150
    event_function+0xb8/0x130
    ? hrtimer_start_range_ns+0x141/0x4a0
    ? perf_duration_warn+0x30/0x30
    remote_function+0x4d/0x60
    __flush_smp_call_function_queue+0xc4/0x500
    flush_smp_call_function_queue+0x11d/0x1b0
    do_idle+0x18f/0x2d0
    cpu_startup_entry+0x19/0x20
    start_secondary+0x121/0x160
    secondary_startup_64_no_verify+0xe5/0xeb
    </TASK>
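The race can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: every name below (`struct model_cpu`, `model_enable_all()`, `model_nmi_throttles_current()` and so on) is hypothetical and only mimics `cpuc->active_mask` and `cpuc->events[]`. The model also collapses the "NMI re-enables the event, then a second NMI throttles it" sequence into a single hook that runs between the mask test and the enable call. Instead of dereferencing NULL, the model counts how often the enable step sees an empty slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model: these names only mimic cpuc->active_mask
 * and cpuc->events[]; they are not kernel APIs. */
#define MODEL_NUM_COUNTERS 4

struct model_event {
	int id;
	bool hw_enabled;            /* stands in for the counter's enable bit */
};

struct model_cpu {
	unsigned long active_mask;  /* bit idx set => counter idx is in use */
	struct model_event *events[MODEL_NUM_COUNTERS];
	int null_enables;           /* how often enable saw event == NULL */
};

/* Called between the mask test and the enable, as an NMI could be. */
typedef void (*model_nmi_fn)(struct model_cpu *cpu, int idx);

/* Mimics what throttling via x86_pmu_stop() does to the per-CPU state. */
static void model_throttle(struct model_cpu *cpu, int idx)
{
	assert(idx >= 0 && idx < MODEL_NUM_COUNTERS);
	cpu->active_mask &= ~(1UL << idx);
	cpu->events[idx] = NULL;
}

/* Simplified NMI: the event being enabled gets throttled before we resume. */
static void model_nmi_throttles_current(struct model_cpu *cpu, int idx)
{
	model_throttle(cpu, idx);
}

static void model_enable_event(struct model_cpu *cpu, struct model_event *event)
{
	if (event == NULL) {
		/* The real code dereferences event here and oopses; we count it. */
		cpu->null_enables++;
		return;
	}
	event->hw_enabled = true;
}

/* Same shape as the quoted steps: test the mask bit, then use events[idx]. */
static void model_enable_all(struct model_cpu *cpu, model_nmi_fn nmi)
{
	for (int idx = 0; idx < MODEL_NUM_COUNTERS; idx++) {
		if (!(cpu->active_mask & (1UL << idx)))
			continue;

		if (nmi != NULL)
			nmi(cpu, idx);      /* the race window */

		model_enable_event(cpu, cpu->events[idx]);
	}
}
```

With `nmi` set to NULL, every active event is enabled normally. Passing `model_nmi_throttles_current` makes each enable step read a slot that was already cleared, which corresponds to the `event=NULL` call in the trace above.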

 The amd_pmu_disable_all()/amd_pmu_enable_all() calls inside the perf NMI
 handler were recently added as part of BRS (Branch Sampling) enablement,
 but I'm not sure whether we really need them. We can instead just disable
 BRS at the beginning of the handler and re-enable it when returning from
 the NMI. This solves the issue because the NMI handler no longer enables
 events whose active_mask bits are set but which are not yet enabled in
 the hardware PMU.
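The difference between the two NMI-handler shapes can be sketched with another hypothetical model. It is not the actual patch. `struct model_pmu`, `model_nmi_pre_fix()`, and `model_nmi_brs_only()` are illustrative names, and real overflow processing is reduced to a comment. The scenario assumes that the interrupted enable loop has switched on counter 0 but has not yet reached counter 1, even though both have their active_mask bits set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are kernel symbols. */
#define MODEL_NCOUNTERS 4

struct model_pmu {
	unsigned long active_mask;        /* software view: counter in use */
	bool hw_enabled[MODEL_NCOUNTERS]; /* hardware view: counter running */
	bool brs_enabled;                 /* stands in for BRS */
};

static void model_disable_all(struct model_pmu *p)
{
	for (int i = 0; i < MODEL_NCOUNTERS; i++)
		p->hw_enabled[i] = false;
	p->brs_enabled = false;
}

/* Enables BRS plus every counter whose active_mask bit is set. */
static void model_enable_all(struct model_pmu *p)
{
	p->brs_enabled = true;
	for (int i = 0; i < MODEL_NCOUNTERS; i++)
		if (p->active_mask & (1UL << i))
			p->hw_enabled[i] = true;
}

/* Pre-fix shape: overflow handling bracketed by disable_all/enable_all. */
static void model_nmi_pre_fix(struct model_pmu *p)
{
	model_disable_all(p);
	/* ... counter overflow processing would happen here ... */
	model_enable_all(p);
}

/* Shape described above: only BRS is turned off and back on. */
static void model_nmi_brs_only(struct model_pmu *p)
{
	p->brs_enabled = false;
	/* ... counter overflow processing would happen here ... */
	p->brs_enabled = true;
}

static bool model_counter_running(const struct model_pmu *p, int i)
{
	assert(i >= 0 && i < MODEL_NCOUNTERS);
	return p->hw_enabled[i];
}
```

After `model_nmi_pre_fix()`, counter 1 is running even though the interrupted loop never enabled it. This is the "unintentionally enabled event" from the description. After `model_nmi_brs_only()`, counter 1 remains off, and enabling it is left to the interrupted loop.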

 The Linux kernel CVE team has assigned CVE-2022-49781 to this issue.


 Affected and fixed versions
 ===========================

 	Issue introduced in 5.19 with commit ada543459cab7f653dcacdaba4011a8bb19c627c and fixed in 6.0.10 with commit fd5e454b856ed86b090336e269695d9908609b71
 	Issue introduced in 5.19 with commit ada543459cab7f653dcacdaba4011a8bb19c627c and fixed in 6.1 with commit baa014b9543c8e5e94f5d15b66abfe60750b8284
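The two lines above can be encoded as a simple version check. This is a sketch that relies only on those lines. It ignores -rc releases and distribution kernels, and it does not account for any later backports that may change the list. `struct kver` and `cve_2022_49781_affected()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* A kernel version as major.minor.patch, e.g. 6.0.10 (hypothetical type). */
struct kver { int major, minor, patch; };

static int kver_cmp(struct kver a, struct kver b)
{
	if (a.major != b.major) return a.major < b.major ? -1 : 1;
	if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
	if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
	return 0;
}

/* Encodes only the two "introduced/fixed" lines of this announcement. */
static bool cve_2022_49781_affected(struct kver v)
{
	const struct kver introduced = { 5, 19, 0 };
	const struct kver fixed_6_0 = { 6, 0, 10 };
	const struct kver fixed_mainline = { 6, 1, 0 };

	assert(v.major >= 0 && v.minor >= 0 && v.patch >= 0);

	if (kver_cmp(v, introduced) < 0)
		return false;                      /* before 5.19: bug not present */
	if (kver_cmp(v, fixed_mainline) >= 0)
		return false;                      /* 6.1 and later carry the fix */
	if (v.major == 6 && v.minor == 0)
		return kver_cmp(v, fixed_6_0) < 0; /* 6.0.y fixed from 6.0.10 */
	return true;                               /* e.g. any 5.19.y release */
}
```

For example, 6.0.9 is reported as affected, while 6.0.10 and 6.1 are not.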

 Please see https://www.kernel.org for a full list of currently supported
 kernel versions by the kernel community.

 Unaffected versions might change over time as fixes are backported to
 older supported kernel versions.  The official CVE entry at
 	https://cve.org/CVERecord/?id=CVE-2022-49781
 will be updated if fixes are backported; please check it for the most
 up-to-date information about this issue.


 Affected files
 ==============

 The file(s) affected by this issue are:
 	arch/x86/events/amd/core.c


 Mitigation
 ==========

 The Linux kernel CVE team recommends that you update to the latest
 stable kernel version for this and many other bugfixes.  Individual
 changes are never tested alone, but rather are part of a larger kernel
 release.  Cherry-picking individual commits is not recommended or
 supported by the Linux kernel community at all.  If, however, updating to
 the latest release is impossible, the individual changes to resolve this
 issue can be found at these commits:
 	https://git.kernel.org/stable/c/fd5e454b856ed86b090336e269695d9908609b71
 	https://git.kernel.org/stable/c/baa014b9543c8e5e94f5d15b66abfe60750b8284