| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2022-48916: iommu/vt-d: Fix double list_add when enabling VMD in scalable mode |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| iommu/vt-d: Fix double list_add when enabling VMD in scalable mode |
| |
| When VMD and IOMMU scalable mode are both enabled, the kernel panics |
| during boot on the Eagle Stream platform (Sapphire Rapids CPU) with the |
| following kernel log and call trace: |
| |
| pci 0000:59:00.5: Adding to iommu group 42 |
| ... |
| vmd 0000:59:00.5: PCI host bridge to bus 10000:80 |
| pci 10000:80:01.0: [8086:352a] type 01 class 0x060400 |
| pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit] |
| pci 10000:80:01.0: enabling Extended Tags |
| pci 10000:80:01.0: PME# supported from D0 D3hot D3cold |
| pci 10000:80:01.0: DMAR: Setup RID2PASID failed |
| pci 10000:80:01.0: Failed to add to iommu group 42: -16 |
| pci 10000:80:03.0: [8086:352b] type 01 class 0x060400 |
| pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit] |
| pci 10000:80:03.0: enabling Extended Tags |
| pci 10000:80:03.0: PME# supported from D0 D3hot D3cold |
| ------------[ cut here ]------------ |
| kernel BUG at lib/list_debug.c:29! |
| invalid opcode: 0000 [#1] PREEMPT SMP NOPTI |
| CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7 |
| Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022 |
| Workqueue: events work_for_cpu_fn |
| RIP: 0010:__list_add_valid.cold+0x26/0x3f |
| Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f |
| 0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1 |
| fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9 |
| 9e e8 8b b1 fe |
| RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246 |
| RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8 |
| RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20 |
| RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888 |
| R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0 |
| R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70 |
| FS: 0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000 |
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 |
| CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0 |
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 |
| DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 |
| PKRU: 55555554 |
| Call Trace: |
| <TASK> |
| intel_pasid_alloc_table+0x9c/0x1d0 |
| dmar_insert_one_dev_info+0x423/0x540 |
| ? device_to_iommu+0x12d/0x2f0 |
| intel_iommu_attach_device+0x116/0x290 |
| __iommu_attach_device+0x1a/0x90 |
| iommu_group_add_device+0x190/0x2c0 |
| __iommu_probe_device+0x13e/0x250 |
| iommu_probe_device+0x24/0x150 |
| iommu_bus_notifier+0x69/0x90 |
| blocking_notifier_call_chain+0x5a/0x80 |
| device_add+0x3db/0x7b0 |
| ? arch_memremap_can_ram_remap+0x19/0x50 |
| ? memremap+0x75/0x140 |
| pci_device_add+0x193/0x1d0 |
| pci_scan_single_device+0xb9/0xf0 |
| pci_scan_slot+0x4c/0x110 |
| pci_scan_child_bus_extend+0x3a/0x290 |
| vmd_enable_domain.constprop.0+0x63e/0x820 |
| vmd_probe+0x163/0x190 |
| local_pci_probe+0x42/0x80 |
| work_for_cpu_fn+0x13/0x20 |
| process_one_work+0x1e2/0x3b0 |
| worker_thread+0x1c4/0x3a0 |
| ? rescuer_thread+0x370/0x370 |
| kthread+0xc7/0xf0 |
| ? kthread_complete_and_exit+0x20/0x20 |
| ret_from_fork+0x1f/0x30 |
| </TASK> |
| Modules linked in: |
| ---[ end trace 0000000000000000 ]--- |
| ... |
| Kernel panic - not syncing: Fatal exception |
| Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) |
| ---[ end Kernel panic - not syncing: Fatal exception ]--- |
| |
| The following 'lspci' output shows devices '10000:80:*' are subdevices of |
| the VMD device 0000:59:00.5: |
| |
| $ lspci |
| ... |
| 0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20) |
| ... |
| 10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03) |
| 10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03) |
| 10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03) |
| 10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03) |
| 10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] |
| 10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] |
| |
| The 'list_add double add' symptom follows from the failure reported in |
| these messages: |
| |
| pci 10000:80:01.0: DMAR: Setup RID2PASID failed |
| pci 10000:80:01.0: Failed to add to iommu group 42: -16 |
| pci 10000:80:03.0: [8086:352b] type 01 class 0x060400 |
| |
| Device 10000:80:01.0 is a subdevice of the VMD device 0000:59:00.5, |
| so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD |
| device 0000:59:00.5. Here is the call path: |
| |
| intel_pasid_alloc_table |
| pci_for_each_dma_alias |
| get_alias_pasid_table |
| search_pasid_table |
| |
| pci_real_dma_dev() in pci_for_each_dma_alias() gets the real DMA device, |
| which is the VMD device 0000:59:00.5. However, the pasid entry of the VMD |
| device 0000:59:00.5 was already configured when the message "pci |
| 0000:59:00.5: Adding to iommu group 42" was logged. So, the status -EBUSY |
| is returned when configuring the pasid entry for device 10000:80:01.0. |
| |
| The kernel then invokes dmar_remove_one_dev_info() to release the |
| 'struct device_domain_info *' back to iommu_devinfo_cache. However, the |
| pasid table is not released because of the following statement in |
| __dmar_remove_one_dev_info(): |
| |
| if (info->dev && !dev_is_real_dma_subdevice(info->dev)) { |
| ... |
| intel_pasid_free_table(info->dev); |
| } |
| |
| The subsequent dmar_insert_one_dev_info() operation for device |
| 10000:80:03.0 allocates a 'struct device_domain_info *' from |
| iommu_devinfo_cache. The allocated address is the same address that |
| was released previously for device 10000:80:01.0. Finally, invoking |
| device_attach_pasid_table() triggers the double list_add. |
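
The address-reuse scenario above can be illustrated with a minimal userspace sketch. It assumes a reduced stand-in for the kernel's `struct list_head` and a check loosely modelled on the one in lib/list_debug.c; the helper names `init_list_head()`, `list_add_valid()` and `list_add_checked()` are hypothetical and return false where the kernel would hit BUG(). It is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make 'head' an empty circular list. */
void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Hypothetical check in the spirit of the list debugging code:
 * returns false where the kernel would report corruption or a
 * double add.
 */
bool list_add_valid(struct list_head *entry, struct list_head *prev,
		    struct list_head *next)
{
	if (next->prev != prev || prev->next != next)
		return false;	/* neighbours are inconsistent */
	if (entry == prev || entry == next)
		return false;	/* entry is already linked here: double add */
	return true;
}

/* Insert 'entry' right after 'head', refusing invalid insertions. */
bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	if (!list_add_valid(entry, head, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
	return true;
}
```

In this model, adding the same node to the same list head twice in a row fails on the second call. That corresponds to the situation described above, under the assumption that the stale list node was never unlinked before its memory was handed out again at the same address.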
| |
| `git bisect` points to the offending commit 474dd1c65064 ("iommu/vt-d: |
| Fix clearing real DMA device's scalable-mode context entries"), which |
| releases the pasid table only if the device is not a subdevice, based |
| on the return value of dev_is_real_dma_subdevice(). |
| Reverting the offending commit can work around the issue. |
| |
| The solution is to skip allocating the pasid table for devices that |
| are subdevices of the VMD device. |
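
The idea behind the fix can be sketched as a small userspace model in which allocation and release are guarded by the same condition. Everything here is hypothetical and heavily reduced: `struct model_dev`, `model_is_real_dma_subdevice()`, `model_attach()` and `model_detach()` are illustrative names, and a NULL `real_dma_dev` is an assumption of this model meaning "the device issues its own DMA". This is not the actual patch to drivers/iommu/intel/iommu.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a device. */
struct model_dev {
	struct model_dev *real_dma_dev;	/* NULL: the device issues its own DMA */
	bool has_pasid_table;
};

/* Models the subdevice test: true when another device issues the DMA. */
bool model_is_real_dma_subdevice(const struct model_dev *dev)
{
	return dev->real_dma_dev != NULL && dev->real_dma_dev != dev;
}

/* Attach path: allocate a table only for devices that own their DMA. */
void model_attach(struct model_dev *dev)
{
	if (!model_is_real_dma_subdevice(dev))
		dev->has_pasid_table = true;
}

/* Detach path: release under the same condition used at attach time. */
void model_detach(struct model_dev *dev)
{
	if (!model_is_real_dma_subdevice(dev))
		dev->has_pasid_table = false;
}
```

With both paths using the same guard, a subdevice never acquires state that the detach path would then decline to release, which is the asymmetry the description above identifies.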
| |
| The Linux kernel CVE team has assigned CVE-2022-48916 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.14 with commit 474dd1c6506411752a9b2f2233eec11f1733a099 and fixed in 5.15.27 with commit 2aaa085bd012a83be7104356301828585a2253ed |
| Issue introduced in 5.14 with commit 474dd1c6506411752a9b2f2233eec11f1733a099 and fixed in 5.16.13 with commit d5ad4214d9c6c6e465c192789020a091282dfee7 |
| Issue introduced in 5.14 with commit 474dd1c6506411752a9b2f2233eec11f1733a099 and fixed in 5.17 with commit b00833768e170a31af09268f7ab96aecfcca9623 |
| Issue introduced in 5.12.19 with commit 77c6a77a068c2304e3f19abee67b0c76dde4c0ea |
| Issue introduced in 5.13.4 with commit 7e6a4c304debd9be7cacecbe99b3481985e0c885 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2022-48916 |
| will be updated if fixes are backported. Please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/iommu/intel/iommu.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/2aaa085bd012a83be7104356301828585a2253ed |
| https://git.kernel.org/stable/c/d5ad4214d9c6c6e465c192789020a091282dfee7 |
| https://git.kernel.org/stable/c/b00833768e170a31af09268f7ab96aecfcca9623 |