| From bippy-1.2.0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@kernel.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2025-22022: usb: xhci: Apply the link chain quirk on NEC isoc endpoints |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| usb: xhci: Apply the link chain quirk on NEC isoc endpoints |
| |
| Two clearly different specimens of NEC uPD720200 (one with start/stop |
| bug, one without) were seen to cause IOMMU faults after some Missed |
| Service Errors. The faulting address is immediately after a transfer |
| ring segment, and patched dynamic debug messages revealed that the MSE |
| was received while waiting for a TD near the end of that segment: |
| |
| [ 1.041954] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ffa08fe0 |
| [ 1.042120] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09000 flags=0x0000] |
| [ 1.042146] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09040 flags=0x0000] |
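The address arithmetic behind "immediately after a transfer ring segment" can be checked with a short sketch. It assumes a ring geometry of 256 TRBs of 16 bytes each per segment (one 4 KiB segment) and naturally aligned segments; the advisory does not state these values, and the helper names are hypothetical. Under those assumptions the expected TD at ffa08fe0 sits at index 254, one slot before the last TRB of the segment, and the first faulting address ffa09000 is the first byte past the segment.

```c
#include <stdint.h>

/* Assumed ring geometry: 256 TRBs of 16 bytes each, i.e. one 4 KiB segment. */
#define TRB_SIZE         16u
#define TRBS_PER_SEGMENT 256u
#define SEGMENT_SIZE     (TRB_SIZE * TRBS_PER_SEGMENT)

/* DMA address of the segment containing trb_dma (segments assumed aligned). */
static uint64_t segment_base(uint64_t trb_dma)
{
	return trb_dma & ~(uint64_t)(SEGMENT_SIZE - 1);
}

/* Index of the TRB within its segment. */
static unsigned int trb_index(uint64_t trb_dma)
{
	return (unsigned int)((trb_dma - segment_base(trb_dma)) / TRB_SIZE);
}

/* First address past the end of the segment containing trb_dma. */
static uint64_t segment_end(uint64_t trb_dma)
{
	return segment_base(trb_dma) + SEGMENT_SIZE;
}
```

A controller that ignores the Link TRB in the last slot would keep fetching at `segment_end()`, which matches the IO_PAGE_FAULT addresses in the log.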
| |
| It gets even funnier if the next page is a ring segment accessible to |
| the HC. Below, it reports MSE in segment at ff1e8000, plows through a |
| zero-filled page at ff1e9000 and starts reporting events for TRBs in |
| page at ff1ea000 every microframe, instead of jumping to seg ff1e6000. |
| |
| [ 7.041671] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0 |
| [ 7.041999] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0 |
| [ 7.042011] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint |
| [ 7.042028] xhci_hcd: All TDs skipped for slot 1 ep 2. Clear skip flag. |
| [ 7.042134] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint |
| [ 7.042138] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31 |
| [ 7.042144] xhci_hcd: Looking for event-dma 00000000ff1ea040 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 |
| [ 7.042259] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint |
| [ 7.042262] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31 |
| [ 7.042266] xhci_hcd: Looking for event-dma 00000000ff1ea050 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 |
| |
| At some point completion events change from Isoch Buffer Overrun to |
| Short Packet and the HC finally finds cycle bit mismatch in ff1ec000. |
| |
| [ 7.098130] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 |
| [ 7.098132] xhci_hcd: Looking for event-dma 00000000ff1ecc50 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 |
| [ 7.098254] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 |
| [ 7.098256] xhci_hcd: Looking for event-dma 00000000ff1ecc60 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 |
| [ 7.098379] xhci_hcd: Overrun event on slot 1 ep 2 |
| |
| It's possible that data from the isochronous device were written to |
| random buffers of pending TDs on other endpoints (either IN or OUT), |
| other devices or even other HCs in the same IOMMU domain. |
| |
| Lastly, an error from a different USB device on another HC. Was it |
| caused by the above? I don't know, but it may have been. The disk |
| was working without any other issues and generated PCIe traffic to |
| starve the NEC of upstream BW and trigger those MSEs. The two HCs |
| shared one x1 slot by means of a commercial "PCIe splitter" board. |
| |
| [ 7.162604] usb 10-2: reset SuperSpeed USB device number 3 using xhci_hcd |
| [ 7.178990] sd 9:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s |
| [ 7.179001] sd 9:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 04 02 ae 00 00 02 00 00 |
| [ 7.179004] I/O error, dev sdb, sector 67284480 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0 |
| |
| Fortunately, it appears that this ridiculous bug is avoided by setting |
| the chain bit of Link TRBs on isochronous rings. Other ancient HCs are |
| also known to expect the bit to be set, and they ignore Link TRBs when |
| it is not. Reportedly, the 0.95 spec guaranteed that the bit is set. |
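The shape of such a workaround can be sketched as a small predicate that decides, per ring type, whether Link TRBs get the chain bit. This is an illustration only, not the kernel patch: the quirk flag names and values, the `ring_type` enum and both function names are hypothetical. The chain bit position (bit 4 of the TRB control word) follows the xHCI TRB layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values only; not the kernel's real bit assignments. */
#define QUIRK_LINK_TRB  (1u << 0)  /* HC wants chained Link TRBs on every ring */
#define QUIRK_NEC_HOST  (1u << 1)  /* HC identified as an NEC controller */

/* Chain bit in the TRB control word. */
#define LINK_TRB_CHAIN  (1u << 4)

enum ring_type { RING_CTRL, RING_ISOC, RING_BULK, RING_INTR };

/* Should Link TRBs on a ring of this type carry the chain bit? */
static bool link_chain_needed(uint32_t quirks, enum ring_type type)
{
	if (quirks & QUIRK_LINK_TRB)
		return true;
	/* The case described above: NEC hosts, isochronous rings only. */
	return type == RING_ISOC && (quirks & QUIRK_NEC_HOST) != 0;
}

/* Set or clear the chain bit in a Link TRB control word accordingly. */
static uint32_t link_trb_control(uint32_t control, uint32_t quirks,
				 enum ring_type type)
{
	if (link_chain_needed(quirks, type))
		return control | LINK_TRB_CHAIN;
	return control & ~LINK_TRB_CHAIN;
}
```

Keeping the decision in one predicate means every place that writes a Link TRB applies the same rule, so widening the quirk to another controller family is a one-line change.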
| |
| The bandwidth-starved NEC HC running a 32KB/uframe UVC endpoint reports |
| tens of MSEs per second and runs into the bug within seconds. Chaining |
| Link TRBs allows the same workload to run for many minutes, many times. |
| |
| No negative side effects seen in UVC recording and UAC playback with a |
| few devices at full speed, high speed and SuperSpeed. |
| |
| The problem doesn't reproduce on the newer Renesas uPD720201/uPD720202 |
| or on the old Etron EJ168 and VIA VL805 (though the VL805 has a |
| different bug). |
| |
| [shorten line length of log snippets in commit message -Mathias] |
| |
| The Linux kernel CVE team has assigned CVE-2025-22022 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Fixed in 6.12.22 with commit a4931d9fb99eb5462f3eaa231999d279c40afb21 |
| Fixed in 6.13.10 with commit 43a18225150ce874d23b37761c302a5dffee1595 |
| Fixed in 6.14.1 with commit 061a1683bae6ef56ab8fa392725ba7495515cd1d |
| Fixed in 6.15 with commit bb0ba4cb1065e87f9cc75db1fa454e56d0894d01 |
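The list above can be turned into a simple version check. The sketch below is a hypothetical helper, not an official tool: it only encodes the four entries listed here, treats 6.15 and later as fixed, and reports every other branch as unknown because this announcement does not say which older versions are affected or whether they will receive backports.

```c
#include <stddef.h>

enum fix_status { FIX_PRESENT, FIX_MISSING, FIX_UNKNOWN };

struct fixed_release { int major, minor, patch; };

/* First stable release on each listed branch that carries the fix. */
static const struct fixed_release fixed_in[] = {
	{ 6, 12, 22 },
	{ 6, 13, 10 },
	{ 6, 14, 1 },
};

/* Classify kernel version major.minor.patch against the list above. */
static enum fix_status fix_status_for(int major, int minor, int patch)
{
	size_t i;

	for (i = 0; i < sizeof(fixed_in) / sizeof(fixed_in[0]); i++) {
		if (major == fixed_in[i].major && minor == fixed_in[i].minor)
			return patch >= fixed_in[i].patch ? FIX_PRESENT
							  : FIX_MISSING;
	}
	/* The fix is in mainline from 6.15 onward. */
	if (major > 6 || (major == 6 && minor >= 15))
		return FIX_PRESENT;
	/* Branch not listed in this announcement. */
	return FIX_UNKNOWN;
}
```

Distribution kernels often backport fixes without changing the upstream version number, so a `FIX_MISSING` result from a check like this is a prompt to look at the vendor changelog rather than a definitive answer.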
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2025-22022 |
| will be updated if fixes are backported; please check that for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/usb/host/xhci.h |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/a4931d9fb99eb5462f3eaa231999d279c40afb21 |
| https://git.kernel.org/stable/c/43a18225150ce874d23b37761c302a5dffee1595 |
| https://git.kernel.org/stable/c/061a1683bae6ef56ab8fa392725ba7495515cd1d |
| https://git.kernel.org/stable/c/bb0ba4cb1065e87f9cc75db1fa454e56d0894d01 |