| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2023-52587: IB/ipoib: Fix mcast list locking |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| IB/ipoib: Fix mcast list locking |
| |
| Releasing the `priv->lock` while iterating the `priv->multicast_list` in |
| `ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to |
| remove the items while in the middle of iteration. If the mcast is removed |
| while the lock is dropped, the for loop spins forever, resulting in a hard |
| lockup (as was reported on the RHEL 4.18.0-372.75.1.el8_6 kernel): |
| |
| Task A (kworker/u72:2 below) | Task B (kworker/u72:0 below) |
| -----------------------------------+----------------------------------- |
| ipoib_mcast_join_task(work) | ipoib_ib_dev_flush_light(work) |
| spin_lock_irq(&priv->lock) | __ipoib_ib_dev_flush(priv, ...) |
| list_for_each_entry(mcast, | ipoib_mcast_dev_flush(dev = priv->dev) |
| &priv->multicast_list, list) | |
| ipoib_mcast_join(dev, mcast) | |
| spin_unlock_irq(&priv->lock) | |
| | spin_lock_irqsave(&priv->lock, flags) |
| | list_for_each_entry_safe(mcast, tmcast, |
| | &priv->multicast_list, list) |
| | list_del(&mcast->list); |
| | list_add_tail(&mcast->list, &remove_list) |
| | spin_unlock_irqrestore(&priv->lock, flags) |
| spin_lock_irq(&priv->lock) | |
| | ipoib_mcast_remove_list(&remove_list) |
| (Here, `mcast` is no longer on the | list_for_each_entry_safe(mcast, tmcast, |
| `priv->multicast_list` and we keep | remove_list, list) |
| spinning on the `remove_list` of | >>> wait_for_completion(&mcast->done) |
| the other thread which is blocked | |
| and the list is still valid on | |
| its stack.) |
| |
| Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent |
| any possible sleep while it is held. |
| Unfortunately, we could not reproduce the lockup and confirm this fix, but |
| based on the code review I think this fix should address such lockups. |
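
The failure mode and the effect of keeping the lock held can be sketched in userspace. The following is only a simplified model, not the driver code: `struct model`, `try_flush()`, `join_walk()` and the `lock_held` flag are hypothetical names introduced for illustration, and the list helpers merely imitate the kernel's circular `struct list_head`. The walk is capped at `max_steps` so that the "spins forever" case is observable as a `-1` return instead of an actual hang. The GFP_ATOMIC part of the fix is not modelled.

```c
#include <assert.h>

/* Minimal userspace stand-in for the kernel's circular struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Hypothetical model: two lists plus a flag standing in for priv->lock. */
struct model {
    struct list_head multicast_list;
    struct list_head remove_list;   /* lives on the flusher's stack in the report */
    int lock_held;
};

static void model_init(struct model *m, struct list_head *nodes, int n)
{
    int i;

    list_init(&m->multicast_list);
    list_init(&m->remove_list);
    m->lock_held = 0;
    for (i = 0; i < n; i++)
        list_add_tail(&nodes[i], &m->multicast_list);
}

/* Models the flush: entries can only be moved while the lock is free. */
static void try_flush(struct model *m)
{
    if (m->lock_held)
        return; /* the real flusher would spin on the lock here */
    while (m->multicast_list.next != &m->multicast_list) {
        struct list_head *n = m->multicast_list.next;

        list_del(n);
        list_add_tail(n, &m->remove_list);
    }
}

/*
 * Models the join loop. With drop_lock != 0 the lock is released inside
 * the loop body (the buggy pattern); with drop_lock == 0 it stays held.
 * Returns the number of entries visited, or -1 if the loop did not get
 * back to the list head within max_steps iterations.
 */
static int join_walk(struct model *m, int drop_lock, int max_steps)
{
    struct list_head *pos;
    int steps = 0;

    assert(max_steps > 0);
    m->lock_held = 1;
    for (pos = m->multicast_list.next; pos != &m->multicast_list;
         pos = pos->next) {
        if (++steps > max_steps) {
            m->lock_held = 0;
            return -1;              /* would be a hard lockup */
        }
        if (drop_lock)
            m->lock_held = 0;       /* window opened for the flusher */
        try_flush(m);               /* the other worker gets to run */
        m->lock_held = 1;
    }
    m->lock_held = 0;
    return steps;
}
```

In this model, a two-entry list walked with `drop_lock == 0` visits both entries and terminates, while the same walk with `drop_lock != 0` ends up circling `remove_list` and never sees `&m->multicast_list` again, which matches the state seen in the crash dump (empty `multicast_list`, iterator pointing at an entry on the other thread's `remove_list`).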
| |
| crash> bc 31 |
| PID: 747 TASK: ff1c6a1a007e8000 CPU: 31 COMMAND: "kworker/u72:2" |
| -- |
| [exception RIP: ipoib_mcast_join_task+0x1b1] |
| RIP: ffffffffc0944ac1 RSP: ff646f199a8c7e00 RFLAGS: 00000002 |
| RAX: 0000000000000000 RBX: ff1c6a1a04dc82f8 RCX: 0000000000000000 |
| work (&priv->mcast_task{,.work}) |
| RDX: ff1c6a192d60ac68 RSI: 0000000000000286 RDI: ff1c6a1a04dc8000 |
| &mcast->list |
| RBP: ff646f199a8c7e90 R8: ff1c699980019420 R9: ff1c6a1920c9a000 |
| R10: ff646f199a8c7e00 R11: ff1c6a191a7d9800 R12: ff1c6a192d60ac00 |
| mcast |
| R13: ff1c6a1d82200000 R14: ff1c6a1a04dc8000 R15: ff1c6a1a04dc82d8 |
| dev priv (&priv->lock) &priv->multicast_list (aka head) |
| ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 |
| --- <NMI exception stack> --- |
| #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib] |
| #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967 |
| |
| crash> rx ff646f199a8c7e68 |
| ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work |
| |
| crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000 |
| (empty) |
| |
| crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000 |
| mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>, |
| mcast_mutex.owner.counter = 0xff1c69998efec000 |
| |
| crash> b 8 |
| PID: 8 TASK: ff1c69998efec000 CPU: 33 COMMAND: "kworker/u72:0" |
| -- |
| #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646 |
| #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib] |
| #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib] |
| #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib] |
| #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967 |
| |
| crash> rx ff646f1980153e68 |
| ff646f1980153e68: ff1c6a1a04dc83f0 <<< work = &priv->flush_light |
| |
| crash> ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000 |
| flush_light.func = 0xffffffffc0943820 <ipoib_ib_dev_flush_light>, |
| broadcast = 0x0, |
| |
| The mcast(s) on the `remove_list` (the remaining part of the former `priv->multicast_list`): |
| |
| crash> list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - - |
| ff1c6a192bd0c200 done.done = 0x0, |
| ff1c6a192d60ac00 done.done = 0x0, |
| |
| The Linux kernel CVE team has assigned CVE-2023-52587 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Fixed in 4.19.307 with commit 4c8922ae8eb8dcc1e4b7d1059d97a8334288d825 |
| Fixed in 5.4.269 with commit 615e3adc2042b7be4ad122a043fc9135e6342c90 |
| Fixed in 5.10.210 with commit ac2630fd3c90ffec34a0bfc4d413668538b0e8f2 |
| Fixed in 5.15.149 with commit ed790bd0903ed3352ebf7f650d910f49b7319b34 |
| Fixed in 6.1.77 with commit 5108a2dc2db5630fb6cd58b8be80a0c134bc310a |
| Fixed in 6.6.16 with commit 342258fb46d66c1b4c7e2c3717ac01e10c03cf18 |
| Fixed in 6.7.4 with commit 7c7bd4d561e9dc6f5b7df9e184974915f6701a89 |
| Fixed in 6.8 with commit 4f973e211b3b1c6d36f7c6a19239d258856749f9 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2023-52587 |
| will be updated if fixes are backported; please check that for the most |
| up-to-date information about this issue. |
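
As a rough aid for reading the fixed-version list above, a version check could look like the sketch below. The helper name `has_fix()` and the table layout are hypothetical. It only compares upstream stable version numbers against the releases listed in this announcement; it says nothing about vendor kernels (such as the RHEL kernel named in the description), which carry their own backports, and it returns "unknown" for branches the announcement does not list.

```c
#include <stddef.h>

struct fix { int major, minor, patch; };

/* First fixed release per stable branch, copied from the list above. */
static const struct fix fixes[] = {
    {4, 19, 307}, {5, 4, 269}, {5, 10, 210}, {5, 15, 149},
    {6, 1, 77},   {6, 6, 16},  {6, 7, 4},
};

/*
 * Returns 1 if the given upstream version contains the fix, 0 if it is
 * on a listed branch but older than the fixed release, and -1 if the
 * branch is not mentioned in the announcement (status unknown).
 */
static int has_fix(int major, int minor, int patch)
{
    size_t i;

    /* 6.8 is listed as fixed; later mainline releases include it. */
    if (major > 6 || (major == 6 && minor >= 8))
        return 1;

    for (i = 0; i < sizeof(fixes) / sizeof(fixes[0]); i++) {
        if (fixes[i].major == major && fixes[i].minor == minor)
            return patch >= fixes[i].patch;
    }
    return -1;
}
```

For example, `has_fix(6, 6, 15)` reports an unfixed kernel on the 6.6 branch, while `has_fix(4, 18, 0)` reports unknown because no 4.18 entry appears in the list.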
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/infiniband/ulp/ipoib/ipoib_multicast.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/4c8922ae8eb8dcc1e4b7d1059d97a8334288d825 |
| https://git.kernel.org/stable/c/615e3adc2042b7be4ad122a043fc9135e6342c90 |
| https://git.kernel.org/stable/c/ac2630fd3c90ffec34a0bfc4d413668538b0e8f2 |
| https://git.kernel.org/stable/c/ed790bd0903ed3352ebf7f650d910f49b7319b34 |
| https://git.kernel.org/stable/c/5108a2dc2db5630fb6cd58b8be80a0c134bc310a |
| https://git.kernel.org/stable/c/342258fb46d66c1b4c7e2c3717ac01e10c03cf18 |
| https://git.kernel.org/stable/c/7c7bd4d561e9dc6f5b7df9e184974915f6701a89 |
| https://git.kernel.org/stable/c/4f973e211b3b1c6d36f7c6a19239d258856749f9 |