| From bbf26ca1820b36392ccafff272ef375c03224af0 Mon Sep 17 00:00:00 2001 |
| From: Sasha Levin <sashal@kernel.org> |
| Date: Tue, 15 Jul 2025 11:15:35 +0000 |
| Subject: scsi: aacraid: Stop using PCI_IRQ_AFFINITY |
| |
| From: John Garry <john.g.garry@oracle.com> |
| |
| [ Upstream commit dafeaf2c03e71255438ffe5a341d94d180e6c88e ] |
| |
| When PCI_IRQ_AFFINITY is set for calling pci_alloc_irq_vectors(), it |
| means interrupts are spread around the available CPUs. It also means that |
| the interrupts become managed, which means that an interrupt is shut down |
| when all the CPUs in the interrupt affinity mask go offline. |
| |
| Using managed interrupts in this way means that we must ensure that |
| completions do not occur on HW queues whose associated interrupt |
| is shut down. This is typically achieved by ensuring only CPUs which are |
| online can generate IO completion traffic to the HW queue which they are |
| mapped to (so that they can also serve completion interrupts for that HW |
| queue). |
| |
| The problem in the driver is that a CPU can generate completions to a HW |
| queue whose interrupt may be shut down, as the CPUs in the HW queue |
| interrupt affinity mask may be offline. This can cause IOs to never |
| complete and hang the system. The driver maintains its own CPU <-> HW |
| queue mapping for submissions, see aac_fib_vector_assign(), but this does |
| not reflect the CPU <-> HW queue interrupt affinity mapping. |
| |
| Commit 9dc704dcc09e ("scsi: aacraid: Reply queue mapping to CPUs based on |
| IRQ affinity") tried to remedy this issue by mapping CPUs properly to HW |
| queue interrupts. However, this was later reverted in commit c5becf57dd56 |
| ("Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ |
| affinity""); it seems that there were other reports of hangs. I guess |
| that this was due to some implementation issue in the original commit or |
| maybe a HW issue. |
| |
| Fix the original hang by not setting PCI_IRQ_AFFINITY, so that the |
| interrupts are no longer managed. This way, all CPUs are in each HW queue |
| affinity mask, so completions should not be lost if any CPUs go |
| offline. |
| |
| Signed-off-by: John Garry <john.g.garry@oracle.com> |
| Link: https://lore.kernel.org/r/20250715111535.499853-1-john.g.garry@oracle.com |
| Closes: https://lore.kernel.org/linux-scsi/20250618192427.3845724-1-jmeneghi@redhat.com/ |
| Reviewed-by: John Meneghini <jmeneghi@redhat.com> |
| Tested-by: John Meneghini <jmeneghi@redhat.com> |
| Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> |
| Signed-off-by: Sasha Levin <sashal@kernel.org> |
| --- |
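Note (not part of the commit): the failure mode described above can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not driver or kernel code. The names cpumask_sim_t,
managed_mask(), unmanaged_mask(), submit_vector() and count_stranded() are
hypothetical, the block-wise spreading is a simplification of what the
kernel does, and the "cpu % NR_VECTORS" submission mapping merely stands in
for a driver-private mapping that ignores interrupt affinity.

```c
#include <stdbool.h>

#define NR_CPUS    8   /* assumed CPU count for the model */
#define NR_VECTORS 4   /* assumed number of MSI-X vectors / HW queues */

/* One bit per CPU: bit c set means CPU c is in the mask. */
typedef unsigned int cpumask_sim_t;

/* Managed-style spreading (simplified): each vector owns a disjoint
 * block of CPUs. */
static cpumask_sim_t managed_mask(int vec)
{
	int per_vec = NR_CPUS / NR_VECTORS;
	cpumask_sim_t mask = 0;

	for (int c = vec * per_vec; c < (vec + 1) * per_vec; c++)
		mask |= 1u << c;
	return mask;
}

/* Non-managed case as described in the commit message: every CPU is in
 * every vector's affinity mask. */
static cpumask_sim_t unmanaged_mask(int vec)
{
	(void)vec;
	return (1u << NR_CPUS) - 1;
}

/* In this model a vector is serviceable only while at least one CPU in
 * its affinity mask is online. */
static bool vector_alive(cpumask_sim_t affinity, cpumask_sim_t online)
{
	return (affinity & online) != 0;
}

/* Hypothetical submission mapping that ignores interrupt affinity. */
static int submit_vector(int cpu)
{
	return cpu % NR_VECTORS;
}

/* Count online CPUs whose submissions target a vector that no online
 * CPU can service, i.e. whose completions would never arrive. */
static int count_stranded(cpumask_sim_t (*mask_of)(int), cpumask_sim_t online)
{
	int stranded = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(online & (1u << cpu)))
			continue;
		if (!vector_alive(mask_of(submit_vector(cpu)), online))
			stranded++;
	}
	return stranded;
}
```

In this model, taking CPUs 6 and 7 offline (online mask 0x3F) leaves vector
3 with no online CPU in the managed layout, while CPU 3 still submits to it.
With the all-CPUs mask, no submitting CPU is stranded.
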
| drivers/scsi/aacraid/comminit.c | 3 +-- |
| 1 file changed, 1 insertion(+), 2 deletions(-) |
| |
| diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c |
| index 0f64b0244303..31b95e6c96c5 100644 |
| --- a/drivers/scsi/aacraid/comminit.c |
| +++ b/drivers/scsi/aacraid/comminit.c |
| @@ -481,8 +481,7 @@ void aac_define_int_mode(struct aac_dev *dev) |
| pci_find_capability(dev->pdev, PCI_CAP_ID_MSIX)) { |
| min_msix = 2; |
| i = pci_alloc_irq_vectors(dev->pdev, |
| - min_msix, msi_count, |
| - PCI_IRQ_MSIX | PCI_IRQ_AFFINITY); |
| + min_msix, msi_count, PCI_IRQ_MSIX); |
| if (i > 0) { |
| dev->msi_enabled = 1; |
| msi_count = i; |
| -- |
| 2.39.5 |
| |