refs/tags/pci-v5.13-fixes-2 - pub/scm/linux/kernel/git/helgaas/pci

tag	381479545e31bcfa74f934121b37e444ce10393f
tagger	Bjorn Helgaas <bhelgaas@google.com>	Fri Jun 18 15:05:48 2021 -0500
object	f18139966d072dab8e4398c95ce955a9742e04f7

pci-v5.13-fixes-2

commit	f18139966d072dab8e4398c95ce955a9742e04f7
author	Pali Rohár <pali@kernel.org>	Tue Jun 08 22:36:55 2021 +0200
committer	Bjorn Helgaas <bhelgaas@google.com>	Fri Jun 18 10:32:35 2021 -0500
tree	ec5dd4dc1e2fe5c013e68314d8c01b4fb6f49199
parent	cacf994a91d3a55c0c2f853d6429cd7b86113915

PCI: aardvark: Fix kernel panic during PIO transfer

Trying to start a new PIO transfer by writing value 0 to the PIO_START
register while the previous transfer has not yet completed (indicated by
value 1 in PIO_START) causes an External Abort on the CPU, which results in
a kernel panic:

    SError Interrupt on CPU0, code 0xbf000002 -- SError
    Kernel panic - not syncing: Asynchronous SError Interrupt

To prevent the kernel panic, a new PIO transfer must be rejected while the
previous one has not yet finished.

If the previous PIO transfer has not finished yet, the kernel may issue a
new PIO request only if the previous PIO transfer timed out.
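
The rule above can be sketched as a small user-space model. This is only an
illustration, not the driver code: struct pio_model, pio_model_start(),
pio_model_complete() and the PIO_MODEL_* return values are hypothetical
names, and the model assumes the in-flight indication clears once the
previous transfer completes or times out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the controller state, not the real register map. */
struct pio_model {
	uint32_t pio_start;	/* 1 while a transfer is in flight, 0 when idle */
	unsigned int starts;	/* how many transfers were actually started */
};

#define PIO_MODEL_OK	0
#define PIO_MODEL_BUSY	(-1)

/* True while the previous transfer has not completed. */
static bool pio_model_is_running(const struct pio_model *m)
{
	return m->pio_start != 0;
}

/*
 * Start a transfer only when the controller is idle. The busy check comes
 * first so that the start register is never touched while it reads 1,
 * which is the situation described above as causing the External Abort.
 */
static int pio_model_start(struct pio_model *m)
{
	if (pio_model_is_running(m))
		return PIO_MODEL_BUSY;

	m->pio_start = 1;	/* model only: mark the transfer as in flight */
	m->starts++;
	return PIO_MODEL_OK;
}

/* Model of the previous transfer finishing or timing out. */
static void pio_model_complete(struct pio_model *m)
{
	m->pio_start = 0;
}
```

In this model a second pio_model_start() call returns PIO_MODEL_BUSY until
pio_model_complete() has run, so a caller gets an error instead of the
register access that would crash the system.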

In the past, the root cause of this issue was incorrectly identified
(because it often happens during link retraining or after a link down
event), and a special hack was implemented in Trusted Firmware to catch all
SError events in EL3, ignore errors with code 0xbf000002 and, instead of
forwarding any other errors to the kernel, throw a panic from the EL3
Trusted Firmware handler.

Links to discussion and patches about this issue:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/
https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.fr/
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541

But the real cause was that during link retraining or after a link down
event a PIO transfer may take longer, up to 1.44 s, before it times out.
This increased the probability that the kernel would issue a new PIO
transfer while the previous one had not finished yet.

After applying this change to the kernel, the mentioned TF-A hack can be
reverted, and SError events no longer have to be caught in TF-A EL3.

Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock")

drivers/pci/controller/pci-aardvark.c

1 file changed
