| From 4df41ee5a3e73d98c8a235e1f0cec0d78900903b Mon Sep 17 00:00:00 2001 |
| From: Sasha Levin <sashal@kernel.org> |
| Date: Fri, 24 Jul 2020 16:15:43 -0700 |
| Subject: mlx4: disable device on shutdown |
| |
| From: Jakub Kicinski <kuba@kernel.org> |
| |
| [ Upstream commit 3cab8c65525920f00d8f4997b3e9bb73aecb3a8e ] |
| |
| It appears that not disabling a PCI device on .shutdown may lead to |
| a Hardware Error with particular (perhaps buggy) BIOS versions: |
| |
| mlx4_en: eth0: Close port called |
| mlx4_en 0000:04:00.0: removed PHC |
| reboot: Restarting system |
| {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 |
| {1}[Hardware Error]: event severity: fatal |
| {1}[Hardware Error]: Error 0, type: fatal |
| {1}[Hardware Error]: section_type: PCIe error |
| {1}[Hardware Error]: port_type: 4, root port |
| {1}[Hardware Error]: version: 1.16 |
| {1}[Hardware Error]: command: 0x4010, status: 0x0143 |
| {1}[Hardware Error]: device_id: 0000:00:02.2 |
| {1}[Hardware Error]: slot: 0 |
| {1}[Hardware Error]: secondary_bus: 0x04 |
| {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f06 |
| {1}[Hardware Error]: class_code: 000604 |
| {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003 |
| {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000 |
| {1}[Hardware Error]: aer_uncor_severity: 0x00062030 |
| {1}[Hardware Error]: TLP Header: 40000018 040000ff 791f4080 00000000 |
| [hw error repeats] |
| Kernel panic - not syncing: Fatal hardware error! |
| CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1 |
| Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017 |
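
One possible reading of the error record above (an interpretation, not something the original report states): aer_uncor_status 0x00100000 is the Unsupported Request bit, and the logged TLP header looks like a 32-bit memory write issued by requester 04:00.0, the mlx4 device itself. The sketch below decodes those fields in user space. The helper names are hypothetical, and the bit layout assumes the header dwords are printed with TLP byte 0 in the most significant byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 20 of the AER Uncorrectable Error Status register: Unsupported Request.
 * Same value as PCI_ERR_UNC_UNSUP in the kernel's pci_regs.h. */
#define AER_UNC_UNSUP_REQ 0x00100000u

/* Hypothetical container for the header fields of interest. */
struct tlp_hdr {
	unsigned fmt;     /* DW0 bits 31:29 */
	unsigned type;    /* DW0 bits 28:24 */
	unsigned len_dw;  /* DW0 bits 9:0, payload length in dwords */
	unsigned req_bus; /* DW1 bits 31:24 */
	unsigned req_dev; /* DW1 bits 23:19 */
	unsigned req_fn;  /* DW1 bits 18:16 */
	uint32_t addr;    /* DW2 with the two low bits masked off */
};

/* Return non-zero if the Unsupported Request bit is set. */
int aer_is_unsupported_request(uint32_t uncor_status)
{
	return (uncor_status & AER_UNC_UNSUP_REQ) != 0;
}

/* Decode a logged header, assuming a 3DW (32-bit address) request. */
struct tlp_hdr decode_tlp(const uint32_t dw[4])
{
	struct tlp_hdr h;

	assert(dw != NULL);
	h.fmt = dw[0] >> 29;
	h.type = (dw[0] >> 24) & 0x1f;
	h.len_dw = dw[0] & 0x3ff;
	h.req_bus = dw[1] >> 24;
	h.req_dev = (dw[1] >> 19) & 0x1f;
	h.req_fn = (dw[1] >> 16) & 0x7;
	h.addr = dw[2] & ~(uint32_t)0x3;
	return h;
}

/* Fmt 010b (3DW header with data) plus Type 00000b is a 32-bit memory write. */
int tlp_is_mem_write32(const struct tlp_hdr *h)
{
	return h->fmt == 0x2 && h->type == 0x00;
}
```

Fed the values from the log (status 0x00100000, header 40000018 040000ff 791f4080 00000000), these helpers report an Unsupported Request for a 24-dword memory write from 04:00.0 to address 0x791f4080. That would be consistent with the device still issuing DMA after shutdown, which disabling the PCI device is meant to prevent.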
| |
| Fix the mlx4 driver by disabling the PCI device at the end of mlx4_shutdown(). |
| |
| This is a very similar problem to what had been fixed in: |
| commit 0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown") |
| to address https://bugzilla.kernel.org/show_bug.cgi?id=199779. |
| |
| Fixes: 2ba5fbd62b25 ("net/mlx4_core: Handle AER flow properly") |
| Reported-by: Jake Lawrence <lawja@fb.com> |
| Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
| Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> |
| Signed-off-by: David S. Miller <davem@davemloft.net> |
| Signed-off-by: Sasha Levin <sashal@kernel.org> |
| --- |
| drivers/net/ethernet/mellanox/mlx4/main.c | 2 ++ |
| 1 file changed, 2 insertions(+) |
| |
| diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c |
| index f7825c7b92fe3..8d7bb9a889677 100644 |
| --- a/drivers/net/ethernet/mellanox/mlx4/main.c |
| +++ b/drivers/net/ethernet/mellanox/mlx4/main.c |
| @@ -4311,12 +4311,14 @@ end: |
| static void mlx4_shutdown(struct pci_dev *pdev) |
| { |
| struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev); |
| + struct mlx4_dev *dev = persist->dev; |
| |
| mlx4_info(persist->dev, "mlx4_shutdown was called\n"); |
| mutex_lock(&persist->interface_state_mutex); |
| if (persist->interface_state & MLX4_INTERFACE_STATE_UP) |
| mlx4_unload_one(pdev); |
| mutex_unlock(&persist->interface_state_mutex); |
| + mlx4_pci_disable_device(dev); |
| } |
| |
| static const struct pci_error_handlers mlx4_err_handler = { |
| -- |
| 2.25.1 |
| |