| From d3e26d2cc1c6ea713a18455673f1cb17dcda20f8 Mon Sep 17 00:00:00 2001 |
| From: Sasha Levin <sashal@kernel.org> |
| Date: Tue, 12 Jan 2021 09:06:38 -0600 |
| Subject: scsi: ibmvfc: Set default timeout to avoid crash during migration |
| |
| From: Brian King <brking@linux.vnet.ibm.com> |
| |
| [ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ] |
| |
| While testing live partition mobility, we have observed occasional crashes |
| of the Linux partition. During live migration, configurations with large |
| amounts of memory, slow network links, and workloads that modify memory at |
| a high rate can leave the partition suspended for 30 seconds or longer. |
| This resulted in the following scenario: |
| |
| CPU 0                            CPU 1 |
| -------------------------------  ---------------------------------- |
| scsi_queue_rq                    migration_store |
|  -> blk_mq_start_request          -> rtas_ibm_suspend_me |
|   -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me |
|               _______________________________________V |
|              | |
|              V |
|     -> IPI from CPU 1 |
|      -> rtas_percpu_suspend_me |
|                                      -> __rtas_suspend_last_cpu |
| |
| -- Linux partition suspended for > 30 seconds -- |
|                                       -> for_each_online_cpu(cpu) |
|                                            plpar_hcall_norets(H_PROD |
|  -> scsi_dispatch_cmd |
|                                       -> scsi_times_out |
|                                        -> scsi_abort_command |
|                                         -> queue_delayed_work |
|   -> ibmvfc_queuecommand_lck |
|    -> ibmvfc_send_event |
|     -> ibmvfc_send_crq |
|      - returns H_CLOSED |
|    <- returns SCSI_MLQUEUE_HOST_BUSY |
| -> __blk_mq_requeue_request |
| |
|                                       -> scmd_eh_abort_handler |
|                                        -> scsi_try_to_abort_cmd |
|                                          - returns SUCCESS |
|                                        -> scsi_queue_insert |
| |
| Normally, the SCMD_STATE_COMPLETE bit would protect against a race between |
| command completion and the timeout, but that doesn't work here, since the |
| SCSI_MLQUEUE_HOST_BUSY path does not check that bit at all. |
| |
| In this case we end up calling scsi_queue_insert on a request that has |
| already been queued, or possibly even freed, and we crash. |
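
The ownership problem described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the SCSI mid-layer code: `struct model_cmd`, `claim_via_state_bit()`, `requeue_host_busy()` and `model_race()` are hypothetical names, and a C11 `atomic_flag` stands in for the SCMD_STATE_COMPLETE bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a SCSI command; only the state bit is modeled. */
struct model_cmd {
	atomic_flag complete;	/* plays the role of SCMD_STATE_COMPLETE */
	int owners;		/* number of paths that believe they own the command */
};

static void model_cmd_init(struct model_cmd *cmd)
{
	atomic_flag_clear(&cmd->complete);
	cmd->owners = 0;
}

/* Completion and timeout both take ownership through a test-and-set. */
static bool claim_via_state_bit(struct model_cmd *cmd)
{
	if (atomic_flag_test_and_set(&cmd->complete))
		return false;	/* the other path already owns the command */
	cmd->owners++;
	return true;
}

/* The host-busy requeue path, as described above, never consults the bit. */
static void requeue_host_busy(struct model_cmd *cmd)
{
	cmd->owners++;
}

/* Replays the order from the scenario: timeout first, then the busy requeue. */
static int model_race(void)
{
	struct model_cmd cmd;
	bool timeout_won, completion_won;

	model_cmd_init(&cmd);
	timeout_won = claim_via_state_bit(&cmd);	/* timeout path claims the command */
	completion_won = claim_via_state_bit(&cmd);	/* a real completion is fenced off */
	assert(timeout_won && !completion_won);
	requeue_host_busy(&cmd);			/* requeue ignores the bit */
	return cmd.owners;
}
```

In this model the state bit correctly arbitrates between the two paths that use it, yet the command still ends up with two owners, which corresponds to scsi_queue_insert being called on a request that was already requeued.
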
| |
| The patch below raises the default I/O timeout for disk devices to 120 |
| seconds to avoid this race condition. This is also the timeout value that |
| is recommended as the default for nearly all IBM SAN storage. |
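
As a rough illustration of why a longer timeout narrows the window, the sketch below compares a request timer against a suspend of a given length. It is a user-space model under stated assumptions: `MODEL_HZ` is an arbitrary tick rate chosen for the example (the real `HZ` is a kernel build-time setting), the helper names are hypothetical, and the model assumes the timer was armed just before the suspend and that the suspended time counts against it, as in the scenario above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick rate for illustration only; the kernel's HZ is set at build time. */
#define MODEL_HZ 250UL

/* Hypothetical helper: convert seconds to ticks, mirroring "120 * HZ". */
static unsigned long secs_to_ticks_model(unsigned long secs)
{
	return secs * MODEL_HZ;
}

/*
 * True if a request timer armed just before the suspend has already
 * expired by the time the partition resumes.
 */
static bool timer_expired_at_resume(unsigned long timeout_ticks,
				    unsigned long suspend_secs)
{
	return secs_to_ticks_model(suspend_secs) >= timeout_ticks;
}

/* A 45-second suspend outlasts a 30-second timeout but not a 120-second one. */
static void model_check(void)
{
	assert(timer_expired_at_resume(30 * MODEL_HZ, 45));
	assert(!timer_expired_at_resume(120 * MODEL_HZ, 45));
}
```

The model also shows the limit of the approach: a suspend of 120 seconds or more would still let the timer expire, so the larger value makes the race far less likely rather than removing the unchecked requeue path.
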
| |
| Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com |
| Signed-off-by: Brian King <brking@linux.vnet.ibm.com> |
| Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> |
| Signed-off-by: Sasha Levin <sashal@kernel.org> |
| --- |
| drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++- |
| 1 file changed, 3 insertions(+), 1 deletion(-) |
| |
| diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c |
| index 04b3ac17531db..7865feb8e5e83 100644 |
| --- a/drivers/scsi/ibmvscsi/ibmvfc.c |
| +++ b/drivers/scsi/ibmvscsi/ibmvfc.c |
| @@ -2891,8 +2891,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev) |
|  	unsigned long flags = 0; |
| |
|  	spin_lock_irqsave(shost->host_lock, flags); |
| -	if (sdev->type == TYPE_DISK) |
| +	if (sdev->type == TYPE_DISK) { |
|  		sdev->allow_restart = 1; |
| +		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ); |
| +	} |
|  	spin_unlock_irqrestore(shost->host_lock, flags); |
|  	return 0; |
|  } |
| -- |
| 2.27.0 |
| |