| From f1ed3df20d2d223e0852cc4ac1f19bba869a7e3c Mon Sep 17 00:00:00 2001 |
| From: Michal Wnukowski <wnukowski@google.com> |
| Date: Wed, 15 Aug 2018 15:51:57 -0700 |
| Subject: nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event |
| |
| From: Michal Wnukowski <wnukowski@google.com> |
| |
| commit f1ed3df20d2d223e0852cc4ac1f19bba869a7e3c upstream. |
| |
| In many architectures loads may be reordered with older stores to |
| different locations. In the nvme driver the following two operations |
| could be reordered: |
| |
| - Write shadow doorbell (dbbuf_db) into memory. |
| - Read EventIdx (dbbuf_ei) from memory. |
| |
| This can result in a potential race condition between driver and VM host |
| processing requests (if given virtual NVMe controller has a support for |
| shadow doorbell). If that occurs, then the NVMe controller may decide to |
| wait for MMIO doorbell from guest operating system, and guest driver may |
| decide not to issue MMIO doorbell on any of subsequent commands. |
| |
| This issue is purely timing-dependent one, so there is no easy way to |
| reproduce it. Currently the easiest known approach is to run "Oracle IO |
| Numbers" (orion) that is shipped with Oracle DB: |
| |
| orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \ |
| concat -write 40 -duration 120 -matrix row -testname nvme_test |
| |
| Where nvme_test is a .lun file that contains a list of NVMe block |
| devices to run test against. Limiting number of vCPUs assigned to given |
| VM instance seems to increase chances for this bug to occur. On test |
| environment with VM that got 4 NVMe drives and 1 vCPU assigned the |
| virtual NVMe controller hang could be observed within 10-20 minutes. |
| That correspond to about 400-500k IO operations processed (or about |
| 100GB of IO read/writes). |
| |
| Orion tool was used as a validation and set to run in a loop for 36 |
| hours (equivalent of pushing 550M IO operations). No issues were |
| observed. That suggest that the patch fixes the issue. |
| |
| Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices") |
| Signed-off-by: Michal Wnukowski <wnukowski@google.com> |
| Reviewed-by: Keith Busch <keith.busch@intel.com> |
| Reviewed-by: Sagi Grimberg <sagi@grimberg.me> |
| [hch: updated changelog and comment a bit] |
| Signed-off-by: Christoph Hellwig <hch@lst.de> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| drivers/nvme/host/pci.c | 8 ++++++++ |
| 1 file changed, 8 insertions(+) |
| |
| --- a/drivers/nvme/host/pci.c |
| +++ b/drivers/nvme/host/pci.c |
| @@ -306,6 +306,14 @@ static bool nvme_dbbuf_update_and_check_ |
| old_value = *dbbuf_db; |
| *dbbuf_db = value; |
| |
| + /* |
| + * Ensure that the doorbell is updated before reading the event |
| + * index from memory. The controller needs to provide similar |
| + * ordering to ensure the envent index is updated before reading |
| + * the doorbell. |
| + */ |
| + mb(); |
| + |
| if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value)) |
| return false; |
| } |