refs/tags/misc-habanalabs-fixes-2021-01-13 - pub/scm/linux/kernel/git/ogabbay/linux

tag	29843a75c65586d3cecd349afaa8c7f2b4c83a1e
tagger	Oded Gabbay <ogabbay@kernel.org>	Wed Jan 13 09:45:12 2021 +0200
object	9488307a5559255f2fc9a3ab61e1c31e243ca7c6

This tag contains the following bug fixes:

- Fix the DMA address that is passed to dma_mmap_coherent. We passed
  an address that includes an offset needed by our device, which
  caused dma_mmap_coherent to create an erroneous mapping.

- Fix the reset process for the case where a failure happens during
  reset. Without this fix, if the user asked to perform a reset after
  the previous reset had failed, they would get a kernel panic.

- Workaround (WA) to prevent a soft lockup BUG during unmap of host
  memory. With tens of thousands of mappings, the unmapping can take
  long enough to exceed the soft lockup timeout. This WA adds a small
  sleep every 32K page unmappings to prevent that.
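
The first fix above comes down to keeping two views of the same buffer
apart: the handle the DMA API returned, and the address the device uses
(the handle plus a device-specific offset). The following is a minimal
userspace sketch of that bookkeeping, not the driver's code. The offset
value and the helper names are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device-side offset; the real value is device specific. */
#define DEVICE_HOST_BASE_OFFSET UINT64_C(0x8000000000)

/* Address as the device sees it: DMA handle plus the device offset. */
static uint64_t to_device_addr(uint64_t dma_handle)
{
    return dma_handle + DEVICE_HOST_BASE_OFFSET;
}

/*
 * Recover the plain DMA handle from a device-view address. Under the
 * assumption made here, this is the value that has to be handed back
 * to the DMA API (for example when mapping the buffer to user space),
 * because the DMA API knows nothing about the device offset.
 */
static uint64_t to_dma_handle(uint64_t device_addr)
{
    assert(device_addr >= DEVICE_HOST_BASE_OFFSET);
    return device_addr - DEVICE_HOST_BASE_OFFSET;
}
```

Converting at the boundary, right before the call into the DMA API,
keeps the offset from leaking into code that expects a plain handle.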

commit	9488307a5559255f2fc9a3ab61e1c31e243ca7c6
author	Oded Gabbay <ogabbay@kernel.org>	Mon Jan 11 17:49:30 2021 +0200
committer	Oded Gabbay <ogabbay@kernel.org>	Tue Jan 12 15:00:10 2021 +0200
tree	ad3d6a9d5a2175e2346c0766a0388cb6efa432b1
parent	aa6df6533b8f9ead98889baa92e2b19793b1c77e

habanalabs: prevent soft lockup during unmap

When using a deep learning framework such as TensorFlow or PyTorch,
there are tens of thousands of host memory mappings. When the user
frees all those mappings at the same time, the process of unmapping
and unpinning them can take a long time, which may trigger a soft
lockup BUG.

To prevent this, we need to release the CPU core periodically during
the unmapping process so that it can do other work. For now, we chose
to do it every 32K unmappings (each unmap is a single 4K page).

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
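
The batching described in the commit message can be modeled in userspace
as below. With 4K pages, 32K unmappings cover 128 MiB of host memory
between yields. This is a sketch under stated assumptions, not the
driver's code: the function, the callbacks, and the stats struct are
hypothetical, and the yield callback stands in for whatever short sleep
the driver actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Batch size taken from the commit message: 32K page unmappings. */
#define UNMAP_BATCH_PAGES ((size_t)32 * 1024)

/* Hypothetical callbacks: unmap one page, and give up the CPU briefly. */
typedef void (*unmap_page_fn)(size_t page_idx, void *ctx);
typedef void (*yield_fn)(void *ctx);

/* Hypothetical counters used by the simulated callbacks below. */
struct sim_stats {
    size_t unmapped;
    size_t yields;
};

static void sim_unmap(size_t page_idx, void *ctx)
{
    (void)page_idx;
    ((struct sim_stats *)ctx)->unmapped++;
}

static void sim_yield(void *ctx)
{
    ((struct sim_stats *)ctx)->yields++;
}

/*
 * Unmap npages one page at a time and call yield_cpu after every full
 * batch, except after the last page, where the loop ends anyway.
 * Returns the number of times the CPU was yielded.
 */
static size_t unmap_pages_with_yield(size_t npages, size_t batch,
                                     unmap_page_fn unmap_one,
                                     yield_fn yield_cpu, void *ctx)
{
    size_t yields = 0;

    assert(batch > 0);

    for (size_t i = 0; i < npages; i++) {
        unmap_one(i, ctx);

        if ((i + 1) % batch == 0 && (i + 1) < npages) {
            yield_cpu(ctx);
            yields++;
        }
    }

    return yields;
}
```

For example, calling unmap_pages_with_yield(100000, UNMAP_BATCH_PAGES,
sim_unmap, sim_yield, &stats) yields three times, after pages 32768,
65536 and 98304. A fixed batch size keeps the overhead of yielding
negligible for small unmaps while bounding how long the core is held
for large ones.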

3 files changed
