| From 7abb690a0e095717420ba78dcab4309abbbec78a Mon Sep 17 00:00:00 2001 |
| From: Daniel Vetter <daniel.vetter@ffwll.ch> |
| Date: Fri, 24 May 2013 21:29:32 +0200 |
| Subject: drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus |
| |
| From: Daniel Vetter <daniel.vetter@ffwll.ch> |
| |
| commit 7abb690a0e095717420ba78dcab4309abbbec78a upstream. |
| |
| Chris Wilson noticed that since |
| |
| commit 1f83fee08d625f8d0130f9fe5ef7b17c2e022f3c [v3.9] |
| Author: Daniel Vetter <daniel.vetter@ffwll.ch> |
| Date: Thu Nov 15 17:17:22 2012 +0100 |
| |
| drm/i915: clear up wedged transitions |
| |
| X can again get -EIO when it does not expect it. And even worse score |
| a SIGBUS when accessing gtt mmaps. The established ABI is that we |
| _only_ return an -EIO from execbuf - all other ioctls should just |
| work. And since the reset code moves all bos out of gpu domains and |
| clears out all the last_seqno/ring tracking there really shouldn't be |
| any reason for non-execbuf code to ever touch the hw and see an -EIO. |
| |
| After some extensive discussions we've noticed that these spurios -EIO |
| are caused by i915_gem_wait_for_error: |
| |
| http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg20540.html |
| |
| That is easy to fix by returning 0 instead of -EIO, since grabbing the |
| dev->struct_mutex does not yet mean that we actually want to touch the |
| hw. And so there is no reason at all to fail with -EIO. |
| |
| But that's not the entire since, since often (at least it's easily |
| googleable) dmesg indicates that the reset fails and we declare the |
| gpu wedged. Then, quite a bit later X wakes up with the "Timed out |
| waiting for the gpu reset to complete" DRM_ERROR message in |
| wait_for_errror and brings down the desktop with an -EIO/SIGBUS. |
| |
| So clearly we're missing a wakeup somewhere, since the gpu reset just |
| doesn't take 10 seconds to complete. And indeed we're do handle the |
| terminally wedged state wrong. |
| |
| Fix this all up. |
| |
| Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> |
| Cc: Daniel Vetter <daniel.vetter@ffwll.ch> |
| Cc: Damien Lespiau <damien.lespiau@intel.com> |
| Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| |
| --- |
| drivers/gpu/drm/i915/i915_gem.c | 7 ++----- |
| 1 file changed, 2 insertions(+), 5 deletions(-) |
| |
| --- a/drivers/gpu/drm/i915/i915_gem.c |
| +++ b/drivers/gpu/drm/i915/i915_gem.c |
| @@ -91,14 +91,11 @@ i915_gem_wait_for_error(struct i915_gpu_ |
| { |
| int ret; |
| |
| -#define EXIT_COND (!i915_reset_in_progress(error)) |
| +#define EXIT_COND (!i915_reset_in_progress(error) || \ |
| + i915_terminally_wedged(error)) |
| if (EXIT_COND) |
| return 0; |
| |
| - /* GPU is already declared terminally dead, give up. */ |
| - if (i915_terminally_wedged(error)) |
| - return -EIO; |
| - |
| /* |
| * Only wait 10 seconds for the gpu reset to complete to avoid hanging |
| * userspace. If it takes that long something really bad is going on and |