| From 3a3326a902700916268d671eb8ac154de059c91a Mon Sep 17 00:00:00 2001 |
| From: David S. Miller <davem@davemloft.net> |
| Date: Thu, 3 Sep 2009 02:35:20 -0700 |
| Subject: sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds. |
| |
| From: David S. Miller <davem@davemloft.net> |
| |
| [ Upstream commit e6617c6ec28a17cf2f90262b835ec05b9b861400 ] |
| |
| This is a compromise and a temporary workaround for bootup NMI |
| watchdog triggers some people see with qla2xxx devices present. |
| |
| This happens when, for example: |
| |
| CPU 0 is in the driver init and looping submitting mailbox commands to |
| load the firmware, then waiting for completion. |
| |
| CPU 1 is receiving the device interrupts. CPU 1 is where the NMI |
| watchdog triggers. |
| |
| CPU 0 is submitting mailbox commands fast enough that by the time CPU |
| 1 returns from the device interrupt handler, a new one is pending. |
| This sequence runs for more than 5 seconds. |
| |
| The problematic case is CPU 1's timer interrupt running when the |
| barrage of device interrupts begins. Then we have: |
| |
| timer interrupt |
| return for softirq checking |
| pending, thus enable interrupts |
| |
| qla2xxx interrupt |
| return |
| qla2xxx interrupt |
| return |
| ... 5+ seconds pass |
| final qla2xxx interrupt for fw load |
| return |
| |
| run timer softirq |
| return |
| |
| At some point in the multi-second qla2xxx interrupt storm we trigger |
| the NMI watchdog on CPU 1 from the NMI interrupt handler. |
| |
| The timer softirq, once we get back to running it, is smart enough to |
| run the timer work enough times to make up for the missed timer |
| interrupts. |
| |
| However, the NMI watchdogs (both x86 and sparc) use the timer |
| interrupt count to notice the cpu is wedged. But in the above |
| scenario we'll receive only one such timer interrupt even if we last |
| all the way back to running the timer softirq. |
| |
| The default watchdog trigger point is only 5 seconds, which is pretty |
| low (the softwatchdog triggers at 60 seconds). So increase it to 30 |
| seconds for now. |
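
The effect of the threshold can be illustrated with a small user-space model of the check. This is only a sketch and not the kernel code: the `struct watchdog`, `nmi_tick()` and `stall_triggers()` names are hypothetical, the `touched` flag from the real handler is left out, and the NMI rate is passed in as a plain parameter. It shows that a stall during which the timer interrupt count never advances trips a 5-second limit but not a 30-second one, provided the stall is shorter than 30 seconds.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU watchdog state; not the kernel's types. */
struct watchdog {
	unsigned int last_irq_sum;  /* timer interrupt count seen at the previous NMI */
	unsigned int alert_counter; /* consecutive NMIs that saw no timer progress */
};

/*
 * Model of one NMI watchdog tick. Returns true when the counter reaches
 * limit_ticks, which is where the real handler would call die_nmi().
 */
static bool nmi_tick(struct watchdog *w, unsigned int irq_sum,
		     unsigned int limit_ticks)
{
	if (w->last_irq_sum == irq_sum) {
		w->alert_counter++;
		return w->alert_counter == limit_ticks;
	}
	/* Timer interrupts made progress: remember the count and reset. */
	w->last_irq_sum = irq_sum;
	w->alert_counter = 0;
	return false;
}

/*
 * Simulate stall_secs seconds during which the timer interrupt count
 * stays fixed, with the watchdog limit set to limit_secs seconds.
 */
static bool stall_triggers(unsigned int stall_secs, unsigned int limit_secs,
			   unsigned int nmi_hz)
{
	struct watchdog w = { .last_irq_sum = 1, .alert_counter = 0 };
	unsigned int t;

	for (t = 0; t < stall_secs * nmi_hz; t++) {
		if (nmi_tick(&w, 1, limit_secs * nmi_hz))
			return true;
	}
	return false;
}
```

With an assumed rate of one NMI per second, `stall_triggers(8, 5, 1)` reports a lockup while `stall_triggers(8, 30, 1)` does not, which matches the intent of raising the limit for the multi-second interrupt storm described above.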
| |
| Signed-off-by: David S. Miller <davem@davemloft.net> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> |
| --- |
| arch/sparc/kernel/nmi.c | 2 +- |
| 1 file changed, 1 insertion(+), 1 deletion(-) |
| |
| --- a/arch/sparc/kernel/nmi.c |
| +++ b/arch/sparc/kernel/nmi.c |
| @@ -103,7 +103,7 @@ notrace __kprobes void perfctr_irq(int i |
| } |
| if (!touched && __get_cpu_var(last_irq_sum) == sum) { |
| local_inc(&__get_cpu_var(alert_counter)); |
| - if (local_read(&__get_cpu_var(alert_counter)) == 5 * nmi_hz) |
| + if (local_read(&__get_cpu_var(alert_counter)) == 30 * nmi_hz) |
| die_nmi("BUG: NMI Watchdog detected LOCKUP", |
| regs, panic_on_timeout); |
| } else { |