| From ef8273344646320a344fddeb3d89fa1328977f45 Mon Sep 17 00:00:00 2001 |
| From: Heiko Carstens <heiko.carstens@de.ibm.com> |
| Date: Wed, 1 Dec 2010 10:11:09 +0100 |
| Subject: [PATCH] nohz: Fix get_next_timer_interrupt() vs cpu hotplug |
| |
| commit dbd87b5af055a0cc9bba17795c9a2b0d17795389 upstream. |
| |
| This fixes a bug as seen on 2.6.32 based kernels where timers got |
| enqueued on offline cpus. |
| |
| If a cpu goes offline it might still have pending timers. These will |
| be migrated during CPU_DEAD handling after the cpu is offline. |
| However while the cpu is going offline it will schedule the idle task |
| which will then call tick_nohz_stop_sched_tick(). |
| |
| That function in turn will call get_next_timer_intterupt() to figure |
| out if the tick of the cpu can be stopped or not. If it turns out that |
| the next tick is just one jiffy off (delta_jiffies == 1) |
| tick_nohz_stop_sched_tick() incorrectly assumes that the tick should |
| not stop and takes an early exit and thus it won't update the load |
| balancer cpu. |
| |
| Just afterwards the cpu will be killed and the load balancer cpu could |
| be the offline cpu. |
| |
| On 2.6.32 based kernel get_nohz_load_balancer() gets called to decide |
| on which cpu a timer should be enqueued (see __mod_timer()). Which |
| leads to the possibility that timers get enqueued on an offline cpu. |
| These will never expire and can cause a system hang. |
| |
| This has been observed 2.6.32 kernels. On current kernels |
| __mod_timer() uses get_nohz_timer_target() which doesn't have that |
| problem. However there might be other problems because of the too |
| early exit tick_nohz_stop_sched_tick() in case a cpu goes offline. |
| |
| The easiest and probably safest fix seems to be to let |
| get_next_timer_interrupt() just lie and let it say there isn't any |
| pending timer if the current cpu is offline. |
| |
| I also thought of moving migrate_[hr]timers() from CPU_DEAD to |
| CPU_DYING, but seeing that there already have been fixes at least in |
| the hrtimer code in this area I'm afraid that this could add new |
| subtle bugs. |
| |
| Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> |
| Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> |
| LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com> |
| Signed-off-by: Ingo Molnar <mingo@elte.hu> |
| Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> |
| |
| diff --git a/kernel/timer.c b/kernel/timer.c |
| index aeb6a54..b7e3951 100644 |
| --- a/kernel/timer.c |
| +++ b/kernel/timer.c |
| @@ -1173,6 +1173,12 @@ unsigned long get_next_timer_interrupt(unsigned long now) |
| struct tvec_base *base = __get_cpu_var(tvec_bases); |
| unsigned long expires; |
| |
| + /* |
| + * Pretend that there is no timer pending if the cpu is offline. |
| + * Possible pending timers will be migrated later to an active cpu. |
| + */ |
| + if (cpu_is_offline(smp_processor_id())) |
| + return now + NEXT_TIMER_MAX_DELTA; |
| spin_lock(&base->lock); |
| if (time_before_eq(base->next_timer, base->timer_jiffies)) |
| base->next_timer = __next_timer_interrupt(base); |
| -- |
| 1.7.4.4 |
| |