From 9c1645727b8fa90d07256fdfcc45bf831242a3ab Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 8 Dec 2016 20:49:32 +0000
Subject: timekeeping: Force unsigned clocksource to nanoseconds conversion

From: Thomas Gleixner <tglx@linutronix.de>

commit 9c1645727b8fa90d07256fdfcc45bf831242a3ab upstream.
The clocksource delta to nanoseconds conversion uses signed math, but the
delta is unsigned. This makes the conversion space smaller than necessary,
and in case of a multiplication overflow the conversion can become
negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obviously preserves the sign, which has
interesting consequences:

 - Time jumps backwards

 - __iter_div_u64_rem(), which is used in one of the calling code paths,
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case. I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag. I think it was something in the vicinity of 10 minutes
  stopped."

When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had already been fixed
a year ago in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue was reintroduced, because the
function itself and the whole call chain use s64 for the result and its
propagation. The change in this recent commit is subtle:

      s64 nsec;

    - nsec = (d * m + n) >> s;
    + nsec = d * m + n;
    + nsec >>= s;

d being of type cycle_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
had made the 'nsec' variable u64 right away and a follow-up patch had
cleaned up the whole call chain.

There have been patches submitted which basically reverted the above patch
and left everything else unchanged as signed. Back to square one. This
spawned an admittedly pointless discussion about potential users which rely
on the unsigned behaviour, until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion, as they
made outright false claims about the consequences for eventual users which
expect signed results.

Despite delta being cycle_t, aka u64, it's entirely possible to hand in a
signed negative value, and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as a user after the probably unintended signed conversion.

Though in sensitive code like this it's better to analyze it properly and
make sure that nothing relies on this than to hunt the subtle wreckage half
a year later. After analyzing all call chains, it is clear that no caller
can hand in a negative value (which actually would work due to the s64
cast) and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow-up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merely postpones it. When
the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

Nor does it solve the issue of clocksources with a small counter width,
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non-issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignoring the
rest.

Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/time/timekeeping.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = defaul
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
-static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
+static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
 					  cycle_t delta)
 {
-	s64 nsec;
+	u64 nsec;
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;