| From: Chris Metcalf <cmetcalf@mellanox.com> |
| Date: Wed, 16 Nov 2016 11:18:05 -0500 |
| Subject: tile: avoid using clocksource_cyc2ns with absolute cycle count |
| |
| commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream. |
| |
| For large values of "mult" and long uptimes, the intermediate |
| result of "cycles * mult" can overflow 64 bits. For example, |
| the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; |
| we have mult = 853, and after 208.5 days, we overflow 64 bits. |
| |
| Since clocksource_cyc2ns() is intended to be used for relative |
| cycle counts, not absolute cycle counts, performance is more |
| importance than accepting a wider range of cycle values. So, |
| just use mult_frac() directly in tile's sched_clock(). |
| |
| Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow |
| in sched_clock") by Salman Qazi results in essentially the same |
| generated code for x86 as this change does for tile. In fact, |
| a follow-on change by Salman introduced mult_frac() and switched |
| to using it, so the C code was largely identical at that point too. |
| |
| Peter Zijlstra then added mul_u64_u32_shr() and switched x86 |
| to use it. This is, in principle, better; by optimizing the |
| 64x64->64 multiplies to be 32x32->64 multiplies we can potentially |
| save some time. However, the compiler piplines the 64x64->64 |
| multiplies pretty well, and the conditional branch in the generic |
| mul_u64_u32_shr() causes some bubbles in execution, with the |
| result that it's pretty much a wash. If tilegx provided its own |
| implementation of mul_u64_u32_shr() without the conditional branch, |
| we could potentially save 3 cycles, but that seems like small gain |
| for a fair amount of additional build scaffolding; no other platform |
| currently provides a mul_u64_u32_shr() override, and tile doesn't |
| currently have an <asm/div64.h> header to put the override in. |
| |
| Additionally, gcc currently has an optimization bug that prevents |
| it from recognizing the opportunity to use a 32x32->64 multiply, |
| and so the result would be no better than the existing mult_frac() |
| until such time as the compiler is fixed. |
| |
| For now, just using mult_frac() seems like the right answer. |
| |
| Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> |
| Signed-off-by: Ben Hutchings <ben@decadent.org.uk> |
| --- |
| arch/tile/kernel/time.c | 4 ++-- |
| 1 file changed, 2 insertions(+), 2 deletions(-) |
| |
| --- a/arch/tile/kernel/time.c |
| +++ b/arch/tile/kernel/time.c |
| @@ -216,8 +216,8 @@ void do_timer_interrupt(struct pt_regs * |
| */ |
| unsigned long long sched_clock(void) |
| { |
| - return clocksource_cyc2ns(get_cycles(), |
| - sched_clock_mult, SCHED_CLOCK_SHIFT); |
| + return mult_frac(get_cycles(), |
| + sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT); |
| } |
| |
| int setup_profiling_timer(unsigned int multiplier) |