sched: Make schedstats a runtime tunable that is disabled by default

schedstats is very useful during debugging and performance tuning but it
incurs overhead. As such, even though it can be disabled at build time,
it is often enabled as the information is useful.  This patch adds a
kernel command-line and sysctl tunable to enable or disable schedstats on
demand. It is disabled by default as someone who knows they need it can
also learn to enable it when necessary.
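
As a rough illustration of a runtime switch guarding statistics updates,
the following userspace C sketch may help. This message does not describe
how the patch implements the check, so a plain boolean is assumed here, and
every name in it (demo_rq_stats, demo_schedstats_enabled,
demo_schedstat_inc, demo_set_schedstats, demo_schedule_events) is
hypothetical rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-runqueue counters; field names are illustrative only. */
struct demo_rq_stats {
	unsigned long sched_count;
	unsigned long yld_count;
};

/* Hypothetical runtime switch, off by default like the tunable above. */
static bool demo_schedstats_enabled = false;

/*
 * Update a counter only when statistics are enabled. While the switch
 * is off, the counter is never written, so its cache line is not
 * dirtied by this path.
 */
#define demo_schedstat_inc(stats, field)		\
	do {						\
		if (demo_schedstats_enabled)		\
			(stats)->field++;		\
	} while (0)

/* Flip the switch at runtime (assumed stand-in for a sysctl handler). */
static void demo_set_schedstats(bool enable)
{
	demo_schedstats_enabled = enable;
}

/* Simulate n scheduling events against one set of counters. */
static void demo_schedule_events(struct demo_rq_stats *stats, unsigned int n)
{
	assert(stats != NULL);
	for (unsigned int i = 0; i < n; i++)
		demo_schedstat_inc(stats, sched_count);
}
```

With the switch left at its default, demo_schedule_events() leaves the
counters at zero. After demo_set_schedstats(true), each simulated event
increments sched_count.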

The benefits are workload-dependent but when it gets down to it, the
difference will be whether cache misses are incurred updating the shared
stats or not. These measurements were taken on a 48-core 2-socket machine
with Xeon(R) E5-2670 v3 CPUs, although the patch was also tested on a
single-socket 8-core machine with an Intel i7-3770 processor.

netperf TCP_STREAM
                           4.5.0-rc1             4.5.0-rc1
                             vanilla          nostats-v2r2
Hmean    64         560.45 (  0.00%)      576.96 (  2.94%)
Hmean    128        766.66 (  0.00%)      797.54 (  4.03%)
Hmean    256        950.51 (  0.00%)      972.24 (  2.29%)
Hmean    1024      1433.25 (  0.00%)     1492.66 (  4.15%)
Hmean    2048      2810.54 (  0.00%)     2984.70 (  6.20%)
Hmean    3312      4618.18 (  0.00%)     4778.72 (  3.48%)
Hmean    4096      5306.42 (  0.00%)     5389.35 (  1.56%)
Hmean    8192     10581.44 (  0.00%)    10824.27 (  2.29%)
Hmean    16384    18857.70 (  0.00%)    18911.32 (  0.28%)

Small gains here. UDP_STREAM showed nothing interesting and neither did
the TCP_RR tests. The gains on the 8-core machine were very similar.

tbench4
                                 4.5.0-rc1             4.5.0-rc1
                                   vanilla          nostats-v2r2
Hmean    mb/sec-1         500.85 (  0.00%)      522.43 (  4.31%)
Hmean    mb/sec-2         984.66 (  0.00%)     1017.92 (  3.38%)
Hmean    mb/sec-4        1827.91 (  0.00%)     1871.38 (  2.38%)
Hmean    mb/sec-8        3561.36 (  0.00%)     3563.62 (  0.06%)
Hmean    mb/sec-16       5824.52 (  0.00%)     5918.90 (  1.62%)
Hmean    mb/sec-32      10943.10 (  0.00%)    10967.55 (  0.22%)
Hmean    mb/sec-64      15950.81 (  0.00%)    15976.55 (  0.16%)
Hmean    mb/sec-128     15302.17 (  0.00%)    15372.01 (  0.46%)
Hmean    mb/sec-256     14866.18 (  0.00%)    14938.50 (  0.49%)
Hmean    mb/sec-512     15223.31 (  0.00%)    15360.33 (  0.90%)
Hmean    mb/sec-1024    14574.25 (  0.00%)    14632.68 (  0.40%)
Hmean    mb/sec-2048    13569.02 (  0.00%)    13861.61 (  2.16%)
Hmean    mb/sec-3072    12865.98 (  0.00%)    13106.66 (  1.87%)

Small gains of 2-4% at low thread counts and otherwise flat.  The
gains on the 8-core machine were slightly different.

tbench4 on 8-core i7-3770 single socket machine
Hmean    mb/sec-1        442.59 (  0.00%)      448.73 (  1.39%)
Hmean    mb/sec-2        796.68 (  0.00%)      794.39 ( -0.29%)
Hmean    mb/sec-4       1322.52 (  0.00%)     1343.66 (  1.60%)
Hmean    mb/sec-8       2611.65 (  0.00%)     2694.86 (  3.19%)
Hmean    mb/sec-16      2537.07 (  0.00%)     2609.34 (  2.85%)
Hmean    mb/sec-32      2506.02 (  0.00%)     2578.18 (  2.88%)
Hmean    mb/sec-64      2511.06 (  0.00%)     2569.16 (  2.31%)
Hmean    mb/sec-128     2313.38 (  0.00%)     2395.50 (  3.55%)
Hmean    mb/sec-256     2110.04 (  0.00%)     2177.45 (  3.19%)
Hmean    mb/sec-512     2072.51 (  0.00%)     2053.97 ( -0.89%)

In contrast, this shows a relatively steady 2-3% gain at higher thread
counts. Due to the nature of the patch and the type of workload, it's
not a surprise that the result will depend on the CPU used.

hackbench-pipes
                         4.5.0-rc1             4.5.0-rc1
                           vanilla          nostats-v2r2
Amean    1        0.0637 (  0.00%)      0.0666 ( -4.48%)
Amean    4        0.1229 (  0.00%)      0.1240 ( -0.93%)
Amean    7        0.1921 (  0.00%)      0.1967 ( -2.38%)
Amean    12       0.3117 (  0.00%)      0.3133 ( -0.50%)
Amean    21       0.4050 (  0.00%)      0.3954 (  2.36%)
Amean    30       0.4586 (  0.00%)      0.4529 (  1.25%)
Amean    48       0.5910 (  0.00%)      0.5733 (  3.00%)
Amean    79       0.8663 (  0.00%)      0.8394 (  3.10%)
Amean    110      1.1543 (  0.00%)      1.1449 (  0.82%)
Amean    141      1.4457 (  0.00%)      1.4526 ( -0.47%)
Amean    172      1.7090 (  0.00%)      1.7121 ( -0.18%)
Amean    192      1.9126 (  0.00%)      1.8959 (  0.87%)

This is borderline at best: small gains and losses and, while the variance
data is not included, it's well within the noise. The UMA machine did not
show anything particularly different.

pipetest
                             4.5.0-rc1             4.5.0-rc1
                               vanilla          nostats-v2r2
Min         Time        4.13 (  0.00%)        3.99 (  3.39%)
1st-qrtle   Time        4.38 (  0.00%)        4.27 (  2.51%)
2nd-qrtle   Time        4.46 (  0.00%)        4.39 (  1.57%)
3rd-qrtle   Time        4.56 (  0.00%)        4.51 (  1.10%)
Max-90%     Time        4.67 (  0.00%)        4.60 (  1.50%)
Max-93%     Time        4.71 (  0.00%)        4.65 (  1.27%)
Max-95%     Time        4.74 (  0.00%)        4.71 (  0.63%)
Max-99%     Time        4.88 (  0.00%)        4.79 (  1.84%)
Max         Time        4.93 (  0.00%)        4.83 (  2.03%)
Mean        Time        4.48 (  0.00%)        4.39 (  1.91%)
Best99%Mean Time        4.47 (  0.00%)        4.39 (  1.91%)
Best95%Mean Time        4.46 (  0.00%)        4.38 (  1.93%)
Best90%Mean Time        4.45 (  0.00%)        4.36 (  1.98%)
Best50%Mean Time        4.36 (  0.00%)        4.25 (  2.49%)
Best10%Mean Time        4.23 (  0.00%)        4.10 (  3.13%)
Best5%Mean  Time        4.19 (  0.00%)        4.06 (  3.20%)
Best1%Mean  Time        4.13 (  0.00%)        4.00 (  3.39%)

Small improvement and similar gains were seen on the UMA machine.

The gain is small but it'll depend on the CPU and the workload whether
this patch makes a difference.  However, it stands to reason that doing
less work in the scheduler is a good thing. The downside is that the
lack of schedstats and tracepoints will be surprising to experts doing
performance analysis until they find the existence of the schedstats=
parameter or schedstats sysctl.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
9 files changed