{
  "commit": "d59f4fd1d303987f434bcf0b8191e89ca1d6a67c",
  "tree": "043ad8d486f641bb73a562592a785fd787fc4694",
  "parents": [
    "5b1d5e6db20a6c64ffb95d04578db8c4b0228eea"
  ],
  "author": {
    "name": "Chen Yu",
    "email": "yu.c.chen@intel.com",
    "time": "Wed Apr 01 14:52:30 2026 -0700"
  },
  "committer": {
    "name": "Peter Zijlstra",
    "email": "peterz@infradead.org",
    "time": "Thu Apr 09 15:49:51 2026 +0200"
  },
  "message": "sched/cache: Enable cache aware scheduling for multi LLCs NUMA node\n\nIntroduce sched_cache_present to enable cache aware scheduling for\nmulti LLCs NUMA node Cache-aware load balancing should only be\nenabled if there are more than 1 LLCs within 1 NUMA node.\nsched_cache_present is introduced to indicate whether this\nplatform supports this topology.\n\nTest results:\nThe first test platform is a 2 socket Intel Sapphire Rapids with 30\ncores per socket. The DRAM interleaving is enabled in the BIOS so it\nessential has one NUMA node with two last level caches. There are 60\nCPUs associated with each last level cache.\n\nThe second test platform is a AMD Genoa. There are 4 Nodes and 32 CPUs\nper node. Each node has 2 CCXs and each CCX has 16 CPUs.\n\nhackbench/schbench/netperf/stream/stress-ng/chacha20 were launched\non these two platforms.\n\n[TL;DR]\nSappire Rapids:\nhackbench shows significant improvement when the number of\ndifferent active threads is below the capacity of a LLC.\nschbench shows limitted wakeup latency improvement.\nChaCha20-xiangshan(risc-v simulator) shows good throughput\nimprovement. No obvious difference was observed in\nnetperf/stream/stress-ng in Hmean.\n\nGenoa:\nSignificant improvement is observed in hackbench when\nthe active number of threads is lower than the number\nof CPUs within 1 LLC. On v2, Aaron reported improvement\nof hackbench/redis when system is underloaded.\nChaCha20-xiangshan shows huge throughput improvement.\nPhoronix has tested v1 and shows good improvements in 30+\ncases[3]. 
No obvious difference was observed in\nnetperf/stream/stress-ng in Hmean.\n\nDetail:\nDue to length constraints, data without much difference with\nbaseline is not presented.\n\nSapphire Rapids:\n[hackbench pipe]\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\ncase                    load            baseline(std%)  compare%( std%)\nthreads-pipe-10         1-groups         1.00 (  1.22)  +26.09 (  1.10)\nthreads-pipe-10         2-groups         1.00 (  4.90)  +22.88 (  0.18)\nthreads-pipe-10         4-groups         1.00 (  2.07)   +9.00 (  3.49)\nthreads-pipe-10         8-groups         1.00 (  8.13)   +3.45 (  3.62)\nthreads-pipe-16         1-groups         1.00 (  2.11)  +26.30 (  0.08)\nthreads-pipe-16         2-groups         1.00 ( 15.13)   -1.77 ( 11.89)\nthreads-pipe-16         4-groups         1.00 (  4.37)   +0.58 (  7.99)\nthreads-pipe-16         8-groups         1.00 (  2.88)   +2.71 (  3.50)\nthreads-pipe-2          1-groups         1.00 (  9.40)  +22.07 (  0.71)\nthreads-pipe-2          2-groups         1.00 (  9.99)  +18.01 (  0.95)\nthreads-pipe-2          4-groups         1.00 (  3.98)  +24.66 (  0.96)\nthreads-pipe-2          8-groups         1.00 (  7.00)  +21.83 (  0.23)\nthreads-pipe-20         1-groups         1.00 (  1.03)  +28.84 (  0.21)\nthreads-pipe-20         2-groups         1.00 (  4.42)  +31.90 (  3.15)\nthreads-pipe-20         4-groups         1.00 (  9.97)   +4.56 (  1.69)\nthreads-pipe-20         8-groups         1.00 (  1.87)   +1.25 (  0.74)\nthreads-pipe-4          1-groups         1.00 (  4.48)  +25.67 (  0.78)\nthreads-pipe-4          2-groups         1.00 (  9.14)   +4.91 (  2.08)\nthreads-pipe-4          4-groups         1.00 (  7.68)  +19.36 (  1.53)\nthreads-pipe-4          8-groups         1.00 ( 10.79)   +7.20 ( 12.20)\nthreads-pipe-8          1-groups         1.00 (  4.69)  +21.93 (  0.03)\nthreads-pipe-8          2-groups         1.00 (  1.16)  +25.29 (  
0.65)\nthreads-pipe-8          4-groups         1.00 (  2.23)   -1.27 (  3.62)\nthreads-pipe-8          8-groups         1.00 (  4.65)   -3.08 (  2.75)\n\nNote: The default number of fd in hackbench is changed from 20 to various\nvalues to ensure that threads fit within a single LLC, especially on AMD\nsystems. Take \"threads-pipe-8, 2-groups\" for example, the number of fd\nis 8, and 2 groups are created.\n\n[schbench]\nThe 99th percentile wakeup latency shows some improvements when the\nsystem is underload, while it does not bring much difference with\nthe increasing of system utilization.\n\n99th Wakeup Latencies\tBase (mean std)      Compare (mean std)   Change\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nthread\u003d2                 9.00(0.00)           9.00(1.73)           0.00%\nthread\u003d4                 7.33(0.58)           6.33(0.58)           +13.64%\nthread\u003d8                 9.00(0.00)           7.67(1.15)           +14.78%\nthread\u003d16                8.67(0.58)           8.67(1.53)           0.00%\nthread\u003d32                9.00(0.00)           7.00(0.00)           +22.22%\nthread\u003d64                9.33(0.58)           9.67(0.58)           -3.64%\nthread\u003d128              12.00(0.00)          12.00(0.00)           0.00%\n\n[chacha20 on simulated risc-v]\nbaseline:\nHost time spent: 67861ms\ncache aware scheduling enabled:\nHost time spent: 54441ms\n\nTime reduced by 24%\n\nGenoa:\n[hackbench pipe]\nThe default number of fd is 20, which exceed the number of CPUs\nin a LLC. 
So the fd is adjusted to 2, 4, 6, 8, 20 respectively.\nExcluding results with large run-to-run variance, 10% ~ 50%\nimprovement is observed when the system is underloaded:\n\n[hackbench pipe]\n\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\ncase                    load            baseline(std%)  compare%( std%)\nthreads-pipe-2          1-groups         1.00 (  2.89)  +47.33 (  1.20)\nthreads-pipe-2          2-groups         1.00 (  3.88)  +39.82 (  0.61)\nthreads-pipe-2          4-groups         1.00 (  8.76)   +5.57 ( 13.10)\nthreads-pipe-20         1-groups         1.00 (  4.61)  +11.72 (  1.06)\nthreads-pipe-20         2-groups         1.00 (  6.18)  +14.55 (  1.47)\nthreads-pipe-20         4-groups         1.00 (  2.99)  +10.16 (  4.49)\nthreads-pipe-4          1-groups         1.00 (  4.23)  +43.70 (  2.14)\nthreads-pipe-4          2-groups         1.00 (  3.68)   +8.45 (  4.04)\nthreads-pipe-4          4-groups         1.00 ( 17.72)   +2.42 (  1.14)\nthreads-pipe-6          1-groups         1.00 (  3.10)   +7.74 (  3.83)\nthreads-pipe-6          2-groups         1.00 (  3.42)  +14.26 (  4.53)\nthreads-pipe-6          4-groups         1.00 ( 10.34)  +10.94 (  7.12)\nthreads-pipe-8          1-groups         1.00 (  4.21)   +9.06 (  4.43)\nthreads-pipe-8          2-groups         1.00 (  1.88)   +3.74 (  0.58)\nthreads-pipe-8          4-groups         1.00 (  2.78)  +23.96 (  1.18)\n\n[chacha20 on simulated risc-v]\nHost time spent: 54762ms\nHost time spent: 28295ms\n\nTime reduced by 48%\n\nSuggested-by: Libo Chen \u003clibchen@purestorage.com\u003e\nSuggested-by: Adam Li \u003cadamli@os.amperecomputing.com\u003e\nSigned-off-by: Chen Yu \u003cyu.c.chen@intel.com\u003e\nCo-developed-by: Tim Chen \u003ctim.c.chen@linux.intel.com\u003e\nSigned-off-by: Tim Chen \u003ctim.c.chen@linux.intel.com\u003e\nSigned-off-by: Peter Zijlstra (Intel) \u003cpeterz@infradead.org\u003e\nLink: 
https://patch.msgid.link/71972e12ab4f08aff422b31e34df09bdbd94de84.1775065312.git.tim.c.chen@linux.intel.com\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "a56619b3761f45b2bbdafdb84cd5ebd5e509106a",
      "old_mode": 33188,
      "old_path": "kernel/sched/sched.h",
      "new_id": "71f6077da4662ecd5cf513ac1f1c3546fc4d50f1",
      "new_mode": 33188,
      "new_path": "kernel/sched/sched.h"
    },
    {
      "type": "modify",
      "old_id": "8954bf7900ffa48e5db30cf4aaeb75767304d134",
      "old_mode": 33188,
      "old_path": "kernel/sched/topology.c",
      "new_id": "6a36f8f6b7b103cf457a725868e5c50a5d344013",
      "new_mode": 33188,
      "new_path": "kernel/sched/topology.c"
    }
  ]
}
