| Eighth iteration of the Core-Scheduling feature. |
| |
| Core scheduling is a feature that allows only trusted tasks to run |
| concurrently on cpus sharing compute resources (e.g. hyperthreads on a |
| core). The goal is to mitigate core-level side-channel attacks |
| without requiring SMT to be disabled (which has a significant impact on |
| performance in some situations). Core scheduling (as of v7) mitigates |
| user-space to user-space attacks and user-to-kernel attacks when one of |
| the siblings enters the kernel via interrupts or system calls. |
| |
| By default, the feature doesn't change any of the current scheduler |
| behavior. The user decides which tasks can run simultaneously on the |
| same core (for now by having them in the same tagged cgroup). When a tag |
| is enabled in a cgroup and a task from that cgroup is running on a |
| hardware thread, the scheduler ensures that only idle or trusted tasks |
| run on the other sibling(s). Besides security concerns, this feature can |
| also be beneficial for RT and performance applications where we want to |
| control how tasks make use of SMT dynamically. |
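
The co-scheduling rule described above can be illustrated with a toy model. This is only a sketch of the trust check, not the kernel's task selection logic: the names `may_share_core`, `pick_for_sibling` and `UNTAGGED` are hypothetical, tags are modelled as plain integers, and untagged tasks are assumed to share one default tag so that default behavior is unchanged.

```python
from typing import List, Optional, Tuple

UNTAGGED = 0  # assumption: all untagged tasks share one default tag


def may_share_core(tag_a: Optional[int], tag_b: Optional[int]) -> bool:
    """Return True if two hardware threads of one core may run these tags.

    None models an idle hardware thread, which is compatible with anything.
    Otherwise the two tasks must carry the same tag to trust each other.
    """
    if tag_a is None or tag_b is None:
        return True
    return tag_a == tag_b


def pick_for_sibling(running_tag: Optional[int],
                     runqueue: List[Tuple[str, int]]) -> Optional[str]:
    """Pick the first runnable task compatible with the sibling's tag.

    Returns None when nothing is compatible, i.e. the sibling is forced idle.
    (Toy policy: the real scheduler also weighs priority across the core.)
    """
    for name, tag in runqueue:
        if may_share_core(running_tag, tag):
            return name
    return None


if __name__ == "__main__":
    rq = [("untrusted", UNTAGGED), ("vm-a-vcpu1", 7)]
    print(pick_for_sibling(7, rq))  # a task with tag 7 is compatible
    print(pick_for_sibling(9, rq))  # nothing matches tag 9: forced idle
```

In this model a sibling stays idle rather than run an incompatible task, which is where the performance cost of the feature comes from.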
| |
| This iteration focuses on the following: |
| - Redesigned API. |
| - Rework of the Kernel Protection feature based on Thomas's entry work. |
| - Rework of hotplug fixes. |
| - Address review comments on v7. |
| |
| Joel: Both a cgroup interface and a per-task interface via prctl(2) are |
| provided for configuring core sharing. More details are provided in the |
| documentation patch. Kselftests are provided to verify the |
| correctness/rules of the interface. |
| |
| Julien: TPCC tests showed improvements with core scheduling over nosmt. With |
| kernel protection enabled, it does not show any regression relative to nosmt. |
| ASI may improve performance for those who choose kernel protection (which can |
| be toggled through the sched_core_protect_kernel sysctl). Results: |
| v8                               average     stdev         diff vs baseline |
| baseline (SMT on)                1197.272    44.78312824                    |
| core sched (   kernel protect)   412.9895    45.42734343   -65.51%          |
| core sched (no kernel protect)   686.6515    71.77756931   -42.65%          |
| nosmt                            408.667     39.39042872   -65.87%          |
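
The diff column in the results appears to be the relative change of each average against the baseline (SMT on) row. That reading is inferred from the numbers rather than stated in the text, and the helper name `diff_vs_baseline` is made up for this sketch. The check below recomputes the column from the reported averages:

```python
BASELINE = 1197.272  # "baseline (SMT on)" average from the results

# Reported averages for the remaining rows.
averages = {
    "core sched (kernel protect)": 412.9895,
    "core sched (no kernel protect)": 686.6515,
    "nosmt": 408.667,
}


def diff_vs_baseline(avg: float, baseline: float = BASELINE) -> float:
    """Relative change against the baseline, in percent."""
    return (avg - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, avg in averages.items():
        print(f"{name:32s} {diff_vs_baseline(avg):+.2f}%")
```

Under this reading, core scheduling with kernel protection lands within about half a percentage point of nosmt, while core scheduling without kernel protection recovers a sizeable part of the gap to the SMT-on baseline.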
| |
| v8 is rebased on tip/master. |
| |
| Future work |
| =========== |
| - Load balancing/migration fixes for core scheduling. |
|   With v6, load balancing is partially coresched aware, but has some |
|   issues w.r.t. process/taskgroup weights: |
|   https://lwn.net/ml/linux-kernel/20200225034438.GA617271@z... |
| - Core scheduling test framework: kselftests, torture tests, etc. |
| |
| Changes in v8 |
| ============= |
| - New interface/API implementation |
|   - Joel |
| - Revised kernel protection patch |
|   - Joel |
| - Revised hotplug fixes |
|   - Joel |
| - Minor bug fixes and address review comments |
|   - Vineeth |
| |
| Changes in v7 |
| ============= |
| - Kernel protection from untrusted usermode tasks |
|   - Joel, Vineeth |
| - Fix for hotplug crashes and hangs |
|   - Joel, Vineeth |
| |
| Changes in v6 |
| ============= |
| - Documentation |
|   - Joel |
| - Pause siblings on entering nmi/irq/softirq |
|   - Joel, Vineeth |
| - Fix for RCU crash |
|   - Joel |
| - Fix for a crash in pick_next_task |
|   - Yu Chen, Vineeth |
| - Minor re-write of core-wide vruntime comparison |
|   - Aaron Lu |
| - Cleanup: Address review comments |
| - Cleanup: Remove hotplug support (for now) |
| - Build fixes: 32 bit, SMT=n, AUTOGROUP=n etc. |
|   - Joel, Vineeth |
| |
| Changes in v5 |
| ============= |
| - Fixes for cgroup/process tagging during corner cases like cgroup |
|   destroy, task moving across cgroups etc. |
|   - Tim Chen |
| - Coresched aware task migrations |
|   - Aubrey Li |
| - Other minor stability fixes. |
| |
| Changes in v4 |
| ============= |
| - Implement a core wide min_vruntime for vruntime comparison of tasks |
|   across cpus in a core. |
|   - Aaron Lu |
| - Fixes a typo bug in setting the forced_idle cpu. |
|   - Aaron Lu |
| |
| Changes in v3 |
| ============= |
| - Fixes the issue of sibling picking up an incompatible task |
|   - Aaron Lu |
|   - Vineeth Pillai |
|   - Julien Desfossez |
| - Fixes the issue of starving threads due to forced idle |
|   - Peter Zijlstra |
| - Fixes the refcounting issue when deleting a cgroup with tag |
|   - Julien Desfossez |
| - Fixes a crash during cpu offline/online with coresched enabled |
|   - Vineeth Pillai |
| - Fixes a comparison logic issue in sched_core_find |
|   - Aaron Lu |
| |
| Changes in v2 |
| ============= |
| - Fixes for a couple of NULL pointer dereference crashes |
|   - Subhra Mazumdar |
|   - Tim Chen |
| - Improves priority comparison logic for processes on different cpus |
|   - Peter Zijlstra |
|   - Aaron Lu |
| - Fixes a hard lockup in rq locking |
|   - Vineeth Pillai |
|   - Julien Desfossez |
| - Fixes a performance issue seen on IO heavy workloads |
|   - Vineeth Pillai |
|   - Julien Desfossez |
| - Fix for 32bit build |
|   - Aubrey Li |
| |