CPU Controller
The CPU controller is responsible for grouping tasks together that will be
viewed by the scheduler as a single unit. The CFS scheduler will first divide
CPU time equally between all entities in the same level, and then proceed by
doing the same in the next level. Basic use cases for that are described in the
main cgroup documentation file, cgroups.txt.
Users of this functionality should be aware that deep hierarchies will of
course impose scheduler overhead, since the scheduler will have to take extra
steps and look up additional data structures to make its final decision.
Through the CPU controller, the scheduler is also able to cap the CPU
utilization of a particular group. This is particularly useful in environments
in which CPU is paid for by the hour, and one values predictability over
raw performance.
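As a sketch of how such a cap might be configured, assuming a cgroup v1
hierarchy with the cpu controller mounted at /sys/fs/cgroup/cpu (the mount
point and group name here are illustrative, not mandated by the kernel):

```shell
# Illustrative only: path and group name are assumptions.
mkdir /sys/fs/cgroup/cpu/billed
# Cap the group at half of one CPU: 50ms of runtime per 100ms period.
echo 100000 > /sys/fs/cgroup/cpu/billed/cpu.cfs_period_us
echo 50000  > /sys/fs/cgroup/cpu/billed/cpu.cfs_quota_us
# Move the current shell into the new group.
echo $$ > /sys/fs/cgroup/cpu/billed/tasks
```

The individual files written here are described in detail below.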
CPU Accounting
The CPU cgroup will also provide additional files under the prefix "cpuacct".
Those files provide accounting statistics and were previously provided by the
separate cpuacct controller. Although the cpuacct controller will still be kept
around for compatibility reasons, its usage is discouraged. If both the CPU and
cpuacct controllers are present in the system, distributors are encouraged to
always mount them together.
The CPU controller exposes the following files to the user:
- cpu.shares: The weight of the group relative to other groups at the same
level of the hierarchy, which translates into the share of CPU time it is
expected to get. Upon cgroup creation, each group is assigned a default of
1024. The percentage of CPU assigned to the cgroup is its shares value
divided by the sum of the shares of all cgroups at the same level.
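The shares-to-percentage relation can be worked through with made-up numbers
(three sibling groups, all runnable; the weights below are hypothetical):

```shell
# Three sibling groups with shares 1024, 1024 and 2048.
a=1024; b=1024; c=2048
total=$((a + b + c))
# Integer percentage of CPU each group is expected to receive
# while all three have runnable tasks:
echo $((a * 100 / total))   # group a: 25
echo $((c * 100 / total))   # group c: 50
```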
- cpu.cfs_period_us: The duration in microseconds of each scheduler period, for
bandwidth decisions. This defaults to 100000us or 100ms. Larger periods will
improve throughput at the expense of latency, since the scheduler will be able
to sustain a cpu-bound workload for longer. The opposite is true for smaller
periods. Note that this only affects non-RT tasks that are scheduled by the
CFS scheduler.
- cpu.cfs_quota_us: The maximum time, in microseconds, for which the current
group will be allowed to run during each cfs_period_us. For instance, if it
is set to half of cfs_period_us, the cgroup will only be able to run for 50 %
of the time. One should note that this represents aggregate time over all
CPUs in the system. Therefore, in order to allow full usage of two CPUs, for
instance, one should set this value to twice the value of cfs_period_us.
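The quota needed for a given number of CPUs follows directly from that rule;
a small sketch with the default period:

```shell
period=100000              # cpu.cfs_period_us, in microseconds (default)
ncpus=2                    # number of CPUs the group should fully use
quota=$((period * ncpus))  # cpu.cfs_quota_us aggregates over all CPUs
echo "$quota"              # prints 200000
```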
- cpu.stat: statistics about the bandwidth controls. No data will be presented
if cpu.cfs_quota_us is not set. The file presents three fields:
nr_periods: how many full periods have elapsed.
nr_throttled: number of times the group exhausted its full allowed bandwidth.
throttled_time: total time the tasks were not run due to being over quota.
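These counters make it easy to estimate how often the group is hitting its
quota. A sketch, using made-up sample values in place of a real cpu.stat:

```shell
# Hypothetical cpu.stat contents; a real file would be read from cgroupfs.
stat='nr_periods 2000
nr_throttled 51
throttled_time 450000000'
nr_periods=$(echo "$stat" | awk '/^nr_periods/ {print $2}')
nr_throttled=$(echo "$stat" | awk '/^nr_throttled/ {print $2}')
# Integer percentage of periods in which the group was throttled:
echo $((nr_throttled * 100 / nr_periods))   # prints 2
```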
- cpu.rt_runtime_us and cpu.rt_period_us: These files are the RT-task
analogues of the CFS files cfs_quota_us and cfs_period_us. One important
difference, though, is that while the cfs quotas are upper bounds that
won't necessarily be met, the rt runtimes form a stricter guarantee.
Therefore, no overlap is allowed. One implication is that, given a
hierarchy with multiple children, the sum of all rt_runtime_us values may not
exceed the runtime of the parent. Also, an rt_runtime_us of 0 means that no
RT tasks can ever be run in this cgroup. For more information about RT-task
runtime assignments, see scheduler/sched-rt-group.txt
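The no-overlap rule can be checked with simple arithmetic; the runtimes below
are hypothetical values for a parent group and its three children:

```shell
# Children's rt_runtime_us must not exceed the parent's in total.
parent_runtime=950000
child_runtimes="400000 300000 200000"
sum=0
for r in $child_runtimes; do
    sum=$((sum + r))
done
if [ "$sum" -le "$parent_runtime" ]; then
    echo "ok: children fit within the parent's runtime"
fi
```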
- cpu.stat_percpu: Various scheduler statistics for the current group. The
information provided in this file is akin to the one displayed in /proc/stat,
except for the fact that it is cgroup-aware. The file format consists of a
one-line header that describes the fields being listed. No guarantee is
given that the fields will be kept the same between kernel releases, and
readers should always parse the header to determine which fields are present.
Each of the following lines shows the respective field values for
each of the possible CPUs in the system. All values are shown in
nanoseconds. One example output for this file is:
cpu user nice system irq softirq guest guest_nice wait nr_switches nr_running
cpu0 471000000 0 15000000 0 0 0 0 1996534 7205 1
cpu1 588000000 0 17000000 0 0 0 0 2848680 6510 1
cpu2 505000000 0 14000000 0 0 0 0 2350771 6183 1
cpu3 472000000 0 16000000 0 0 0 0 19766345 6277 2
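Since the fields may change between kernel releases, a reader should locate a
column by its header name rather than by position. A sketch, fed with the
first lines of the example output above:

```shell
# Header-driven parsing: find the "user" column by name, not position.
data='cpu user nice system irq softirq guest guest_nice wait nr_switches nr_running
cpu0 471000000 0 15000000 0 0 0 0 1996534 7205 1
cpu1 588000000 0 17000000 0 0 0 0 2848680 6510 1'
echo "$data" | awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "user") col = i; next }
    { print $1, $col }'
```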
- cpuacct.usage: The aggregate CPU time, in nanoseconds, consumed by all tasks
in this group.
- cpuacct.usage_percpu: The CPU time, in nanoseconds, consumed by all tasks in
this group, separated by CPU. The format is a space-separated array of time
values, one for each present CPU.
- cpuacct.stat: aggregate user and system time consumed by tasks in this group.
The format is
user: x
system: y
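Parsing that format is straightforward; a sketch using made-up sample values
in place of a real cpuacct.stat file:

```shell
# Hypothetical cpuacct.stat contents, in the "user: x" / "system: y" format.
stat='user: 4193
system: 1117'
user=$(echo "$stat" | awk '/^user:/ {print $2}')
system=$(echo "$stat" | awk '/^system:/ {print $2}')
# Total time consumed by the group's tasks:
echo $((user + system))   # prints 5310
```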