stalld v1.12

Most of the changes are cleanups. The most visible change is that
stalld's systemd service now starts the daemon with the SCHED_FIFO
policy at priority 10. This is necessary to keep stalld from starving,
or from being indirectly blocked on a real-time mutex owned by a
starving thread; the latter did happen in multi-threaded mode.
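How stalld.service expresses this is not shown here. In general, a systemd unit requests a real-time policy with the CPUSchedulingPolicy= and CPUSchedulingPriority= directives documented in systemd.exec(5). For comparison, the minimal C sketch below shows the equivalent request a process can make on itself with sched_setscheduler(). The helper names are hypothetical and not part of stalld, and the call needs CAP_SYS_NICE, so it normally succeeds only as root.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: check that prio is a valid SCHED_FIFO priority
 * on this system (Linux reports 1..99). Returns 0 if valid, -1 if not. */
int fifo_priority_valid(int prio)
{
	int min = sched_get_priority_min(SCHED_FIFO);
	int max = sched_get_priority_max(SCHED_FIFO);

	if (min < 0 || max < 0)
		return -1;
	return (prio >= min && prio <= max) ? 0 : -1;
}

/* Hypothetical helper: switch the calling process to SCHED_FIFO at the
 * given priority (the service uses 10). Requires CAP_SYS_NICE.
 * Returns 0 on success, -1 on failure with errno set. */
int run_as_fifo(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (fifo_priority_valid(prio) != 0) {
		errno = EINVAL;
		return -1;
	}
	if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
		fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

From a shell, `chrt -f 10 stalld` (util-linux) makes the same request when starting the daemon by hand.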

Changes:
  stalld.8: fix diff cruft left in manpage source
  stalld.c: clean up handling of nr_running
  stalld.c: remove duplicate parameter to fill_waiting_task()
  stalld: Add error handling in get_cpu_idle_time()
  stalld.service: Run stalld as sched_fifo via systemd
  packaging: clean up Makefiles and rpm specfile
  stalld: Always print current function for info messages
  stalld: Always print current function for warn messages
  stalld: Always print current function for die messages
  utils: change PATHMAX to 4096

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
README.md

stalld

The stalld program (which stands for ‘stall daemon’) is a mechanism to prevent the starvation of operating system threads in a Linux system. The premise is to start up on a housekeeping cpu (one that is not used by the real application workload) and to periodically monitor the state of each thread in the system, looking for a thread that has been on a run queue (i.e. ready to run) for a specified length of time without being run. This condition usually occurs when the thread shares a cpu with a high-priority, cpu-intensive task and therefore gets no opportunity to run.
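One way to decide that a thread is "ready to run but not run" is to sample a per-thread context-switch counter on every monitoring pass: if a runnable thread's count has not changed since it was first seen waiting, and that was at least the threshold ago (-t, 60 s by default), it is starving. The sketch below assumes such a counter is available; the structure and function names are hypothetical, and stalld's real bookkeeping may differ.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical per-thread record kept between monitoring passes. */
struct waiting_task {
	pid_t pid;
	unsigned long ctxsw;   /* context switches seen at last sample */
	time_t since;          /* when this ctxsw value was first seen */
};

/*
 * Call only for threads currently on a run queue (runnable).
 * Updates the record with a fresh sample and reports whether the
 * thread has gone threshold_s seconds without running. A thread that
 * context-switched since the last pass made progress, so its timer
 * restarts.
 */
bool update_and_check(struct waiting_task *t, unsigned long ctxsw_now,
		      time_t now, time_t threshold_s)
{
	if (ctxsw_now != t->ctxsw) {
		t->ctxsw = ctxsw_now;
		t->since = now;
		return false;
	}
	return (now - t->since) >= threshold_s;
}
```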

When a thread is judged to be starving, stalld switches that thread to the SCHED_DEADLINE policy and gives it a small slice of time on that cpu (specified on the command line). The thread then runs, and once the boost duration has elapsed, stalld returns it to its original scheduling policy and continues monitoring thread states.
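A SCHED_DEADLINE boost is requested with the sched_setattr(2) system call, which takes a runtime, deadline and period in nanoseconds. The minimal sketch below fills such a request and issues the raw syscall, since older glibc versions provide no sched_setattr() wrapper. The struct mirrors the kernel's struct sched_attr layout under a private, hypothetical name to avoid clashing with newer libc headers; setting the deadline equal to the period is an assumption of this sketch, not necessarily what stalld does.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Same layout as the kernel's struct sched_attr (48 bytes). */
struct boost_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;   /* ns of CPU per period */
	uint64_t sched_deadline;  /* ns */
	uint64_t sched_period;    /* ns */
};

/* Fill a SCHED_DEADLINE request granting runtime_ns every period_ns. */
void fill_boost_attr(struct boost_attr *attr, uint64_t runtime_ns,
		     uint64_t period_ns)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = SCHED_DEADLINE;
	attr->sched_runtime = runtime_ns;
	attr->sched_deadline = period_ns;   /* assumption: deadline == period */
	attr->sched_period = period_ns;
}

/* Apply the request to thread tid. Needs CAP_SYS_NICE.
 * Returns 0 on success, -1 with errno set on failure. */
int boost_thread(pid_t tid, const struct boost_attr *attr)
{
	return (int)syscall(SYS_sched_setattr, tid, attr, 0);
}
```

With the documented defaults this would be `fill_boost_attr(&a, 20000, 1000000000)` followed by `boost_thread(tid, &a)`; restoring the original policy afterwards could use sched_setscheduler() with the saved policy and priority.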

There is now an experimental option to boost using SCHED_FIFO. This logic is used when the running kernel does not support the SCHED_DEADLINE policy, and it can be forced with the -F/--force_fifo option.
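The choice between the two policies can be sketched as follows. Probing sched_get_priority_max(SCHED_DEADLINE) is an assumed detection method (kernels that do not know the policy reject it with EINVAL), not necessarily the check stalld performs; both function names are hypothetical.

```c
#include <sched.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Return nonzero if the running kernel recognizes SCHED_DEADLINE.
 * Kernels without it fail this call with EINVAL (-1). */
int kernel_has_deadline(void)
{
	return sched_get_priority_max(SCHED_DEADLINE) >= 0;
}

/* Pick the boosting policy: SCHED_FIFO when forced (-F) or when
 * SCHED_DEADLINE is unavailable, SCHED_DEADLINE otherwise. */
int choose_boost_policy(int force_fifo)
{
	if (force_fifo || !kernel_has_deadline())
		return SCHED_FIFO;
	return SCHED_DEADLINE;
}
```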

Command Line Options

Usage: stalld [-l] [-v] [-k] [-s] [-f] [-h] [-F] [-A] [-c cpu-list] [-P pidfile] [-p time in ns] [-r time in ns] [-d time in seconds] [-t time in seconds]

Logging options

  • -l/--log_only: only log information (do not boost) [false]
  • -v/--verbose: print info to standard output [false]
  • -k/--log_kmsg: print log to the kernel buffer [false]
  • -s/--log_syslog: print log to syslog [true]
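The three destinations can be pictured as independent flags checked for every message. In the minimal sketch below, the flag names and emit_log() are hypothetical; it relies only on standard interfaces: printf() for standard output, syslog(3), and /dev/kmsg, where each write() becomes one kernel log record (writing there usually requires root).

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical destination flags mirroring -v, -k and -s. */
enum { DEST_STDOUT = 1 << 0, DEST_KMSG = 1 << 1, DEST_SYSLOG = 1 << 2 };

/*
 * Format "stalld: <message>" into buf and send it to every destination
 * set in dest. Returns the length stored in buf (possibly truncated),
 * or -1 on a formatting error.
 */
int emit_log(int dest, char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int len, n;

	len = snprintf(buf, size, "stalld: ");
	if (len < 0 || (size_t)len >= size)
		return -1;

	va_start(ap, fmt);
	n = vsnprintf(buf + len, size - len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return -1;
	len = (int)strlen(buf);   /* accounts for truncation */

	if (dest & DEST_STDOUT)
		printf("%s\n", buf);
	if (dest & DEST_SYSLOG)
		syslog(LOG_INFO, "%s", buf);
	if (dest & DEST_KMSG) {
		/* each write() to /dev/kmsg becomes one kernel log record */
		int fd = open("/dev/kmsg", O_WRONLY);

		if (fd >= 0) {
			ssize_t w = write(fd, buf, len);
			(void)w;   /* best effort: ignore write failures */
			close(fd);
		}
	}
	return len;
}
```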

Startup options

  • -c/--cpu: list of cpus to monitor for stalled threads [all cpus]
  • -f/--foreground: run in the foreground [false; true when -v is given]
  • -P/--pidfile: write daemon pid to specified file [no pidfile]
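The exact syntax -c accepts is not spelled out here; assuming the common Linux cpu-list form of comma-separated CPUs and ranges (for example `0-3,6`), a parser could look like the sketch below. The function name and the 64-CPU limit are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CPUS 64   /* sketch limit; real systems may have more CPUs */

/* Parse a list such as "0-3,6" into mask[] (true = monitor that CPU).
 * Returns the number of distinct CPUs selected, or -1 on bad input. */
int parse_cpu_list(const char *s, bool mask[MAX_CPUS])
{
	int count = 0;

	for (int i = 0; i < MAX_CPUS; i++)
		mask[i] = false;

	while (*s) {
		char *end;
		long first = strtol(s, &end, 10);
		long last = first;

		if (end == s || first < 0 || first >= MAX_CPUS)
			return -1;
		s = end;
		if (*s == '-') {              /* a range: first-last */
			s++;
			last = strtol(s, &end, 10);
			if (end == s || last < first || last >= MAX_CPUS)
				return -1;
			s = end;
		}
		for (long c = first; c <= last; c++) {
			if (!mask[c]) {
				mask[c] = true;
				count++;
			}
		}
		if (*s == ',')
			s++;
		else if (*s != '\0')
			return -1;
	}
	return count;
}
```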

Boosting options

  • -p/--boost_period: SCHED_DEADLINE period [ns] that the starving task will receive [1000000000]
  • -r/--boost_runtime: SCHED_DEADLINE runtime [ns] that the starving task will receive [20000]
  • -d/--boost_duration: how long [s] the starving task will run with SCHED_DEADLINE [3]
  • -F/--force_fifo: force using SCHED_FIFO for boosting
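The three SCHED_DEADLINE options combine into a simple budget: the thread may run for at most runtime nanoseconds in each period, for as many periods as fit into the boost duration. The small C helpers below (hypothetical names) do that arithmetic, assuming the boost keeps the same parameters for the whole -d duration and ignoring partial periods.

```c
#include <stdint.h>

/* Upper bound on CPU time (ns) a boosted thread receives over the whole
 * boost: runtime per period times the number of full periods. */
uint64_t boost_cpu_ns(uint64_t runtime_ns, uint64_t period_ns,
		      uint64_t duration_s)
{
	uint64_t periods = duration_s * 1000000000ULL / period_ns;

	return periods * runtime_ns;
}

/* Reserved bandwidth in parts per million of one CPU. */
uint64_t boost_bandwidth_ppm(uint64_t runtime_ns, uint64_t period_ns)
{
	return runtime_ns * 1000000ULL / period_ns;
}
```

With the defaults (-r 20000, -p 1000000000, -d 3), the starving thread gets up to 20 µs per second, 60 µs over the whole boost, which is 20 ppm (0.002%) of one CPU. Shortening the period to 100 ms with the same runtime raises the total to 600 µs.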

Monitoring options

  • -t/--starving_threshold: how long [s] the starving task will wait before being boosted [60]
  • -A/--aggressive_mode: dispatch one monitoring thread per run queue, even when no threads are starving on any CPU (uses more CPU time and power) [false]
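Aggressive mode can be pictured as one thread per CPU, each pinned to the CPU whose run queue it watches. The sketch below starts one such thread for every CPU the process may use and then joins them; the struct and function names are hypothetical, the per-CPU scan itself is left as a placeholder, and stalld's actual threading may differ. Build with `-pthread` on older toolchains.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical per-CPU monitor context. */
struct monitor {
	pthread_t thread;
	int cpu;
	int pin_rc;   /* result of pinning, 0 on success */
};

/* One monitor: pin to its CPU, then (in a real daemon) loop scanning
 * only that CPU's run queue. This sketch returns right after pinning. */
static void *monitor_cpu(void *arg)
{
	struct monitor *m = arg;
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(m->cpu, &set);
	m->pin_rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
	/* ... per-CPU starvation scan would go here ... */
	return NULL;
}

/* Start one monitor per CPU in the process affinity mask and wait for
 * them. Returns how many started (caller frees *out), or -1 on error. */
int run_per_cpu_monitors(struct monitor **out)
{
	cpu_set_t allowed;
	struct monitor *mons;
	int n = 0;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;
	mons = calloc(CPU_COUNT(&allowed), sizeof(*mons));
	if (!mons)
		return -1;

	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		mons[n].cpu = cpu;
		if (pthread_create(&mons[n].thread, NULL, monitor_cpu, &mons[n]) != 0)
			break;
		n++;
	}
	for (int i = 0; i < n; i++)
		pthread_join(mons[i].thread, NULL);

	*out = mons;
	return n;
}
```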

Miscellaneous

  • -h/--help: print this menu
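The options above map naturally onto getopt_long(3). The sketch below parses them into a configuration struct initialized with the documented defaults, including the rule that -v implies foreground mode. The struct and function names are hypothetical; stalld's own parser may differ in details such as error reporting.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical configuration holder. */
struct config {
	bool log_only, verbose, log_kmsg, log_syslog;
	bool foreground, force_fifo, aggressive;
	const char *cpu_list;          /* NULL = all CPUs */
	const char *pidfile;           /* NULL = no pidfile */
	unsigned long boost_period;    /* ns */
	unsigned long boost_runtime;   /* ns */
	unsigned long boost_duration;  /* s */
	unsigned long threshold;       /* s */
};

/* Returns 0 on success, 1 if -h was given, -1 on an unknown option. */
int parse_args(int argc, char **argv, struct config *cfg)
{
	static const struct option opts[] = {
		{ "log_only",           no_argument,       NULL, 'l' },
		{ "verbose",            no_argument,       NULL, 'v' },
		{ "log_kmsg",           no_argument,       NULL, 'k' },
		{ "log_syslog",         no_argument,       NULL, 's' },
		{ "foreground",         no_argument,       NULL, 'f' },
		{ "force_fifo",         no_argument,       NULL, 'F' },
		{ "aggressive_mode",    no_argument,       NULL, 'A' },
		{ "help",               no_argument,       NULL, 'h' },
		{ "cpu",                required_argument, NULL, 'c' },
		{ "pidfile",            required_argument, NULL, 'P' },
		{ "boost_period",       required_argument, NULL, 'p' },
		{ "boost_runtime",      required_argument, NULL, 'r' },
		{ "boost_duration",     required_argument, NULL, 'd' },
		{ "starving_threshold", required_argument, NULL, 't' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	/* Documented defaults; every other field starts false/NULL. */
	*cfg = (struct config){
		.log_syslog = true,
		.boost_period = 1000000000UL,
		.boost_runtime = 20000UL,
		.boost_duration = 3,
		.threshold = 60,
	};
	optind = 1;   /* allow repeated calls */
	while ((c = getopt_long(argc, argv, "lvksfFAhc:P:p:r:d:t:",
				opts, NULL)) != -1) {
		switch (c) {
		case 'l': cfg->log_only = true; break;
		case 'v': cfg->verbose = cfg->foreground = true; break;
		case 'k': cfg->log_kmsg = true; break;
		case 's': cfg->log_syslog = true; break;
		case 'f': cfg->foreground = true; break;
		case 'F': cfg->force_fifo = true; break;
		case 'A': cfg->aggressive = true; break;
		case 'c': cfg->cpu_list = optarg; break;
		case 'P': cfg->pidfile = optarg; break;
		case 'p': cfg->boost_period = strtoul(optarg, NULL, 10); break;
		case 'r': cfg->boost_runtime = strtoul(optarg, NULL, 10); break;
		case 'd': cfg->boost_duration = strtoul(optarg, NULL, 10); break;
		case 't': cfg->threshold = strtoul(optarg, NULL, 10); break;
		case 'h': return 1;
		default:  return -1;
		}
	}
	return 0;
}
```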

Repositories

The repository at https://gitlab.com/rt-linux-tools/stalld is the main repository, where development takes place.

The repository at https://git.kernel.org/pub/scm/utils/stalld/stalld.git is the distribution repository, where distributions can pick up the latest released version.