The updates included in this pull request for ftrace are:

 o Several clean ups to the code

   One such clean up was to convert to 64 bit time keeping, in the
   ring buffer benchmark code.

 o Addition of the __print_array() helper macro for TRACE_EVENT()

 o Updating samples/trace_events/ to show the different ways to make
   trace events. Many features have been added since the sample code
   was written, and most of them are little known. Developers have been
   writing their own hacks to do things that are already available.

 o Performance improvements. Most notably, I found a performance bug where
   a waiter that is waiting for a full page from the ring buffer will
   see that a full page is not available, and go to sleep. The sched
   event caused by it going to sleep would cause it to wake up again.
   It would see that there was still not a full page, and go back to sleep
   again, and that would wake it up again, until finally it would see a
   full page. This change has been marked for stable.

   Other improvements include removing global locks from fast paths.

ring-buffer: Do not wake up a splice waiter when page is not full

When an application connects to the ring buffer via splice, it can only
read full pages. Splice does not work with partial pages. If there is
not enough data to fill a page, the splice command will either block
or return -EAGAIN (if set to nonblock).

Code was added to make the waiter just sleep again if the page is not
full. The problem is that it gets woken up again on the next event.
That is, whenever something is written into the ring buffer and there
is a waiter, the writer wakes it up. The waiter then checks the buffer,
sees that it still does not have enough data to fill a page, and goes
back to sleep.
To make matters worse, when the waiter goes back to sleep, it could
cause another event, which would wake it back up again to see it
doesn't have enough data and sleep again. This produces a tremendous
overhead and fills the ring buffer with noise.

For example, recording sched_switch on an idle system for 10 seconds
produces 25,350,475 events!!!

Create another wait queue for those waiters wanting full pages.
When an event is written, the waiters on that queue are woken up only
if there is a full page of data. They are not woken up while the page
is still filling.
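
The difference between the two wakeup policies can be sketched with a
small user-space model. This is not the kernel code: the function name
count_splice_wakeups(), the MODEL_PAGE_BYTES constant and the fixed
event length are assumptions made for illustration, and the model
ignores the feedback loop in which the waiter going to sleep generates
further events.

```c
#include <stdbool.h>

/* Assumed page size for the model; the real value is arch dependent. */
#define MODEL_PAGE_BYTES 4096u

/*
 * Hypothetical model: count how often a splice waiter is woken while
 * nr_events events of event_len bytes (assumed smaller than a page)
 * are written.
 *
 * separate_full_queue == false: every write wakes the waiter, as
 * before the fix.
 * separate_full_queue == true: the waiter sits on its own queue and is
 * woken only when a write completes a full page.
 */
unsigned long count_splice_wakeups(unsigned int nr_events,
				   unsigned int event_len,
				   bool separate_full_queue)
{
	unsigned int bytes_on_page = 0;
	unsigned long wakeups = 0;

	for (unsigned int i = 0; i < nr_events; i++) {
		bool page_full = false;

		/* Account for the new event on the current page. */
		bytes_on_page += event_len;
		if (bytes_on_page >= MODEL_PAGE_BYTES) {
			bytes_on_page -= MODEL_PAGE_BYTES;
			page_full = true;
		}

		/* Old policy: always wake. New policy: wake on full page. */
		if (!separate_full_queue || page_full)
			wakeups++;
	}

	return wakeups;
}
```

In this model, 128 events of 64 bytes wake the waiter 128 times under
the old policy and twice with a separate full-page queue, once per
completed page.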

After this change, recording sched_switch on an idle system for 10
seconds produces only 800 events. Getting rid of 25,349,675 useless
events (99.9968% of events!!) is something to take seriously.

Cc: stable@vger.kernel.org # 3.16+
Cc: Rabin Vincent <rabin@rab.in>
Fixes: e30f53aad220 "tracing: Do not busy wait in buffer splice"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1 file changed