From: Yabin Cui <yabinc@google.com>
Date: Fri, 17 May 2019 13:52:31 +0200
Subject: perf/ring_buffer: Fix exposing a temporarily decreased data_head

commit 1b038c6e05ff70a1e66e3e571c2e6106bdb75f53 upstream.

In perf_output_put_handle(), an IRQ/NMI can happen at the location below
and write records to the same ring buffer:

	...
	local_dec_and_test(&rb->nest)
	...				<-- an IRQ/NMI can happen here
	rb->user_page->data_head = head;
	...

In this case, a value A is written to data_head in the IRQ handler, and a
value B is written to data_head after the IRQ returns, with A > B. As a
result, data_head temporarily decreases from A to B, and a reader polling
the buffer frequently enough may observe data_head < data_tail, which
leads to unexpected behavior.

This can be fixed by moving the dec(&rb->nest) to after updating
data_head, which prevents the IRQ/NMI above from updating data_head.

[ Split up by peterz. ]

Signed-off-by: Yabin Cui <yabinc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: mark.rutland@arm.com
Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables")
Link: http://lkml.kernel.org/r/20190517115418.224478157@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 kernel/events/ring_buffer.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -50,11 +50,18 @@ again:
 	head = local_read(&rb->head);
 
 	/*
-	 * IRQ/NMI can happen here, which means we can miss a head update.
+	 * IRQ/NMI can happen here and advance @rb->head, causing our
+	 * load above to be stale.
 	 */
 
-	if (!local_dec_and_test(&rb->nest))
+	/*
+	 * If this isn't the outermost nesting, we don't have to update
+	 * @rb->user_page->data_head.
+	 */
+	if (local_read(&rb->nest) > 1) {
+		local_dec(&rb->nest);
 		goto out;
+	}
 
 	/*
 	 * Since the mmap() consumer (userspace) can run on a different CPU:
@@ -86,9 +93,18 @@ again:
 	rb->user_page->data_head = head;
 
 	/*
-	 * Now check if we missed an update -- rely on previous implied
-	 * compiler barriers to force a re-read.
+	 * We must publish the head before decrementing the nest count,
+	 * otherwise an IRQ/NMI can publish a more recent head value and our
+	 * write will (temporarily) publish a stale value.
+	 */
+	barrier();
+	local_set(&rb->nest, 0);
+
+	/*
+	 * Ensure we decrement @rb->nest before we validate the @rb->head.
+	 * Otherwise we cannot be sure we caught the 'last' nested update.
 	 */
+	barrier();
 	if (unlikely(head != local_read(&rb->head))) {
 		local_inc(&rb->nest);
 		goto again;