userfaultfd: change the read API to return a uffd_msg

I had requests to return the full address (not the page aligned one)
to userland.

It's not entirely clear how the page offset could be relevant, because
userfaults aren't like SIGBUS, where the handler can siglongjmp to a
different place and actually skip resolving the fault depending on the
page offset. There's currently no real way to skip the fault,
especially because after a UFFDIO_COPY|ZEROPAGE the fault is optimized
to be retried within the kernel without having to return to userland
first (not even self-modifying code replacing the .text that touched
the faulting address would prevent the fault from being repeated).
Userland is even less able to skip repeating the fault if it was
triggered by a KVM secondary page fault, by any get_user_pages, or by
any copy-user inside some syscall, all of which return to kernel code.
The second time, FAULT_FLAG_RETRY_NOWAIT won't be set, leading to a
SIGBUS being raised, because the userfault can't wait if it cannot
release the mmap_sem first (and FAULT_FLAG_RETRY_NOWAIT is required
for that).

Still, returning a proper structure to userland from the read() on the
uffd allows the current UFFD_API to be used for the future
non-cooperative extensions too, and it looks cleaner as well. Once we
have additional fields, there's no point anymore in returning the fault
address page aligned in order to reuse the bits below PAGE_SHIFT.

The only downside is that the read() syscall will read 32 bytes instead
of 8 bytes, but that's not going to be a measurable overhead.
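
As a rough illustration of the 32-byte message, here is a minimal
userland sketch. It does not include the kernel header: the struct
sketch_uffd_msg, the SKETCH_* constants and sketch_decode_fault() are
hypothetical names for this example only, the layout is assumed to
mirror the uffd_msg pagefault fields, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages; real code would query the page size. */
#define SKETCH_PAGE_SIZE 4096u
/* Assumption: stands in for the pagefault event code of the real API. */
#define SKETCH_EVENT_PAGEFAULT 0x12

/* Hypothetical mirror of the message returned by read() on the uffd. */
struct sketch_uffd_msg {
	uint8_t  event;
	uint8_t  reserved1;
	uint16_t reserved2;
	uint32_t reserved3;
	union {
		struct {
			uint64_t flags;
			uint64_t address;	/* full, not page aligned */
		} pagefault;
		struct {
			uint64_t reserved1;
			uint64_t reserved2;
			uint64_t reserved3;
		} reserved;
	} arg;
};

/* 8 bytes of header plus 24 bytes of union payload. */
static_assert(sizeof(struct sketch_uffd_msg) == 32,
	      "message is expected to be 32 bytes");

/*
 * Decode one message from a buffer filled by read(). Returns 0 and
 * stores the faulting address and its offset within the page, or -1
 * if the length is wrong or the event is not a pagefault.
 */
static int sketch_decode_fault(const void *buf, size_t len,
			       uint64_t *addr, uint64_t *page_off)
{
	struct sketch_uffd_msg msg;

	if (len != sizeof(msg))
		return -1;
	memcpy(&msg, buf, sizeof(msg));
	if (msg.event != SKETCH_EVENT_PAGEFAULT)
		return -1;

	*addr = msg.arg.pagefault.address;
	*page_off = msg.arg.pagefault.address & (SKETCH_PAGE_SIZE - 1);
	return 0;
}
```

A caller would pass the buffer and the byte count returned by read();
a short read (for example 8 bytes from code written against the old
API) is rejected rather than partially decoded.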

The total number of new events that can be added, plus any new bits
added to already shipped events, is limited to 64 by the features field
of the uffdio_api structure. If more are needed, a bump of UFFD_API
will be required.
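
To make the 64-bit limit concrete, the sketch below treats the features
field as a plain 64-bit mask and checks that every bit userland wants
was offered. The helper sketch_features_ok() and the SKETCH_FEATURE_*
bits are hypothetical and only illustrate the bitmask arithmetic; they
are not part of the API.

```c
#include <stdint.h>

/* Hypothetical feature bits; one bit per future event or extension. */
#define SKETCH_FEATURE_A (UINT64_C(1) << 0)
#define SKETCH_FEATURE_B (UINT64_C(1) << 1)
#define SKETCH_FEATURE_LAST (UINT64_C(1) << 63)	/* 64th and final bit */

/*
 * Return 1 if every wanted feature bit is present in the offered
 * mask, 0 otherwise. With a 64-bit field there is no room for a 65th
 * feature, which is why exhausting the mask needs a new API version.
 */
static int sketch_features_ok(uint64_t offered, uint64_t wanted)
{
	return (wanted & ~offered) == 0;
}
```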

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
3 files changed