userfaultfd: wp: UFFDIO_REGISTER_MODE_WP documentation update
Adds documentation about the write protection support.
Signed-off-by: Andrea Arcangeli <firstname.lastname@example.org>
diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt
index bb2f945..ca62f18 100644
@@ -101,6 +101,55 @@
half copied page since it'll keep userfaulting until the copy has
+- if you requested UFFDIO_REGISTER_MODE_MISSING when registering then
+ you must provide some kind of page in your thread after reading from
+ the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE.
+ The normal behavior of the OS automatically providing a zero page on
+ an anonymous mapping is not in place.
+- none of the page-delivering ioctls default to the range that you
+ registered with. You must fill in all fields for the appropriate
+ ioctl struct including the range.
+- you get the address of the access that triggered the missing page
+ event out of a struct uffd_msg that you read in the thread from the
+ uffd. You can supply as many pages as you want with UFFDIO_COPY or
+ UFFDIO_ZEROPAGE. Keep in mind that unless you set the corresponding
+ *_MODE_DONTWAKE flag, the first of those ioctls wakes up the
+ faulting thread.
+- be sure to test for all errors including
+ (pollfd.revents & POLLERR). This can happen, e.g. when ranges
+ supplied were incorrect.
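
The missing-page steps listed above can be sketched as follows. This is
only an illustration, not part of the patch: page_base() and
prepare_copy() are hypothetical helper names, and the page size is
assumed to be passed in by the caller (for example from
sysconf(_SC_PAGESIZE)). The sketch only fills in the request; the caller
would still issue ioctl(uffd, UFFDIO_COPY, &copy) and check its result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: round addr down to the start of its page.
 * page_size must be a non-zero power of two. */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: fill a UFFDIO_COPY request that resolves the
 * fault described by msg with the contents of src_page.
 * Returns 0 on success, -1 if msg is not a page-fault event. */
static int prepare_copy(const struct uffd_msg *msg, const void *src_page,
                        uint64_t page_size, struct uffdio_copy *out)
{
    if (msg->event != UFFD_EVENT_PAGEFAULT)
        return -1;

    memset(out, 0, sizeof(*out));
    /* The range is never implied by the registration: fill in every field. */
    out->dst = page_base(msg->arg.pagefault.address, page_size);
    out->src = (uint64_t)(uintptr_t)src_page;
    out->len = page_size;
    /* UFFDIO_COPY_MODE_DONTWAKE is not set, so the ioctl wakes the
     * faulting thread. */
    out->mode = 0;
    return 0;
}
```

Keeping the request construction separate from the ioctl call makes it
easy to check the address rounding without a live uffd.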
+== Workflow to get notification of written pages ==
+This is equivalent to (but faster than) using mprotect and a SIGSEGV
+signal handler.
+First register a range with UFFDIO_REGISTER_MODE_WP. Then, instead of
+using mprotect(2), you use
+ioctl(uffd, UFFDIO_WRITEPROTECT, struct uffdio_writeprotect *) with
+mode = UFFDIO_WRITEPROTECT_MODE_WP in the struct passed in.
+The range does not default to and does not have to be identical to the
+range you registered with. You can write protect as many ranges as
+you like (inside the registered range). When a write hits a protected
+page, the struct uffd_msg read in the thread reading from the uffd has
+msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send
+ioctl(uffd, UFFDIO_WRITEPROTECT, struct uffdio_writeprotect *) again
+with UFFDIO_WRITEPROTECT_MODE_WP cleared in uffdio_writeprotect.mode.
+This wakes up the thread, which continues to run with writes now
+allowed. You can do the bookkeeping about the write in the uffd
+reading thread before the ioctl.
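
As a rough sketch of the two UFFDIO_WRITEPROTECT calls described above,
the request struct can be filled in as shown here. The helper name
prepare_writeprotect() is hypothetical and not part of the patch; the
struct and flag names come from <linux/userfaultfd.h>. The caller would
pass the result to ioctl(uffd, UFFDIO_WRITEPROTECT, &wp).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: build a UFFDIO_WRITEPROTECT request for
 * [start, start + len). protect != 0 write protects the range;
 * protect == 0 clears the protection, which also wakes the faulting
 * thread because no DONTWAKE flag is set. */
static void prepare_writeprotect(uint64_t start, uint64_t len, int protect,
                                 struct uffdio_writeprotect *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    /* The range must be given explicitly; it does not default to the
     * registered range. */
    out->range.start = start;
    out->range.len = len;
    out->mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
}
```

In this sketch the same helper serves both steps: called with
protect = 1 to arm the range, and with protect = 0 from the uffd
reading thread once the bookkeeping for the write is done.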
+If you registered with both
+UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP then you
+need to think about the sequence in which you supply a page and undo
+write protect. Note that there is a difference between writes into a
+WP area and into a !WP area. The former will have
+UFFD_PAGEFAULT_FLAG_WP set, the latter UFFD_PAGEFAULT_FLAG_WRITE.
+The latter did not fail on protection but you still need to supply a
+page when UFFDIO_REGISTER_MODE_MISSING was used.
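
The distinction between the two kinds of write fault can be illustrated
with a small classifier. This is a sketch under assumptions: the enum
and the function name classify_fault() are hypothetical, and testing
UFFD_PAGEFAULT_FLAG_WP before UFFD_PAGEFAULT_FLAG_WRITE is a choice made
here so that a message carrying both bits is treated as a
write-protect fault.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/userfaultfd.h>

/* Hypothetical classification of a message read from the uffd. */
enum fault_kind {
    FAULT_NOT_PAGEFAULT,   /* some other event type */
    FAULT_WRITE_PROTECTED, /* write into a WP area: undo write protect */
    FAULT_MISSING_WRITE,   /* write into a !WP area: supply a page */
    FAULT_MISSING_READ     /* read of a missing page: supply a page */
};

static enum fault_kind classify_fault(const struct uffd_msg *msg)
{
    assert(msg != NULL);
    if (msg->event != UFFD_EVENT_PAGEFAULT)
        return FAULT_NOT_PAGEFAULT;
    /* Check WP first: assumed to take precedence if WRITE is also set. */
    if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)
        return FAULT_WRITE_PROTECTED;
    if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
        return FAULT_MISSING_WRITE;
    return FAULT_MISSING_READ;
}
```

A reading thread registered with both modes could switch on the result
to decide between UFFDIO_COPY or UFFDIO_ZEROPAGE and a
UFFDIO_WRITEPROTECT call with the WP mode bit cleared.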
== QEMU/KVM ==
QEMU/KVM is using the userfaultfd syscall to implement postcopy live