x86, ept: Track dirty page w/o page fault

EPT Dirty Bit Use Scenario: Live Migration
------------------------------------------

Case 1: Enable VM dirty page tracking
 w/ PML
 ------
 a. Write protect guest memory in VM memory region creation
 b. Update VM dirty_bitmap in page fault path for huge page
 c. QEMU queries VM dirty_bitmap via KVM_GET_DIRTY_LOG ioctl
    c1. Kick VM, sync dirty_bitmap with PML buffer in VM_EXIT
    c2. D bit will be cleared for 4K page
    c3. For huge page, write protect spte

 w/o PML
 -------
 a. Write protect guest memory in VM memory region creation
 b. VM memslot dirty_bitmap will be updated in page fault path
 c. QEMU queries VM dirty_bitmap via KVM_GET_DIRTY_LOG ioctl
    c1. Write protect spte
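
The PML sync in step c1 above can be sketched in plain C. This is a
simplified user-space model, not the KVM code: drain_pml() and
mark_page_dirty() are hypothetical names, and the bitmap is assumed to
start at guest frame 0 (a real memslot would subtract its base gfn).
It relies only on the architectural layout: the PML buffer holds 512
GPAs and the CPU fills it from entry 511 downwards.

```c
#include <stdint.h>
#include <stddef.h>

#define PML_ENTRIES   512   /* PML buffer holds 512 logged GPAs */
#define PAGE_SHIFT_4K 12
#define BITS_PER_WORD 64

/* Hypothetical helper: set the bit for guest frame number gfn. */
static void mark_page_dirty(uint64_t *bitmap, uint64_t gfn)
{
    bitmap[gfn / BITS_PER_WORD] |= 1ULL << (gfn % BITS_PER_WORD);
}

/*
 * Drain logged GPAs into the dirty bitmap.
 * *pml_idx models the PML index: 511 means the buffer is empty,
 * a value >= 512 means it wrapped (buffer full).
 * Returns the number of entries drained and resets the index to 511.
 */
static size_t drain_pml(const uint64_t *pml_buf, uint16_t *pml_idx,
                        uint64_t *bitmap)
{
    unsigned int idx = *pml_idx;
    size_t n = 0;

    if (idx == PML_ENTRIES - 1)
        return 0;                               /* nothing logged */

    idx = (idx >= PML_ENTRIES) ? 0 : idx + 1;   /* first valid entry */
    for (; idx < PML_ENTRIES; idx++, n++)
        mark_page_dirty(bitmap, pml_buf[idx] >> PAGE_SHIFT_4K);

    *pml_idx = PML_ENTRIES - 1;                 /* mark buffer empty */
    return n;
}
```

With the index at 509, entries 510 and 511 are valid, so two pages are
marked dirty and the index is reset to 511.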

Case 2: Disable VM dirty page tracking
 w/ PML
 ------
 a. Set spte Dirty bit, so PML full VM_EXIT is not triggered

 w/o PML
 -------
 a. Nothing needs to be done.

kvm ioctl w/ KVM_SET_USER_MEMORY_REGION
kvm_vm_ioctl_set_memory_region
 -> __kvm_set_memory_region
    -> kvm_arch_commit_memory_region
       -> kvm_mmu_slot_apply_flags
  case 1:-> vmx_slot_enable_log_dirty
             -> kvm_mmu_slot_leaf_clear_dirty
                 -> __rmap_clear_dirty
                     -> spte_clear_dirty
             -> kvm_mmu_slot_largepage_remove_write_access
                 -> __rmap_write_protect
                     -> spte_write_protect

  case 2:-> vmx_slot_disable_log_dirty
             -> kvm_mmu_slot_set_dirty
                 -> __rmap_set_dirty
                     -> spte_set_dirty

kvm ioctl w/ KVM_GET_DIRTY_LOG
 -> kvm_vm_ioctl_get_dirty_log
    -> vmx_flush_log_dirty
        -> kvm_flush_pml_buffers
    -> kvm_get_dirty_log_protect
        -> kvm_arch_mmu_enable_log_dirty_pt_masked
            -> vmx_enable_log_dirty_pt_masked
                -> kvm_mmu_clear_dirty_pt_masked
                    -> __rmap_clear_dirty
                        -> spte_clear_dirty
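
The leaf operations at the bottom of these call chains come down to
flipping bits in an EPT entry. The sketch below is a stand-alone model
of that idea, not the kernel implementation: the model_* functions are
hypothetical, and model_enable_log_dirty() simply mirrors the Case 1
description (clear D for 4K pages, write protect huge pages). The bit
positions are the architectural EPT ones (write permission bit 1,
accessed bit 8, dirty bit 9).

```c
#include <stdint.h>
#include <stdbool.h>

#define EPT_WRITABLE (1ULL << 1)   /* EPT write permission */
#define EPT_ACCESSED (1ULL << 8)   /* EPT accessed flag */
#define EPT_DIRTY    (1ULL << 9)   /* EPT dirty flag */

/* Clear the D bit; returns true if the entry was dirty. */
static bool model_spte_clear_dirty(uint64_t *spte)
{
    bool was_dirty = (*spte & EPT_DIRTY) != 0;

    *spte &= ~EPT_DIRTY;
    return was_dirty;
}

/* Pre-set the D bit so a later write does not get logged by PML. */
static void model_spte_set_dirty(uint64_t *spte)
{
    *spte |= EPT_DIRTY;
}

/* Remove write access; returns true if the entry was writable. */
static bool model_spte_write_protect(uint64_t *spte)
{
    bool was_writable = (*spte & EPT_WRITABLE) != 0;

    *spte &= ~EPT_WRITABLE;
    return was_writable;
}

/* Case 1 with PML, as described above: 4K vs. huge page handling. */
static void model_enable_log_dirty(uint64_t *spte, bool is_huge)
{
    if (is_huge)
        model_spte_write_protect(spte);
    else
        model_spte_clear_dirty(spte);
}
```

Case 2 corresponds to calling model_spte_set_dirty() on every leaf
entry of the slot.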

Conclusion:
The only user of the EPT Dirty bit is QEMU live migration;
there is no other user inside the KVM kernel code itself.
This inspires us to use the EPT Dirty bit to track page hotness.
The admin should be able to coordinate between the VM live
migration use case and the pmem2dram use case.

Reuse EPT Dirty Bit for pmem2dram
---------------------------------

Goal:
Provide an interface for a user space daemon to migrate written pages.

Motivation:
a. AEP write performance is limited, while its read latency can
   satisfy the workload, so tracking written pages is necessary.

b. EPT A bit and D bit
  b1. Per the SDM, pages with the EPT A bit set are a superset of
      pages with the EPT D bit set; a set A bit does not
      necessarily imply a written page.

  b2. For cached writes, the target address is first loaded (read)
      into the cache and the write is performed afterwards.
      Statistics of the A bit and D bit could possibly be the same.

      Are there any cases where a write goes directly down to
      the iMC?

  b3. For NT (non-temporal) writes, the cache is bypassed, so the
      cached-write case above does not apply.

A conclusive result is still needed on whether the A bit alone is
enough to track written pages, and potential corner cases where the
D bit is useful need to be identified.

This patch builds EPT D bit tracking on top of the ept_idle module,
so a user space daemon can easily leverage existing code to
benchmark. In practice the user space daemon is expected to count
a set D bit as accessed as well.
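
As an illustration of that accounting rule, a daemon could classify
sampled entries as below. This is only a sketch under assumptions:
struct page_stats, classify() and accessed_total() are hypothetical
names, and the input is assumed to be raw EPT entries, which is not
necessarily the format the ept_idle interface exports.

```c
#include <stdint.h>
#include <stddef.h>

#define EPT_ACCESSED (1ULL << 8)   /* EPT accessed flag */
#define EPT_DIRTY    (1ULL << 9)   /* EPT dirty flag */

/* Hypothetical per-scan counters kept by the daemon. */
struct page_stats {
    size_t idle;        /* neither A nor D set */
    size_t read_only;   /* A set, D clear */
    size_t written;     /* D set, counted as accessed too */
};

static struct page_stats classify(const uint64_t *entries, size_t n)
{
    struct page_stats s = { 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        if (entries[i] & EPT_DIRTY)
            s.written++;        /* D implies the page was accessed */
        else if (entries[i] & EPT_ACCESSED)
            s.read_only++;
        else
            s.idle++;
    }
    return s;
}

/* Accessed pages are the read-only ones plus the written ones. */
static size_t accessed_total(const struct page_stats *s)
{
    return s->read_only + s->written;
}
```

Comparing written against accessed_total() across scans is one way to
check whether the A bit alone would have identified the same pages.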

Signed-off-by: Fan Du <fan.du@intel.com>

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2 files changed