
2020/7/31 Hypervisor From Scratch – Part 7: Using EPT & Page-Level Monitoring Features


Sina & Shahriar's Blog


An aggressive out-of-order blog...

Hypervisor From Scratch – Part 7: Using EPT & Page-Level Monitoring Features
Published January 20, 2020 by Sina Karvandi

Introduction

This is the 7th part of the tutorial Hypervisor From Scratch. It's about using the Extended Page Table (EPT) in an already running system. As you might know, paging is an essential part of managing memory on modern operating systems. Hypervisors use an additional page table, and this gives us an excellent opportunity to monitor different aspects of memory access (read, write, execute) without modifying the operating system's page tables. EPT is a hardware mechanism, so it's fast. On the other hand, we have to deal with different caching and synchronization problems.

This part is highly dependent on the 4th part of the tutorial (Part 4: Address Translation Using Extended Page Table (EPT)), so please read that part one more time. I won't redescribe the basic concepts of EPT tables here.

In the 7th part, we'll see how to virtualize our currently running system by configuring the VMCS and creating identity page tables based on the Memory Type Range Registers (MTRRs). Then we use monitoring features to detect the execution of some Windows functions.

https://rayanfam.com/topics/hypervisor-from-scratch-part-7/ 1/60

This part is highly inspired by Simplevisor and Gbhv.

The picture of this post was taken by one of my best friends Ahmad, from Khānābād
Village, Aligudarz.

Before starting, I should give special thanks to my friend Petr Benes for his contributions to Hypervisor From Scratch. Of course, Hypervisor From Scratch could never exist without his help. I also give my regards to Alex Ionescu, as he always answers my questions patiently.

Overview

This part is divided into seven main sections:

1. Implementing mechanisms to manage Vmcalls
2. Starting with MMU Virtualization (EPT)
3. Explaining Memory Type Range Register (MTRR) concepts
4. Describing Page-Level Monitoring features using EPT
5. Invalidating Translations Derived from EPT (INVEPT)
6. Fixing some previous design caveats regarding deadlocks and synchronization problems
7. Discussion (in this section, we discuss different questions and approaches about EPT)

Finally, I talk about some important notes you need to know in order to debug the hypervisor and EPT.

Guys, it's OK if you don't understand some of the parts. By reading this article, you'll get the idea and be able to use EPT, and over time you'll understand things better.

The source code of this part changed drastically compared to the previous part. Naming conventions are improved, so you'll see much cleaner and more readable code, and lots of new routines were added. The prefixes work as follows:

- Hv: hypervisor routines. Call these from the IRP major functions, and avoid calling methods with the Vmx prefix directly.
- Vmx: functions that manage VMX operations.
- Asm: assembly functions.
- Ept: functions related to the Extended Page Table (EPT).
- Vmcall: VMCALL services.
- Invept: functions related to invalidating EPT caches.

The full source code of this tutorial is available on GitHub:

[https://github.com/SinaKarvandi/Hypervisor-From-Scratch]


Table of Contents

Introduction
Overview
Table of Contents
Implementing Functions to Manage Vmcalls
Starting with MMU virtualization (EPT)
Memory Type Range Register (MTRR)
Building MTRR Map
Fixed-Range MTRRs and PAT
Virtualizing Current System’s Memory using EPT
EPT Identity Mapping
Setting up PML4 and PML3 entries
Setting up PML2 entries
EPT Violation
EPT Misconfiguration
Adding EPT to VMCS
Monitoring Page’s RWX Activity
Pre-allocating Buffers for VMX Root Mode
Setting hook before Vmlaunch
Setting hook after Vmlaunch
Finding a Page’s entry in EPT Tables
1. Finding PML4, PML3, PML2 entries
2. Finding PML1 entry
Splitting 2 MB Pages to 4 KB Pages
Applying the Hook
Handling hooked pages’ vm-exits
Invalidating Translations Derived from EPT (INVEPT)
Invalidating All Contexts
Invalidating Single Context
Broadcasting Invept to all logical cores simultaneously
Fixing Previous Design Issues
Support to more than 64 logical cores
Synchronization problem in exiting VMX
The issues relating to the Meltdown mitigation
Some tips for debugging hypervisors
Let’s Test it!
How to test?
Demo
Discussion
Conclusion
References


Implementing Functions to Manage Vmcalls

We start this article by implementing functions relating to VMCALL. Intel describes VMCALL as "Call to VM monitor by causing VM exit."

Vmcall allows guest software to call for service into an underlying VM monitor. The details of the programming interface for such calls are VMM-specific. This instruction does nothing more than cause a VM exit.

In other words, whenever you execute a VMCALL instruction in VMX non-root mode, a vm-exit occurs and our hypervisor takes over. After a vm-exit, we're in VMX root mode, and we stay there until we execute VMRESUME or VMXOFF. Every other context is VMX non-root mode, so other drivers can execute VMCALL in their own contexts to request a service from our hypervisor in VMX root mode.

Executing VMCALL causes a vm-exit (EXIT_REASON_VMCALL). We can set registers and the stack before executing VMCALL, so we can send parameters to the VMCALL handler. All we need to do is design a calling convention that both the VMCALL handler and the driver requesting a service follow.

The first thing we need to implement is an assembly function that executes VMCALL and returns.

AsmVmxVmcall PROC
    vmcall                  ; VmxVmcallHandler(VmcallNumber, OptionalParam1, OptionalParam2, OptionalParam3)
    ret                     ; Return type is NTSTATUS, and it's in RAX (set by the vm-exit handler)
AsmVmxVmcall ENDP

Its C declaration looks like this:

extern NTSTATUS inline AsmVmxVmcall(unsigned long long VmcallNumber, unsigned long long OptionalParam1, unsigned long long OptionalParam2, unsigned long long OptionalParam3);

Notice that AsmVmxVmcall doesn't modify anything before executing VMCALL. If someone passes parameters to AsmVmxVmcall, they are already in RCX, RDX, R8, and R9, and the rest are on the stack. That's how the x64 calling convention on Windows (often called x64 fastcall) passes arguments.

Keep in mind that if you're designing a hypervisor for Linux, the calling convention there is different. The System V AMD64 ABI passes the first integer arguments in RDI, RSI, RDX, RCX, R8, and R9.

We saved all the registers on vm-exit, so the vm-exit handler passes GuestRegs->rcx, GuestRegs->rdx, GuestRegs->r8, and GuestRegs->r9 to VmxVmcallHandler. RCX holds the VMCALL number, which specifies the service we want our hypervisor to perform. RDX, R8, and R9 are optional parameters.

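case EXIT_REASON_VMCALL:
{
    // RCX = VMCALL number, RDX/R8/R9 = optional parameters, result goes back to the guest in RAX
    GuestRegs->rax = VmxVmcallHandler(GuestRegs->rcx, GuestRegs->rdx, GuestRegs->r8, GuestRegs->r9);
    break;
}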
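The register-based convention can be simulated in user mode. The following is a sketch under stated assumptions: GUEST_REGS_SIM, SimVmcallHandler, SimulateVmcallExit, and the SIM_STATUS_* values are hypothetical stand-ins. The real handler runs in VMX root mode and returns an NTSTATUS. Only the register roles follow the text above: RCX carries the number, RDX, R8, and R9 carry the parameters, and RAX carries the result.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for NTSTATUS values (not the WDK definitions). */
#define SIM_STATUS_SUCCESS      0x00000000u
#define SIM_STATUS_UNSUCCESSFUL 0xC0000001u

#define VMCALL_TEST 0x1

/* Hypothetical subset of the guest registers saved on vm-exit. */
typedef struct {
    uint64_t rax, rcx, rdx, r8, r9;
} GUEST_REGS_SIM;

/* Mirrors the shape of VmxVmcallHandler: the number selects the service. */
static uint32_t SimVmcallHandler(uint64_t Number, uint64_t P1, uint64_t P2, uint64_t P3)
{
    switch (Number) {
    case VMCALL_TEST:
        printf("VmcallTest: 0x%llx 0x%llx 0x%llx\n",
               (unsigned long long)P1, (unsigned long long)P2, (unsigned long long)P3);
        return SIM_STATUS_SUCCESS;
    default:
        return SIM_STATUS_UNSUCCESSFUL;
    }
}

/* What the EXIT_REASON_VMCALL case does: read saved registers, write RAX. */
static void SimulateVmcallExit(GUEST_REGS_SIM *Regs)
{
    Regs->rax = SimVmcallHandler(Regs->rcx, Regs->rdx, Regs->r8, Regs->r9);
}
```

Because the caller's arguments are already in the right registers when VMCALL executes, nothing has to be marshalled. The handler just reads the saved register values.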

For example, our hypervisor provides the following services (VMCALL numbers) in this part:

#define VMCALL_TEST                  0x1 // Test VMCALL
#define VMCALL_VMXOFF                0x2 // Call VMXOFF to turn off the hypervisor
#define VMCALL_EXEC_HOOK_PAGE        0x3 // VMCALL to hook the ExecuteAccess bit of the EPT table
#define VMCALL_INVEPT_ALL_CONTEXT    0x4 // VMCALL to invalidate EPT (all contexts)
#define VMCALL_INVEPT_SINGLE_CONTEXT 0x5 // VMCALL to invalidate EPT (a single context)

There is nothing special about VmxVmcallHandler; it's just a simple switch statement.

/* Main Vmcall Handler */
NTSTATUS VmxVmcallHandler(UINT64 VmcallNumber, UINT64 OptionalParam1, UINT64 OptionalParam2, UINT64 OptionalParam3)
{
    NTSTATUS VmcallStatus;
    BOOLEAN HookResult;

    VmcallStatus = STATUS_UNSUCCESSFUL;
    switch (VmcallNumber)
    {
    case VMCALL_TEST:
    {
        VmcallStatus = VmcallTest(OptionalParam1, OptionalParam2, OptionalParam3);
        break;
    }
    default:
    {
        LogWarning("Unsupported VMCALL");
        VmcallStatus = STATUS_UNSUCCESSFUL;
        break;
    }
    }
    return VmcallStatus;
}


To test it, I created a function called VmcallTest, which simply prints the parameters passed to the VMCALL.

/* Test Vmcall (VMCALL_TEST) */
NTSTATUS VmcallTest(UINT64 Param1, UINT64 Param2, UINT64 Param3)
{
    LogInfo("VmcallTest called with @Param1 = 0x%llx , @Param2 = 0x%llx , @Param3 = 0x%llx", Param1, Param2, Param3);
    return STATUS_SUCCESS;
}

Finally, we can use the following piece of code and pass VMCALL_TEST as the
Vmcall Number along with other optional parameters.

// Request the VMCALL_TEST service with three sample parameters
AsmVmxVmcall(VMCALL_TEST, 0x22, 0x333, 0x4444);

Don't forget that the above code should only be executed in VMX non-root mode.

There is nothing more I can say about VMCALL. For further reading (not related to our hypervisor), you might wonder what happens if you execute VMCALL in VMX root mode. In that case, it invokes an SMM monitor. It activates the dual-monitor treatment of system-management interrupts (SMIs) and system-management mode (SMM) if that treatment isn't already active. If the dual-monitor treatment is already active, executing VMCALL in VMX root mode causes an SMM VM exit! If the processor can't do either, the instruction fails with a VMfail error instead.

Read Section 34.15.2 and Section 34.15.6 in the Intel SDM for more information.

Starting with MMU virtualization (EPT)

Let me start with the differences between physical and virtual addresses.

Physical addressing means that your program knows the real layout of RAM. When you access a variable at address 0x8746b3, that's where it's stored in the physical RAM chips.

With virtual addressing, all application memory accesses go through a page table, which maps virtual addresses to physical addresses. So every application has its own "private" address space, and no program can read or write another program's memory.

EPT is a page table with a page-walk length of 4 (or 5 on newer processors). It translates guest-physical addresses to host-physical addresses.

First, you have to understand that EPT maps guest physical pages to host physical pages. Mapping physical addresses makes hypervisors much easier to understand, because you can forget about all the concepts related to virtual memory and the operating system's memory manager. Why? Because you cannot allocate more physical memory. Sure, you can hot-plug RAM right into the motherboard, but let's forget about that for now 😉. RAM usually starts at 0 and ends at AMOUNT OF RAM + SOME MORE, where SOME MORE is MMIO/device space.

Look at the following picture (from hvpp), which shows the memory ranges of a VMWare VM with 2 GB of RAM.

Note the holes between ranges (e.g., A0000 – 100000). The ranges in the screenshot are backed by actual physical RAM, and the holes are MMIO space.

By now, you know that when you allocate or free memory, the RAM ranges are always present. What changes is the data stored in them.

Keep in mind that there are certainly no holes in RAM as an electronic circuit. The holes come from how the BIOS maps certain physical memory ranges to the actual hardware RAM. In other words, RAM usually isn't one contiguous address space. If you have 1 GB of RAM, it's often not a single 0 … 1 GB physical address range, because some parts of that space belong to devices such as the network card, audio card, or USB hub.
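The idea that RAM ranges and device holes share one physical address space can be sketched as a simple lookup. The ranges below are hypothetical. They are loosely modelled on the A0000 – 100000 hole mentioned above and are not read from any real machine. A real driver would get the ranges from the firmware-provided memory map, for example through MmGetPhysicalMemoryRanges on Windows.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    uint64_t Base;  /* first byte of the range */
    uint64_t Size;  /* length in bytes */
} PHYS_RANGE;

/* Hypothetical layout for illustration only: RAM below 0xA0000, a hole up to
   0x100000 (legacy VGA/BIOS area), then RAM up to the 2 GB mark. */
static const PHYS_RANGE RamRanges[] = {
    { 0x1000ull,   0x9F000ull },                   /* 0x1000 .. 0x9FFFF       */
    { 0x100000ull, 0x80000000ull - 0x100000ull },  /* 0x100000 .. 0x7FFFFFFF  */
};

/* Returns true if the physical address is backed by RAM, false if it is a hole. */
static bool IsBackedByRam(uint64_t PhysicalAddress)
{
    for (size_t i = 0; i < sizeof(RamRanges) / sizeof(RamRanges[0]); i++) {
        if (PhysicalAddress >= RamRanges[i].Base &&
            PhysicalAddress - RamRanges[i].Base < RamRanges[i].Size)
            return true;
    }
    return false;
}
```

An address like 0xB8000 falls in a hole, yet devices live there. An identity-mapping EPT therefore has to cover the holes as well as the RAM ranges.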

Let's see what hypervisors like VMWare, Hyper-V, and VirtualBox do with physical memory. Our approach is different, but it helps you understand MMU virtualization better.

In VMWare (and Hyper-V, VirtualBox, etc.), the VM has its own physical memory, and our PC (the host) has its own physical address space. EPT exists so that you can translate guest physical memory to host physical memory. For example, if a guest wants to read from physical address 0x1000, the processor looks into EPT. EPT says that the content of that memory is at host physical address 0x5000. You certainly don't want a guest in VMWare to read arbitrary physical memory on the host. So it's VMWare's job to set up EPTs correctly and dedicate a chunk of physical memory to each guest.

Memory Type Range Register (MTRR)

By now, you have some idea of how memory (RAM) is divided into regions. The MTRR registers tell us the caching policy that applies to these regions, that's all!


Now let’s explain them more precisely.

Wikipedia defines MTRRs like this :

Memory type range registers (MTRRs) are a set of processor supplementary capabilities control registers that provide system software with control of how accesses to memory ranges by the CPU are cached. It uses a set of programmable model-specific registers (MSRs), which are special registers provided by most modern CPUs. Possible access modes to memory ranges can be uncached, write-through, write-combining, write-protect, and write-back. In write-back mode, writes are written to the CPU's cache, and the cache is marked dirty so that its contents are written to memory later.

In old x86 architecture systems, mainly where separate chips provided the cache
outside of the CPU package, this function was controlled by the chipset itself and
configured through BIOS settings, when the CPU cache was moved inside the CPU,
the CPUs implemented fixed-range MTRRs.

Typically, the BIOS software configures the MTRRs. The operating system or executive is then free to modify the memory map using the typical page-level cacheability attributes.

If the above sentences confused you, let me explain more clearly. RAM is divided into different regions, and we want to read the details of these chunks (base address, end address, and cache policy) using the MTRR registers. The cache policy is something that the BIOS or the operating system sets for a particular region. For example, the operating system might mark the region from 0x1000 to 0x2000 (physical address) as UC (uncached) and the region from 0x5000 to 0x7000 (physical address) as WB (write-back); it's based on OS policy. If you don't know about the different memory types (e.g., UC, WB), you can read here.

OK, let’s see how to read these MTRRs.


The availability of the MTRR feature is model-specific. We can determine whether a processor supports MTRRs by executing the CPUID instruction and reading the MTRR flag (bit 12) in the feature information register (EDX). Still, this check isn't essential, as our processor almost certainly supports MTRRs; it's an old feature.
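The check itself is only a bit test on the EDX value returned by CPUID leaf 1. The sketch below keeps the bit test separate from the CPUID call so it can be exercised anywhere. The helper name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPUID_1_EDX_MTRR_BIT 12  /* CPUID.01H:EDX[12] = MTRR */

/* Returns true if the EDX value from CPUID leaf 1 reports MTRR support. */
static bool CpuidEdxReportsMtrr(uint32_t Edx)
{
    return ((Edx >> CPUID_1_EDX_MTRR_BIT) & 1u) != 0;
}
```

In a Windows driver, EDX is CpuInfo[3] after calling the MSVC intrinsic __cpuid(CpuInfo, 1) from <intrin.h>, where CpuInfo is an int[4].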

What is essential for us is an MSR called IA32_MTRR_DEF_TYPE. The following structure represents IA32_MTRR_DEF_TYPE:

// MSR_IA32_MTRR_DEF_TYPE
typedef union _IA32_MTRR_DEF_TYPE_REGISTER
{
    struct
    {
        /**
         * [Bits 2:0] Default Memory Type.
         */
        UINT64 DefaultMemoryType : 3;
        UINT64 Reserved1 : 7;

        /**
         * [Bit 10] Fixed Range MTRR Enable.
         */
        UINT64 FixedRangeMtrrEnable : 1;

        /**
         * [Bit 11] MTRR Enable.
         */
        UINT64 MtrrEnable : 1;
        UINT64 Reserved2 : 52;
    };

    UINT64 Flags;
} IA32_MTRR_DEF_TYPE_REGISTER, * PIA32_MTRR_DEF_TYPE_REGISTER;

We implement a function called EptCheckFeatures, which checks whether our processor supports the basic EPT features. For MTRRs, it checks whether they're enabled, because our hypervisor needs enabled MTRRs. (We'll complete this function later when we describe EPT.)

IA32_MTRR_DEF_TYPE_REGISTER MTRRDefType;

MTRRDefType.Flags = __readmsr(MSR_IA32_MTRR_DEF_TYPE);

if (!MTRRDefType.MtrrEnable)
{
    LogError("Mtrr Dynamic Ranges not supported");
    return FALSE;
}
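Without the bitfield union, the same fields can be extracted from the raw MSR value with shifts and masks, which makes the bit positions from the structure above explicit. This is a minimal sketch with hypothetical names. The raw value used in the test (WB default type, fixed ranges and MTRRs enabled) is an example, not a reading from a real machine.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t DefaultMemoryType;  /* bits 2:0 */
    bool    FixedRangeEnable;   /* bit 10 (FE) */
    bool    MtrrEnable;         /* bit 11 (E)  */
} MTRR_DEF_TYPE_FIELDS;

/* Decodes a raw IA32_MTRR_DEF_TYPE value into its fields. */
static MTRR_DEF_TYPE_FIELDS DecodeMtrrDefType(uint64_t Raw)
{
    MTRR_DEF_TYPE_FIELDS Fields;
    Fields.DefaultMemoryType = (uint8_t)(Raw & 0x7);
    Fields.FixedRangeEnable  = ((Raw >> 10) & 1) != 0;
    Fields.MtrrEnable        = ((Raw >> 11) & 1) != 0;
    return Fields;
}
```

According to the Intel SDM, when the E flag (bit 11) is clear, all MTRRs are disabled and the UC memory type applies to all of physical memory.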

Building MTRR Map

Before creating a map of the memory regions, it's good to see how WinDbg shows the MTRR regions and their caching policies with the !mtrr command.


As you can see in the above picture, Windows uses both fixed-range registers (Fixed-support enabled) and variable-range registers.

I’ll talk about fixed range registers later in this article.

To read the MTRRs, we start by reading the VCNT field of the IA32_MTRRCAP MSR (0xFE). This field gives the number of variable-range MTRRs (the number of regions).
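VCNT is only the low byte of IA32_MTRRCAP. The other capability bits sit next to it. The sketch below decodes them using the bit positions from the Intel SDM (VCNT in bits 7:0, FIX in bit 8, WC in bit 10, SMRR in bit 11). The type and function names are hypothetical, and the raw values in the test are made-up examples.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t VariableRangeCount;   /* VCNT, bits 7:0 */
    bool    FixedRangeSupported;  /* FIX,  bit 8    */
    bool    WcSupported;          /* WC,   bit 10   */
    bool    SmrrSupported;        /* SMRR, bit 11   */
} MTRR_CAP_FIELDS;

/* Decodes a raw IA32_MTRRCAP (MSR 0xFE) value. */
static MTRR_CAP_FIELDS DecodeMtrrCap(uint64_t Raw)
{
    MTRR_CAP_FIELDS Fields;
    Fields.VariableRangeCount  = (uint8_t)(Raw & 0xFF);
    Fields.FixedRangeSupported = ((Raw >> 8) & 1) != 0;
    Fields.WcSupported         = ((Raw >> 10) & 1) != 0;
    Fields.SmrrSupported       = ((Raw >> 11) & 1) != 0;
    return Fields;
}
```

The loop over variable-range registers then runs from 0 to VariableRangeCount - 1.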


The next step is to iterate through each variable-range MTRR. For each range, we read MSR_IA32_MTRR_PHYSBASE0 and MSR_IA32_MTRR_PHYSMASK0 (offset by the register index). Then we check whether the range is valid, based on the IA32_MTRR_PHYSMASK_REGISTER.Valid bit.

CurrentPhysBase.Flags = __readmsr(MSR_IA32_MTRR_PHYSBASE0 + (CurrentRegister * 2));
CurrentPhysMask.Flags = __readmsr(MSR_IA32_MTRR_PHYSMASK0 + (CurrentRegister * 2));
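The per-register offset comes from how the MSR pairs are laid out. IA32_MTRR_PHYSBASE0 is MSR 0x200, IA32_MTRR_PHYSMASK0 is MSR 0x201, and each following base/mask pair is two MSRs further along. Here's a small sketch of that addressing. The helper names are hypothetical.

```c
#include <stdint.h>

#define MSR_IA32_MTRR_PHYSBASE0 0x200
#define MSR_IA32_MTRR_PHYSMASK0 0x201

/* PHYSBASEn/PHYSMASKn are interleaved, so register n is 2*n MSRs past register 0. */
static uint32_t MtrrPhysBaseMsr(uint32_t n) { return MSR_IA32_MTRR_PHYSBASE0 + n * 2; }
static uint32_t MtrrPhysMaskMsr(uint32_t n) { return MSR_IA32_MTRR_PHYSMASK0 + n * 2; }
```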

Now we need to calculate the (physical) start and end addresses from the MSR values.

The start address:

// Calculate the base address in bytes
Descriptor->PhysicalBaseAddress = CurrentPhysBase.PageFrameNumber * PAGE_SIZE;

The end address:

// Calculate the total size of the range
// The lowest bit of the mask that is set to 1 specifies the size of the range
_BitScanForward64(&NumberOfBitsInMask, CurrentPhysMask.PageFrameNumber * PAGE_SIZE);

// Size of the range in bytes + Base Address
Descriptor->PhysicalEndAddress = Descriptor->PhysicalBaseAddress + ((1ULL << NumberOfBitsInMask) - 1ULL);

For further information about the calculation of MTRRs, you can read Intel SDM Vol
3A (11.11.3 Example Base and Mask Calculations).

Finally, we read the cache policy, which is set by either the BIOS or the operating system.

// Memory Type (cacheability attributes)
Descriptor->MemoryType = (UCHAR)CurrentPhysBase.Type;
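To check the base, end, and type calculations by hand, here's a user-mode sketch that decodes one PHYSBASE/PHYSMASK pair in the same way. It assumes a 36-bit physical address width for the example values: a 2 GB WB range at 0x80000000 and a 64 MB UC range at 0xFC000000. The struct and helper names are hypothetical, and a plain loop stands in for _BitScanForward64.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_OFFSET_MASK 0xFFFull

typedef struct {
    uint64_t Base;   /* first byte of the range */
    uint64_t End;    /* last byte of the range (inclusive) */
    uint8_t  Type;   /* memory type, PHYSBASE bits 7:0 */
    bool     Valid;  /* PHYSMASK bit 11 */
} MTRR_RANGE;

/* Index of the lowest set bit; Value must be non-zero. */
static unsigned LowestSetBit(uint64_t Value)
{
    unsigned Index = 0;
    while ((Value & 1) == 0) {
        Value >>= 1;
        Index++;
    }
    return Index;
}

static MTRR_RANGE DecodeVariableMtrr(uint64_t PhysBase, uint64_t PhysMask)
{
    MTRR_RANGE r;
    uint64_t Mask = PhysMask & ~PAGE_OFFSET_MASK;  /* PageFrameNumber * PAGE_SIZE */

    r.Base  = PhysBase & ~PAGE_OFFSET_MASK;        /* PageFrameNumber * PAGE_SIZE */
    r.Type  = (uint8_t)(PhysBase & 0xFF);
    r.Valid = ((PhysMask >> 11) & 1) != 0;
    r.End   = r.Base;                              /* only meaningful when Valid */

    /* Lowest set mask bit = log2(range size); End = Base + size - 1. */
    if (r.Valid && Mask != 0)
        r.End = r.Base + ((1ull << LowestSetBit(Mask)) - 1);
    return r;
}
```

For the 2 GB example, the mask's lowest set bit is bit 31. The size is 0x80000000, so the range ends at 0xFFFFFFFF.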

Putting it all together, we have the following function :

/* Build MTRR Map of current physical addresses */
BOOLEAN EptBuildMtrrMap()
{
    IA32_MTRR_CAPABILITIES_REGISTER MTRRCap;
    IA32_MTRR_PHYSBASE_REGISTER CurrentPhysBase;
    IA32_MTRR_PHYSMASK_REGISTER CurrentPhysMask;
    PMTRR_RANGE_DESCRIPTOR Descriptor;
    ULONG CurrentRegister;
    ULONG NumberOfBitsInMask;

    MTRRCap.Flags = __readmsr(MSR_IA32_MTRR_CAPABILITIES);

    for (CurrentRegister = 0; CurrentRegister < MTRRCap.VariableRangeCount; CurrentRegister++)
    {
        // For each dynamic register pair
        CurrentPhysBase.Flags = __readmsr(MSR_IA32_MTRR_PHYSBASE0 + (CurrentRegister * 2));
        CurrentPhysMask.Flags = __readmsr(MSR_IA32_MTRR_PHYSMASK0 + (CurrentRegister * 2));

        // Is the range enabled?
        if (CurrentPhysMask.Valid)
        {
            // We only need to read these once, because MTRRs are expected to be identical
            // on all logical processors (firmware synchronizes them during initialization).
            Descriptor = &EptState->MemoryRanges[EptState->NumberOfEnabledMemoryRanges++];

            // Calculate the base address in bytes
            Descriptor->PhysicalBaseAddress = CurrentPhysBase.PageFrameNumber * PAGE_SIZE;

            // Calculate the total size of the range
            // The lowest bit of the mask that is set to 1 specifies the size of the range
            _BitScanForward64(&NumberOfBitsInMask, CurrentPhysMask.PageFrameNumber * PAGE_SIZE);

            // Size of the range in bytes + Base Address
            Descriptor->PhysicalEndAddress = Descriptor->PhysicalBaseAddress + ((1ULL << NumberOfBitsInMask) - 1ULL);

            // Memory Type (cacheability attributes)
            Descriptor->MemoryType = (UCHAR)CurrentPhysBase.Type;

            if (Descriptor->MemoryType == MEMORY_TYPE_WRITE_BACK)
            {
                /* This is already our default, so no need to store this range.
                 * Simply 'free' the range we just wrote. */
                EptState->NumberOfEnabledMemoryRanges--;
            }
            LogInfo("MTRR Range: Base=0x%llx End=0x%llx Type=0x%x", Descriptor->PhysicalBaseAddress, Descriptor->PhysicalEndAddress, Descriptor->MemoryType);
        }
    }

    LogInfo("Total MTRR Ranges Committed: %d", EptState->NumberOfEnabledMemoryRanges);

    return TRUE;
}

Fixed-Range MTRRs and PAT

The above section is enough to understand MTRRs for EPT. Still, I want to talk a little more about physical and virtual memory layout and caching policy. You can skip this section, as it doesn't relate directly to our hypervisor.

There are other MTRR registers called fixed-range registers. As the name implies, they cover predefined ranges set by the processor, all within the first megabyte of physical memory. You can see them in the first lines of the !mtrr output in WinDbg.

These ranges are shown in the following table:


As you can see, these fixed-range registers define the memory types of the beginning of physical memory (the first megabyte), for performance and legacy reasons.
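The fixed-range layout in the Intel SDM works as follows:

- IA32_MTRR_FIX64K_00000 (MSR 0x250) covers 0x00000–0x7FFFF in 64 KB steps.
- IA32_MTRR_FIX16K_80000 and IA32_MTRR_FIX16K_A0000 (0x258, 0x259) cover 0x80000–0xBFFFF in 16 KB steps.
- The eight IA32_MTRR_FIX4K_* registers (0x268–0x26F) cover 0xC0000–0xFFFFF in 4 KB steps.

Each register packs eight 8-bit type fields. The sketch below maps a physical address to its register and field. The names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t Msr;    /* fixed-range MTRR MSR covering the address */
    uint32_t Field;  /* which of its eight 8-bit type fields (0..7) */
} FIXED_MTRR_LOCATION;

/* Maps a physical address below 1 MB to its fixed-range MTRR and field. */
static bool FixedMtrrLocate(uint64_t Pa, FIXED_MTRR_LOCATION *Loc)
{
    if (Pa < 0x80000) {                     /* FIX64K_00000: 8 x 64 KB */
        Loc->Msr   = 0x250;
        Loc->Field = (uint32_t)(Pa / 0x10000);
    } else if (Pa < 0xC0000) {              /* FIX16K_80000/A0000: 8 x 16 KB each */
        uint64_t Off = Pa - 0x80000;
        Loc->Msr   = 0x258 + (uint32_t)(Off / 0x20000);
        Loc->Field = (uint32_t)((Off % 0x20000) / 0x4000);
    } else if (Pa < 0x100000) {             /* FIX4K_C0000..F8000: 8 x 4 KB each */
        uint64_t Off = Pa - 0xC0000;
        Loc->Msr   = 0x268 + (uint32_t)(Off / 0x8000);
        Loc->Field = (uint32_t)((Off % 0x8000) / 0x1000);
    } else {
        return false;                       /* above 1 MB: not a fixed range */
    }
    return true;
}

/* Extracts the memory type stored in one 8-bit field of a fixed-range MSR value. */
static uint8_t FixedMtrrFieldType(uint64_t MsrValue, uint32_t Field)
{
    return (uint8_t)((MsrValue >> (Field * 8)) & 0xFF);
}
```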

Note that MTRRs don't have to cover all of physical memory. Any physical address that isn't covered by a fixed-range or variable-range MTRR gets the default memory type from IA32_MTRR_DEF_TYPE.

Keep in mind that the caching policy for each region of RAM is defined by MTRRs for PHYSICAL regions and by the PAGE ATTRIBUTE TABLE (PAT) for virtual pages. Each page can select its own caching policy from the entries configured in the IA32_PAT MSR. This means the caching policy specified in the MTRRs is sometimes ignored, and a page-level cache policy is used instead. The Intel SDM has a table showing the precedence rules between PAT and MTRRs (Table 11-7. Effective Page-Level Memory Types for Pentium III and More Recent Processor Families).

For further reading, see the Intel SDM, Volume 3A, Chapter 11 (11.11 MEMORY TYPE RANGE REGISTERS (MTRRS) and 11.12 PAGE ATTRIBUTE TABLE (PAT)).


Virtualizing Current System’s Memory using EPT

You already have some background on EPT from part 4, so now we create an EPT table for our VM. There are different approaches to implementing EPT when fully virtualizing the memory of the current machine: a separate EPT table for each core, or one EPT table for all cores. We use one EPT table for all cores, as it's simpler to implement and manage. The benefits and caveats are discussed in the Discussion section.

We're trying to create an EPT table that maps all of the available physical memory (we have its details from the MTRRs) to the same host physical addresses. In effect, it's a table that maps each address to itself, with some additional fields to control access. It's OK if you're confused; just read the rest of the article and things will become clearer.

EPT Identity Mapping

Our hypervisor, like every hypervisor that virtualizes an already running system (unlike VMWare, VirtualBox, etc.), uses "identity mapping", also called "1:1 mapping". If you access guest PA (physical address) 0x4000, you access host PA 0x4000. So you have to map the RAM holes to the guest as well as the memory ranges.

It's the same idea as with regular page tables: you can set them up so that virtual address 0x1234 corresponds to physical address 0x1234.

If you don't map some physical memory and the guest accesses it, you'll get an "EPT Violation", which you can think of as the hypervisor's page fault.


To map everything one by one, we'll create PML4Es, then PDPTEs, then PDEs, and finally PTEs. With 2 MB granularity, we skip the PTEs. Of course, 4 KB granularity is preferred, but keep in mind that 4 GB of RAM results in about one million 4 KB pages. So 4 KB granularity eats a lot of memory. Besides, building 4 KB mappings takes quite some time, which will drive you crazy if you test your hypervisor frequently.
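Here is the arithmetic behind the "one million pages" figure. Each EPT entry is 64 bits, so the leaf entries alone cost 8 bytes per mapped page. The sketch counts only leaf entries (upper-level tables are ignored), and the helper names are hypothetical.

```c
#include <stdint.h>

#define EPT_ENTRY_SIZE 8ull       /* each EPT entry is 64 bits */
#define SIZE_4KB       0x1000ull
#define SIZE_2MB       0x200000ull

/* Number of leaf entries needed to map MemoryBytes with a given page size. */
static uint64_t LeafEntries(uint64_t MemoryBytes, uint64_t PageSize)
{
    return (MemoryBytes + PageSize - 1) / PageSize;
}

/* Bytes used by those leaf entries alone (upper-level tables not counted). */
static uint64_t LeafTableBytes(uint64_t MemoryBytes, uint64_t PageSize)
{
    return LeafEntries(MemoryBytes, PageSize) * EPT_ENTRY_SIZE;
}
```

For 4 GB, 4 KB pages need 1,048,576 entries (8 MB of tables), while 2 MB pages need only 2,048 entries (16 KB). Covering a whole 512 GB PML4 slot with 2 MB pages takes 512 × 512 entries, or 2 MB of tables.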

What hvpp, gbhv, and most other hypervisors do is initially set up 2 MB pages for the whole system (including RAM ranges and MMIO holes). They then break some 2 MB pages into 4 KB pages as needed.

After splitting into 4 KB pages, you can merge them back into 2 MB pages again. We do the same in our hypervisor driver: we start with 2 MB granularity and split pages into 4 KB whenever needed.

Why don't we need to care about Windows' new memory allocations?

Because we mapped all of the physical memory (every possible address in physical RAM) using 2 MB chunks, both allocated and not yet allocated. No matter where Windows allocates a new memory chunk, it's already in our EPT table.

Here's the plan:

1. Create a PML4E.
2. Create a PDPTE and add it to the PML4E.
3. Create a PDE and add it to the PDPTE.
4. Create the final entry, which points to physical address 0. That's a PTE with 4 KB granularity, or a large-page PDE with 2 MB granularity.
5. Create the next entry, which points to address 0x1000 (4 KB granularity) or 0x200000 (2 MB granularity).
6. Repeat until the table holds 512 entries. All paging tables, including EPT page tables and regular page tables, have at most 512 entries.
7. Move on to the next entry of the upper-level table and repeat!

All in all, our hypervisor doesn't care about virtual addresses at all; it's all about physical memory.

That’s enough for theory, let’s implement it!

Setting up PML4 and PML3 entries

First of all, we have to allocate a large block of memory for our EPT page table and then zero it.

PageTable = MmAllocateContiguousMemory((sizeof(VMM_EPT_PAGE_TABLE) / PAGE_SIZE) * PAGE_SIZE, MaxSize);

if (PageTable == NULL)
{
    LogError("Failed to allocate memory for PageTable");
    return NULL;
}

// Zero out all entries to ensure all unused entries are marked Not Present
RtlZeroMemory(PageTable, sizeof(VMM_EPT_PAGE_TABLE));

We keep a linked list that tracks the dynamic page splits and the memory allocated for them. We have to initialize it first, so we can de-allocate those pages whenever we want to turn off our hypervisor.

// Initialize the dynamic split list which holds all dynamic page splits
InitializeListHead(&PageTable->DynamicSplitList);

It's time to initialize the first table (EPT PML4). In the initialization phase, we set all the access bits (Read, Write, and Execute) to 1 in all of the EPT tables.

The physical address (Page Frame Number, PFN) stored in the PML4E is the address of the PML3 table. The table is page-aligned, and the processor multiplies the PFN by PAGE_SIZE when it translates. So we divide the address by PAGE_SIZE (4096).

// Mark the first 512GB PML4 entry as present, which allows us to manage up to 512GB of discrete paging structures
PageTable->PML4[0].PageFrameNumber = (SIZE_T)VirtualAddressToPhysicalAddress(&PageTable->PML3[0]) / PAGE_SIZE;
PageTable->PML4[0].ReadAccess = 1;
PageTable->PML4[0].WriteAccess = 1;
PageTable->PML4[0].ExecuteAccess = 1;
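Dividing by PAGE_SIZE here and the processor's later multiplication are just a 12-bit shift in each direction. The sketch below uses raw bit positions from the Intel SDM (read = bit 0, write = bit 1, execute = bit 2, PFN starting at bit 12) instead of the article's structures. The helper names are hypothetical.

```c
#include <stdint.h>

#define EPT_READ      (1ull << 0)
#define EPT_WRITE     (1ull << 1)
#define EPT_EXECUTE   (1ull << 2)
#define PAGE_SHIFT_4K 12

/* Builds an RWX non-leaf EPT entry (PML4E or PML3E) for a 4 KB-aligned table. */
static uint64_t MakeEptTableEntry(uint64_t TablePhysicalAddress)
{
    uint64_t Pfn = TablePhysicalAddress >> PAGE_SHIFT_4K;  /* same as / PAGE_SIZE */
    return (Pfn << PAGE_SHIFT_4K) | EPT_READ | EPT_WRITE | EPT_EXECUTE;
}

/* Recovers the table's physical address: PFN bits 51:12 (flags in 11:0 dropped). */
static uint64_t EptEntryPhysicalAddress(uint64_t Entry)
{
    return Entry & 0x000FFFFFFFFFF000ull;
}
```

A table at physical address 0x1234000 has PFN 0x1234, and the resulting entry value is 0x1234007.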

Each PML4 entry covers 512 GB of memory, so one entry is more than enough. Each table has 512 entries, so we have to fill PML3 with 512 entries of 1 GB each. We do this by creating a template with RWX enabled and using __stosq to fill the table with it. __stosq generates a store string instruction (rep stosq). That instruction stores the same 64-bit value into consecutive memory locations a given number of times, in our case VMM_EPT_PML3E_COUNT = 512.

The next step is to convert the addresses of our previously allocated PML2 tables to physical addresses and fill the PML3 entries with them.

// Set up one 'template' RWX PML3 entry and copy it into each of the 512 PML3 entries
// Using the same method as SimpleVisor for copying each entry using intrinsics
RWXTemplate.ReadAccess = 1;
RWXTemplate.WriteAccess = 1;
RWXTemplate.ExecuteAccess = 1;

// Copy the template into each of the 512 PML3 entry slots
__stosq((SIZE_T*)&PageTable->PML3[0], RWXTemplate.Flags, VMM_EPT_PML3E_COUNT);

// For each of the 512 PML3 entries
for (EntryIndex = 0; EntryIndex < VMM_EPT_PML3E_COUNT; EntryIndex++)
{
    // Map the 1GB PML3 entry to 512 PML2 (2MB) entries to describe each large page
    // NOTE: We do *not* manage any PML1 (4096 byte) entries and do not allocate them
    PageTable->PML3[EntryIndex].PageFrameNumber = (SIZE_T)VirtualAddressToPhysicalAddress(&PageTable->PML2[EntryIndex][0]) / PAGE_SIZE;
}
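__stosq is an MSVC intrinsic declared in <intrin.h>. If you're on another compiler, or just want to see what it does, a plain loop is equivalent. This is a portable sketch with a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable equivalent of __stosq(Destination, Value, Count): writes the same
   64-bit value into Count consecutive 64-bit slots, like "rep stosq". */
static void FillQwords(uint64_t *Destination, uint64_t Value, size_t Count)
{
    for (size_t i = 0; i < Count; i++)
        Destination[i] = Value;
}
```

Filling a PML3 table would then be FillQwords(Table, Template, 512). Every slot receives the same RWX template, and only the PFNs are patched afterwards.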


For PML2, we take the same approach and fill it with an RWX template. This time we also set LargePage to 1, because, as explained above, we initialize with 2 MB granularity. Exactly as before, we use __stosq to fill these entries, this time 512*512 of them: there are 512 PML2 tables, each with 512 entries.

The next step is to set up each entry's PFN. I'll describe EptSetupPML2Entry in the next section.

Note that we're filling entries for a 512*512 table. So we multiply each EntryGroupIndex by 512 and then add the index of the current PML2 entry (EntryIndex).

// All PML2 entries will be RWX and 'present'
PML2EntryTemplate.WriteAccess = 1;
PML2EntryTemplate.ReadAccess = 1;
PML2EntryTemplate.ExecuteAccess = 1;

// We are using 2MB large pages, so we must mark this 1 here.
PML2EntryTemplate.LargePage = 1;

/* For each collection of 512 PML2 entries (512 collections * 512 entries per collection), mark it RWX using the same template above.
   This marks the entries as "Present" regardless of whether the actual system has memory at this region or not. We will cause a fault in our
   EPT handler if the guest accesses a page outside a usable range, despite the EPT frame being present here.
*/
__stosq((SIZE_T*)&PageTable->PML2[0], PML2EntryTemplate.Flags, VMM_EPT_PML3E_COUNT * VMM_EPT_PML2E_COUNT);

// For each of the 512 collections of 512 2MB PML2 entries
for (EntryGroupIndex = 0; EntryGroupIndex < VMM_EPT_PML3E_COUNT; EntryGroupIndex++)
{
    // For each 2MB PML2 entry in the collection
    for (EntryIndex = 0; EntryIndex < VMM_EPT_PML2E_COUNT; EntryIndex++)
    {
        // Setup the memory type and frame number of the PML2 entry.
        EptSetupPML2Entry(&PageTable->PML2[EntryGroupIndex][EntryIndex], (EntryGroupIndex * VMM_EPT_PML2E_COUNT) + EntryIndex);
    }
}
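The value that one of these 2 MB entries ends up holding can be computed directly. The sketch below uses raw bit positions from the Intel SDM: RWX in bits 2:0, EPT memory type in bits 5:3, the large-page flag in bit 7, and the 2 MB-aligned address from bit 21. It assumes the memory type has already been chosen. In the article, setting the memory type and frame number is the job of EptSetupPML2Entry. The names are hypothetical.

```c
#include <stdint.h>

#define EPT_READ          (1ull << 0)
#define EPT_WRITE         (1ull << 1)
#define EPT_EXECUTE       (1ull << 2)
#define EPT_MEMTYPE_SHIFT 3            /* bits 5:3: EPT memory type */
#define EPT_LARGE_PAGE    (1ull << 7)  /* bit 7: this PDE maps a 2 MB page */
#define SIZE_2MB          0x200000ull

/* Builds an RWX 2 MB EPT PDE for frame Frame2MB (= group * 512 + index)
   with the given memory type (e.g. 0 = UC, 6 = WB). */
static uint64_t MakeLargePml2Entry(uint64_t Frame2MB, uint8_t MemoryType)
{
    uint64_t PhysicalAddress = Frame2MB * SIZE_2MB;  /* bits 20:0 stay zero */
    return PhysicalAddress
         | ((uint64_t)(MemoryType & 0x7) << EPT_MEMTYPE_SHIFT)
         | EPT_LARGE_PAGE | EPT_READ | EPT_WRITE | EPT_EXECUTE;
}
```

For group 1, index 1 (frame 513) with WB, the entry is 0x402000B7. It maps host physical 0x40200000 to itself.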

Putting it all together we have the following code:

1 /* Allocates page maps and create identity page table */


2 PVMM_EPT_PAGE_TABLE EptAllocateAndCreateIdentityPageTable()
3 {
4 PVMM_EPT_PAGE_TABLE PageTable;
5 EPT_PML3_POINTER RWXTemplate;
6 EPT_PML2_ENTRY PML2EntryTemplate;
7 SIZE_T EntryGroupIndex;
8 SIZE_T EntryIndex;
9  
10 // Allocate all paging structures as 4KB aligned pages
11 PHYSICAL_ADDRESS MaxSize;
12 PVOID Output;
13  
14 // Allocate address anywhere in the OS's memory space
15 MaxSize.QuadPart = MAXULONG64;
16  
17 PageTable = MmAllocateContiguousMemory((sizeof(VMM_EPT_PAGE_TABLE) / PAG
18  
19 if (PageTable == NULL)
20 {
21 LogError("Failed to allocate memory for PageTable");

https://rayanfam.com/topics/hypervisor-from-scratch-part-7/ 17/60
2020/7/31 Hypervisor From Scratch – Part 7: Using EPT & Page-Level Monitoring Features

        return NULL;
    }

    // Zero out all entries to ensure all unused entries are marked Not Pres
    RtlZeroMemory(PageTable, sizeof(VMM_EPT_PAGE_TABLE));

    // Initialize the dynamic split list which holds all dynamic page splits
    InitializeListHead(&PageTable->DynamicSplitList);

    // Mark the first 512GB PML4 entry as present, which allows us to manage
    PageTable->PML4[0].PageFrameNumber = (SIZE_T)VirtualAddressToPhysicalAdd
    PageTable->PML4[0].ReadAccess = 1;
    PageTable->PML4[0].WriteAccess = 1;
    PageTable->PML4[0].ExecuteAccess = 1;

    /* Now mark each 1GB PML3 entry as RWX and map each to their PML2 entry

    // Ensure stack memory is cleared
    RWXTemplate.Flags = 0;

    // Set up one 'template' RWX PML3 entry and copy it into each of the 512
    // Using the same method as SimpleVisor for copying each entry using int
    RWXTemplate.ReadAccess = 1;
    RWXTemplate.WriteAccess = 1;
    RWXTemplate.ExecuteAccess = 1;

    // Copy the template into each of the 512 PML3 entry slots
    __stosq((SIZE_T*)&PageTable->PML3[0], RWXTemplate.Flags, VMM_EPT_PML3E_C

    // For each of the 512 PML3 entries
    for (EntryIndex = 0; EntryIndex < VMM_EPT_PML3E_COUNT; EntryIndex++)
    {
        // Map the 1GB PML3 entry to 512 PML2 (2MB) entries to describe each lar
        // NOTE: We do *not* manage any PML1 (4096 byte) entries and do not allo
        PageTable->PML3[EntryIndex].PageFrameNumber = (SIZE_T)VirtualAddressToPh
    }

    PML2EntryTemplate.Flags = 0;

    // All PML2 entries will be RWX and 'present'
    PML2EntryTemplate.WriteAccess = 1;
    PML2EntryTemplate.ReadAccess = 1;
    PML2EntryTemplate.ExecuteAccess = 1;

    // We are using 2MB large pages, so we must mark this 1 here.
    PML2EntryTemplate.LargePage = 1;

    /* For each collection of 512 PML2 entries (512 collections * 512 entrie
       This marks the entries as "Present" regardless of if the actual syste
       EPT handler if the guest access a page outside a usable range, despit
    */
    __stosq((SIZE_T*)&PageTable->PML2[0], PML2EntryTemplate.Flags, VMM_EPT_P

    // For each of the 512 collections of 512 2MB PML2 entries
    for (EntryGroupIndex = 0; EntryGroupIndex < VMM_EPT_PML3E_COUNT; EntryGr
    {
        // For each 2MB PML2 entry in the collection
        for (EntryIndex = 0; EntryIndex < VMM_EPT_PML2E_COUNT; EntryIndex++)
        {
            // Setup the memory type and frame number of the PML2 entry.
            EptSetupPML2Entry(&PageTable->PML2[EntryGroupIndex][EntryIndex], (EntryG
        }
    }

    return PageTable;
}
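As a quick sanity check on the function above, here is the arithmetic behind "512 GB" and what the tables cost. This is a standalone sketch that assumes 8-byte EPT entries and the layout the code indexes (one PML4 table, one PML3 table, and a 512 × 512 array of PML2 entries). The constant and function names are illustrative, not the project's.

```c
#include <stdint.h>

/* Illustrative constants (not the project's names). */
#define EPT_ENTRIES_PER_TABLE 512ULL            /* entries in every EPT table */
#define EPT_ENTRY_SIZE        8ULL              /* each EPT entry is 64 bits */
#define SIZE_2MB              (2ULL * 1024 * 1024)

/* Physical memory mapped through PML4[0] when every PML2 entry is a 2 MB page:
   512 PML3 entries x 512 PML2 entries x 2 MB. */
uint64_t identity_map_coverage(void)
{
    return EPT_ENTRIES_PER_TABLE * EPT_ENTRIES_PER_TABLE * SIZE_2MB;
}

/* Bytes needed for one PML4 table, one PML3 table and 512 PML2 tables. */
uint64_t identity_map_footprint(void)
{
    uint64_t one_table = EPT_ENTRIES_PER_TABLE * EPT_ENTRY_SIZE;  /* 4 KB */
    uint64_t all_pml2  = EPT_ENTRIES_PER_TABLE * one_table;       /* 2 MB */
    return one_table /* PML4 */ + one_table /* PML3 */ + all_pml2;
}
```

So a single PML4 entry covers 2^39 bytes (512 GB), and the whole identity map costs a little over 2 MB of contiguous memory, almost all of it in the PML2 array.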

Setting up PML2 entries

PML2 is different from the other tables because, in our 2 MB design, it’s the last
level, so its entries have to follow the caching policy that the MTRRs define.

First, we have to set the PageFrameNumber of our PML2 entry. We’re mapping all
512 GB without any holes: rather than looking at the MTRRs’ base and end addresses
and mapping only those ranges, we map every possible physical address within
512 GB. Think about it one more time.

If you want to know more about PFNs in Windows, then you can read my blog posts
Inside Windows Page Frame Number (PFN) – Part 1 and Part 2.

/*
  Each of the 512 collections of 512 PML2 entries is setup here.
  This will, in total, identity map every physical address from 0x0 to ph

  ((EntryGroupIndex * VMM_EPT_PML2E_COUNT) + EntryIndex) * 2MB is the act
*/
NewEntry->PageFrameNumber = PageFrameNumber;
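The PFN arithmetic described by the comment above can be checked in isolation. This sketch assumes, as that comment states, that a 2 MB PML2 entry’s PageFrameNumber counts 2 MB frames. The function names are hypothetical, and VMM_EPT_PML2E_COUNT is redefined here as 512 so that the example stands alone.

```c
#include <stdint.h>

#define VMM_EPT_PML2E_COUNT 512ULL               /* PML2 entries per collection */
#define SIZE_2MB            (2ULL * 1024 * 1024)

/* Page frame number (in 2 MB units) that the identity map stores in
   PML2[group][index]. */
uint64_t pml2_identity_pfn(uint64_t group, uint64_t index)
{
    return group * VMM_EPT_PML2E_COUNT + index;
}

/* Physical address at which that 2 MB large page starts. */
uint64_t pml2_identity_base(uint64_t group, uint64_t index)
{
    return pml2_identity_pfn(group, index) * SIZE_2MB;
}
```

The last entry, PML2[511][511], starts 2 MB below 512 GB (0x7FFFE00000), so consecutive entries tile the whole range with no holes and no overlaps.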

Now it’s time to determine the actual caching policy based on the MTRRs. MTRR
ranges are not divided into 4 KB or 2 MB units; they are exact physical address
ranges. So we iterate over each MTRR and check whether it describes (overlaps) our
current physical address.

If none of them describes it, we choose write-back (MEMORY_TYPE_WRITE_BACK) as the
default caching policy; otherwise, we use the caching policy that the matching MTRR
specifies.

This way, the memory types in our EPT PML2 entries match those of the real system.

If we don’t use the system-specific caching policy, the result can be catastrophic.
For example, some devices use physical memory as their command-and-control
mechanism; if our accesses go through the cache, these devices won’t respond to our
requests immediately, and APIC devices won’t work correctly for real-time interrupts.

The following code is responsible for finding the desired caching policy based on
MTRRs.

// Default memory type is always WB for performance.
TargetMemoryType = MEMORY_TYPE_WRITE_BACK;

// For each MTRR range
for (CurrentMtrrRange = 0; CurrentMtrrRange < EptState->NumberOfEnabledM
{
    // If this page's address is below or equal to the max physical address
    if (AddressOfPage <= EptState->MemoryRanges[CurrentMtrrRange].PhysicalEn
    {
        // And this page's last address is above or equal to the base physical a
        if ((AddressOfPage + SIZE_2_MB - 1) >= EptState->MemoryRanges[CurrentMtr
        {
            /* If we're here, this page fell within one of the ranges specified by t
               Therefore, we must mark this page as the same cache type exposed by t
            */
            TargetMemoryType = EptState->MemoryRanges[CurrentMtrrRange].MemoryType;
            // LogInfo("0x%X> Range=%llX -> %llX | Begin=%llX End=%llX", PageFrameNu

            // 11.11.4.1 MTRR Precedences
            if (TargetMemoryType == MEMORY_TYPE_UNCACHEABLE)
            {
                // If this is going to be marked uncacheable, then we stop the search as
                break;
            }
        }
    }
}

// Finally, commit the memory type to the entry.
NewEntry->MemoryType = TargetMemoryType;
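Here is a self-contained sketch of the same selection logic, written against a hypothetical `mtrr_range` array instead of `EptState->MemoryRanges`. Like the code above, it defaults to write-back, ignores fixed-range MTRRs, and lets UC win immediately. It additionally applies the SDM precedence rule that WT wins over WB when both match, which the code above does not do. The memory-type values are the architectural encodings.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE_2MB (2ULL * 1024 * 1024)

/* Architectural memory-type encodings (Intel SDM). */
enum {
    MEMORY_TYPE_UNCACHEABLE     = 0,
    MEMORY_TYPE_WRITE_COMBINING = 1,
    MEMORY_TYPE_WRITE_THROUGH   = 4,
    MEMORY_TYPE_WRITE_PROTECTED = 5,
    MEMORY_TYPE_WRITE_BACK      = 6
};

/* Hypothetical description of one enabled variable-range MTRR. */
typedef struct {
    uint64_t physical_base;   /* first byte of the range */
    uint64_t physical_end;    /* last byte of the range (inclusive) */
    uint8_t  memory_type;
} mtrr_range;

/* Pick the memory type for the 2 MB page that starts at page_base. */
uint8_t mtrr_type_for_2mb_page(const mtrr_range *ranges, size_t count, uint64_t page_base)
{
    uint8_t type = MEMORY_TYPE_WRITE_BACK;   /* default when no range matches */
    int matched = 0;

    for (size_t i = 0; i < count; i++) {
        /* The 2 MB page overlaps the range if it starts at or before the range's
           last byte and ends at or after the range's first byte. */
        if (page_base <= ranges[i].physical_end &&
            page_base + SIZE_2MB - 1 >= ranges[i].physical_base) {
            uint8_t t = ranges[i].memory_type;

            if (t == MEMORY_TYPE_UNCACHEABLE)
                return MEMORY_TYPE_UNCACHEABLE;      /* UC takes precedence over everything */

            if (!matched)
                type = t;                            /* first matching range */
            else if (t == MEMORY_TYPE_WRITE_THROUGH && type == MEMORY_TYPE_WRITE_BACK)
                type = MEMORY_TYPE_WRITE_THROUGH;    /* WT takes precedence over WB */
            matched = 1;
        }
    }
    return type;
}
```

Note that a 2 MB page only partially covered by an MTRR still gets that MTRR's type for the whole page. That is one reason a page containing a UC region should not be left as a single WB large page.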

EPT Violation

Intel describes EPT Violation like this:

An EPT violation occurs when there is no EPT misconfiguration, but the EPT paging
structure entries disallow access using the guest-physical address.

That’s a bit hard to parse. In short, an EPT violation occurs whenever an
instruction reads from a page (read access), writes to a page (write access), or
fetches instructions from a page (execute access), and the EPT attributes of that
page (the ones we configured in the sections above) don’t allow that access.

Let me explain a little more. Imagine we have an entry in our EPT table that maps
physical address 0x1000, and in this entry we set Write Access to 0 (with Read Access
= 1 and Execute Access = 1). If any instruction tries to write to that page, for
example mov [0x1000], rax, the paging attributes don’t allow the write, so an EPT
violation occurs and our handler is called; there we can decide what we want to do
with that page.

By 0x1000, I mean a (guest-)physical address. Of course, if the instruction uses a
virtual address, it is first translated to a physical address, and EPT applies to the
result.
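The rule can be modeled in a few lines, assuming a single leaf entry whose bits 0, 1, and 2 are the read, write, and execute permissions. Real hardware also ANDs the permissions of every level in the walk, and mode-based execute control is ignored here. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0-2 of an EPT entry (Intel SDM): read, write, execute. */
#define EPT_READ    (1ULL << 0)
#define EPT_WRITE   (1ULL << 1)
#define EPT_EXECUTE (1ULL << 2)

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH } access_kind;

/* Simplified model: does this access to a page mapped by `entry` cause an
   EPT violation? (single level, no mode-based execute control) */
bool causes_ept_violation(uint64_t entry, access_kind kind)
{
    switch (kind) {
    case ACCESS_READ:  return (entry & EPT_READ) == 0;
    case ACCESS_WRITE: return (entry & EPT_WRITE) == 0;
    case ACCESS_FETCH: return (entry & EPT_EXECUTE) == 0;
    }
    return true;   /* unknown access kind: treat as disallowed */
}
```

With the entry from the 0x1000 example (read and execute allowed, write cleared), only the write access reports a violation.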

Another example: let’s assume an NT function (for example, NtCreateFile) is located
at fffff801`80230540.

nt!NtCreateFile:
fffff801`80230540 4881ec88000000  sub     rsp,88h
fffff801`80230547 33c0            xor     eax,eax
fffff801`80230549 4889442478      mov     qword ptr [rsp+78h],rax

If we convert it to a physical address, we find that NtCreateFile is at 0x3B8000 in
physical memory. Now we look up this physical address in our EPT PTE table.

Then we set Execute Access of that entry to 0. Now, each time a call, jmp, ret, etc.
transfers execution to this particular page, an EPT violation occurs.

This is the basic idea behind EPT function hooks; we talk about it in detail in Part 8.

For now, the first thing we have to read is the physical address that caused this EPT
violation. It’s done by reading GUEST_PHYSICAL_ADDRESS using the VMREAD
instruction.

// Reading guest physical address
GuestPhysicalAddr = 0;
__vmx_vmread(GUEST_PHYSICAL_ADDRESS, &GuestPhysicalAddr);
LogInfo("Guest Physical Address : 0x%llx", GuestPhysicalAddr);

The second thing that we have to read is Exit Qualification. If you remember from
the previous part, Exit Qualification gives additional details about Exit Reasons.

I mean, each exit reason might have a special exit qualification that has a special
meaning for that special exit reason (how many times did I use “special” in that
sentence?).

The exit reason itself can be read from VM_EXIT_REASON using the VMREAD instruction.

ULONG ExitReason = 0;
__vmx_vmread(VM_EXIT_REASON, &ExitReason);
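When you dispatch on this value, keep in mind that only bits 15:0 of the exit-reason field are the basic exit reason. The upper bits are flags, such as bit 31 (VM-entry failure). Below is a minimal sketch. The EXIT_REASON_EPT_VIOLATION name is assumed to mirror EXIT_REASON_EPT_MISCONFIG, and the numeric values are the architectural ones.

```c
#include <stdint.h>

/* Architectural basic exit reasons for EPT-related exits. */
#define EXIT_REASON_EPT_VIOLATION 48
#define EXIT_REASON_EPT_MISCONFIG 49

/* Bits 15:0 hold the basic exit reason; the upper bits are flags
   (for example, bit 31 is set when VM entry failed). */
uint16_t basic_exit_reason(uint32_t exit_reason)
{
    return (uint16_t)(exit_reason & 0xFFFF);
}
```

A handler would then switch on `basic_exit_reason(ExitReason)` rather than on the raw field.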

In the case of an EPT violation, the exit qualification shows why the violation
occurred. For example, it indicates whether the violation was caused by a data read
from a physical page whose Read Access is 0, or by an instruction fetch (a function
trying to execute instructions) from a physical page whose Execute Access is 0.

The following table shows the structure of Exit Qualification and each bit’s
meaning for EPT Violation.
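The project decodes this field through the VMX_EXIT_QUALIFICATION_EPT_VIOLATION union. As a standalone illustration, here is a hypothetical decoder for the bits most relevant to monitoring. Bits 3 to 5 report the permissions of the translation (the AND across all EPT levels), which is how a handler can tell a blocked fetch apart from a blocked write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected EPT-violation exit-qualification bits (Intel SDM). */
typedef struct {
    bool data_read;          /* bit 0: caused by a data read */
    bool data_write;         /* bit 1: caused by a data write */
    bool instruction_fetch;  /* bit 2: caused by an instruction fetch */
    bool readable;           /* bit 3: the translation allowed reads */
    bool writable;           /* bit 4: the translation allowed writes */
    bool executable;         /* bit 5: the translation allowed execution
                                (supervisor-mode execution if mode-based
                                execute control is enabled) */
    bool linear_valid;       /* bit 7: guest linear-address field is valid */
} ept_violation_info;

static bool bit_set(uint64_t value, unsigned bit)
{
    return (value >> bit) & 1;
}

ept_violation_info decode_ept_violation(uint64_t qualification)
{
    ept_violation_info info;
    info.data_read         = bit_set(qualification, 0);
    info.data_write        = bit_set(qualification, 1);
    info.instruction_fetch = bit_set(qualification, 2);
    info.readable          = bit_set(qualification, 3);
    info.writable          = bit_set(qualification, 4);
    info.executable        = bit_set(qualification, 5);
    info.linear_valid      = bit_set(qualification, 7);
    return info;
}
```

For example, a qualification of 0x9C describes an instruction fetch from a page that is readable and writable but not executable, which is exactly what an execute hook produces.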

Now that we have all the details, we need to pass them to EptHandlePageHookExit;
we deal with it in the next sections.

/*
   Handle VM exits for EPT violations. Violations are thrown whenever an
   on an EPT entry that does not provide permissions to access that page.
*/
BOOLEAN EptHandleEptViolation(ULONG ExitQualification, UINT64 GuestPhysic
{
    VMX_EXIT_QUALIFICATION_EPT_VIOLATION ViolationQualification;

    DbgBreakPoint();

    ViolationQualification.Flags = ExitQualification;

    if (EptHandlePageHookExit(ViolationQualification, GuestPhysicalAddr))
    {
        // Handled by page hook code.
        return TRUE;
    }

    LogError("Unexpected EPT violation");
    DbgBreakPoint();

    // Redo the instruction that caused the exception.
    return FALSE;
}

EPT Misconfiguration

Another EPT-derived VM exit is EPT Misconfiguration (EXIT_REASON_EPT_MISCONFIG).

An EPT misconfiguration occurs when, in the course of translating a guest-physical
address, the logical processor encounters an EPT paging-structure entry that
contains an unsupported value.

If you want to know more about all the reasons why EPT Misconfiguration occurs,
you can see Intel SDM – Vol 3C Section 28.2.3.1.

In my experience, most EPT misconfigurations happened because I cleared bit 0 of
an entry (data reads are not allowed) while bit 1 was set (data writes are permitted);
this write-without-read combination is not supported.

EPT misconfigurations also occur when an EPT paging-structure entry is configured
with settings reserved for future functionality.

It’s a fatal error, so let’s just break and see what we’ve done wrong!

VOID EptHandleMisconfiguration(UINT64 GuestAddress)
{
    LogInfo("EPT Misconfiguration!");
    LogError("A field in the EPT paging structure was invalid, Faulting guest

    DbgBreakPoint();
    // We can't continue now.
    // EPT misconfiguration is a fatal exception that will probably crash the
}
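The causes mentioned above can be checked before an entry is ever written. This sketch covers only some of the conditions in SDM Section 28.2.3.1 and skips the reserved-bit and physical-address-width checks. `execute_only_supported` would come from bit 0 of the IA32_VMX_EPT_VPID_CAP MSR, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPT_READ    (1ULL << 0)
#define EPT_WRITE   (1ULL << 1)
#define EPT_EXECUTE (1ULL << 2)

/* Partial misconfiguration check for one EPT entry. `is_leaf` is true for a
   2 MB PML2 entry with LargePage set or for a PML1 entry. */
bool ept_entry_misconfigured(uint64_t entry, bool is_leaf, bool execute_only_supported)
{
    bool readable   = (entry & EPT_READ) != 0;
    bool writable   = (entry & EPT_WRITE) != 0;
    bool executable = (entry & EPT_EXECUTE) != 0;

    if (!readable && !writable && !executable)
        return false;   /* not present: accesses cause EPT violations instead */

    if (writable && !readable)
        return true;    /* write permission without read permission */

    if (executable && !readable && !execute_only_supported)
        return true;    /* execute-only, but the CPU does not support it */

    if (is_leaf) {
        unsigned memory_type = (unsigned)((entry >> 3) & 7);   /* bits 5:3 */
        if (memory_type == 2 || memory_type == 3 || memory_type == 7)
            return true;    /* reserved EPT memory types */
    }
    return false;
}
```

Running such a check when you change permissions turns a hard-to-debug misconfiguration exit into an ordinary error return.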

Adding EPT to VMCS

Our hypervisor starts virtualizing the MMU by calling EptLogicalProcessorInitialize,
which sets up a 64-bit value called the EPTP. The following table shows the structure
of the EPTP. We had this table in Part 4 too, but there is one change: bit 7 was
reserved when I wrote Part 4, and it now enables enforcement of access rights for
supervisor shadow-stack pages.

EptLogicalProcessorInitialize calls EptAllocateAndCreateIdentityPageTable to
allocate the identity table (as described above).

For performance, we let the processor know it can cache the EPT paging structures
by setting MemoryType to MEMORY_TYPE_WRITE_BACK.

We are not utilizing the ‘access’ and ‘dirty’ flag features
(EnableAccessAndDirtyFlags is FALSE).

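As Intel specifies, the PageWalkLength field holds the EPT page-walk length minus 1.
Our hierarchy has four levels (PML4, PML3, PML2, and PML1), so PageWalkLength = 3
indicates an EPT page-walk length of 4. This holds even though most of our mappings
are 2 MB pages: a large-page PML2 entry simply ends the walk early, and the pages
that we later split to 4 KB granularity use the fourth level (PML1).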

The last step is to save EPTP somewhere into a global variable so we can use it later.

/*
  Initialize EPT for an individual logical processor.
  Creates an identity mapped page table and sets up an EPTP to be applied
*/
BOOLEAN EptLogicalProcessorInitialize()
{
    PVMM_EPT_PAGE_TABLE PageTable;
    EPTP EPTP;

    /* Allocate the identity mapped page table*/
    PageTable = EptAllocateAndCreateIdentityPageTable();
    if (!PageTable)
    {
        LogError("Unable to allocate memory for EPT");
        return FALSE;
    }

    // Virtual address to the page table to keep track of it for later freei
    EptState->EptPageTable = PageTable;

    EPTP.Flags = 0;

    // For performance, we let the processor know it can cache the EPT.
    EPTP.MemoryType = MEMORY_TYPE_WRITE_BACK;

    // We are not utilizing the 'access' and 'dirty' flag features.
    EPTP.EnableAccessAndDirtyFlags = FALSE;

    /*
      Bits 5:3 (1 less than the EPT page-walk length) must be 3, indicating
      see Section 28.2.2
    */
    EPTP.PageWalkLength = 3;

    // The physical page number of the page table we will be using
    EPTP.PageFrameNumber = (SIZE_T)VirtualAddressToPhysicalAddress(&PageTabl

    // We will write the EPTP to the VMCS later
    EptState->EptPointer = EPTP;

    return TRUE;
}
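To see what the bit-field assignments above produce, here is a sketch that composes the same EPTP value with plain shifts and masks. It assumes a 4 KB-aligned PML4 physical address. `make_eptp` is a hypothetical helper, and bit 7 (supervisor shadow-stack enforcement) is left at 0, as in the code above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEMORY_TYPE_WRITE_BACK 6

/* Compose an EPTP (Intel SDM, "Extended-Page-Table Pointer"):
   bits 2:0  memory type used to access the EPT paging structures
   bits 5:3  EPT page-walk length minus 1
   bit  6    enable accessed and dirty flags
   bits N-1:12 physical address of the PML4 table */
uint64_t make_eptp(uint64_t pml4_physical, unsigned walk_length, bool enable_ad)
{
    uint64_t eptp = MEMORY_TYPE_WRITE_BACK;                  /* bits 2:0 */
    eptp |= (uint64_t)((walk_length - 1) & 7) << 3;          /* bits 5:3 */
    if (enable_ad)
        eptp |= 1ULL << 6;                                   /* bit 6 */
    eptp |= pml4_physical & ~0xFFFULL;                       /* PML4 address */
    return eptp;
}
```

For a PML4 table at physical address 0x1234000, a 4-level walk and no A/D flags, this gives 0x123401E: the low byte 0x1E is simply WB (6) plus a walk length field of 3 shifted into bits 5:3.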

Finally, we need to configure the VMCS with our EPTP, so we use VMWRITE to set the
EPT_POINTER field to our EPTP value.

// Set up EPT
__vmx_vmwrite(EPT_POINTER, EptState->EptPointer.Flags);

Also, don’t forget to enable the EPT feature in the Secondary Processor-Based
VM-Execution Controls using CPU_BASED_CTL2_ENABLE_EPT; otherwise, it won’t work.

SecondaryProcBasedVmExecControls = HvAdjustControls(CPU_BASED_CTL2_RDTSCP
    CPU_BASED_CTL2_ENABLE_EPT | CPU_BASED_CTL2_ENABLE_INVPCID |
    CPU_BASED_CTL2_ENABLE_XSAVE_XRSTORS, MSR_IA32_VMX_PROCBASED_CTLS2);

__vmx_vmwrite(SECONDARY_VM_EXEC_CONTROL, SecondaryProcBasedVmExecControls
LogInfo("Secondary Proc Based VM Exec Controls (MSR_IA32_VMX_PROCBASED_CT
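HvAdjustControls isn’t shown in this section. The usual way to adjust controls against a VMX capability MSR is sketched below, with the MSR value passed in directly. We assume this is what HvAdjustControls does after reading MSR_IA32_VMX_PROCBASED_CTLS2. `adjust_controls` is a hypothetical name, and the bit positions are the architectural ones for these secondary controls.

```c
#include <stdint.h>

/* Secondary processor-based VM-execution control bits (Intel SDM). */
#define CPU_BASED_CTL2_ENABLE_EPT           (1u << 1)
#define CPU_BASED_CTL2_RDTSCP               (1u << 3)
#define CPU_BASED_CTL2_ENABLE_INVPCID       (1u << 12)
#define CPU_BASED_CTL2_ENABLE_XSAVE_XRSTORS (1u << 20)

/* Apply a VMX capability MSR value to the requested controls:
   low 32 bits  = allowed-0 settings (bits that must be 1),
   high 32 bits = allowed-1 settings (bits that may be 1). */
uint32_t adjust_controls(uint32_t requested, uint64_t capability_msr)
{
    uint32_t must_be_one = (uint32_t)capability_msr;
    uint32_t may_be_one  = (uint32_t)(capability_msr >> 32);
    return (requested & may_be_one) | must_be_one;
}
```

Because unsupported bits are silently dropped, it’s worth checking afterwards that CPU_BASED_CTL2_ENABLE_EPT is still set. If it isn’t, the processor can’t run this EPT setup at all.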

Now we have a working EPT table that virtualizes the MMU, and all translations now go
through EPT.

Monitoring Page’s RWX Activity

The next important topic is monitoring a page’s RWX activity. In the sections above,
we set Read Access, Write Access, and Execute Access to 1 for every entry. To use
EPT’s monitoring features, we have to set some of them to 0 so that the corresponding
accesses cause EPT violations.

Using these features (setting access bits to 0) comes with inherent difficulties:
IRQL, page splitting, the inability to use NT functions, synchronization, and
deadlocks are some of these problems and limitations.

In this section, we try to solve these problems.

Pre-allocating Buffers for VMX Root Mode

After executing VMLAUNCH, we shouldn’t modify EPT tables from VMX non-root mode,
because doing so might (and will) cause system inconsistency.

This limitation, together with the fact that we can’t use any NT function in VMX
root mode, brings us new challenges.

One of these challenges is that we might need to split a 2 MB page into 4 KB pages.
This requires another page table (PML1) to store the details of the new 4 KB pages,
and that table needs new memory.

We can’t use ExAllocatePoolWithTag in VMX root mode as it’s an NT API. (You can call
it in VMX root mode, and you’ll see that it sometimes works and sometimes halts the
system; the reason is described in the Discussion section.)

The solution to this problem is to allocate a buffer in VMX non-root mode and use it
later in VMX root mode. This brings us to the first limitation of our hypervisor: we
have to start setting hooks from VMX non-root mode, because we need to pre-allocate
a buffer there and then pass the buffer and the hook settings to VMX root mode using
special VMCALLs.

By the way, this limitation isn’t unsolvable. For example, you can allocate 100 pages
from VMX non-root mode and use them whenever you want in VMX root mode, and then
it’s no longer a limitation. For now, though, let’s assume that the caller should start
setting hooks from VMX non-root mode.
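The "allocate 100 pages up front" idea can be sketched as a simple pool. This is a hypothetical, single-threaded user-mode illustration. `calloc` stands in for ExAllocatePoolWithTag(NonPagedPool, ...) in VMX non-root mode, and `pool_take` is the only operation that VMX root mode would perform. A real hypervisor would keep one pool per core (as PreAllocatedMemoryDetails does) or protect the pool with a lock.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool: buffers are allocated up front (where the normal
   allocator is safe) and later handed out without calling any allocator. */
typedef struct {
    void  **buffers;
    size_t  count;      /* buffers still available */
} prealloc_pool;

/* Non-root side: allocate `count` zeroed buffers of `size` bytes.
   Returns 0 on success, -1 on failure (already allocated buffers are kept
   in the pool and can be released with pool_destroy). */
int pool_init(prealloc_pool *pool, size_t count, size_t size)
{
    pool->count = 0;
    pool->buffers = calloc(count, sizeof(void *));
    if (!pool->buffers)
        return -1;
    for (size_t i = 0; i < count; i++) {
        pool->buffers[i] = calloc(1, size);
        if (!pool->buffers[i])
            return -1;
        pool->count++;
    }
    return 0;
}

/* Root side: take one buffer, or NULL when the pool is exhausted.
   No allocation happens here. */
void *pool_take(prealloc_pool *pool)
{
    if (pool->count == 0)
        return NULL;
    return pool->buffers[--pool->count];
}

/* Free the buffers that were never taken (taken ones belong to their users). */
void pool_destroy(prealloc_pool *pool)
{
    while (pool->count > 0)
        free(pool->buffers[--pool->count]);
    free(pool->buffers);
    pool->buffers = NULL;
}
```

The design choice is simply to move every allocation to a context where the allocator is safe. Root mode only ever pops a pointer, and it must handle an empty pool by failing the request.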

Honestly, I wanted to build a mechanism for transferring execution from VMX root
mode to VMX non-root mode using NMI events. That approach would resolve the
problem of pre-allocating buffers, but for this part, let’s use pre-allocated buffers.

Hyperplatform and Hvpp use pre-allocated buffers too.

In this section and the following sections, we build a function called
“EptPageHook”.

There is a per-core global variable called “PreAllocatedMemoryDetails” in
GuestState that is defined like this:

typedef struct _VMX_NON_ROOT_MODE_MEMORY_ALLOCATOR
{
    PVOID PreAllocatedBuffer; // As we can't use ExAllocatePoolWithTag in VMX
    // PreAllocatedBuffer == 0 indicates that it's not previously allocated
} VMX_NON_ROOT_MODE_MEMORY_ALLOCATOR, * PVMX_NON_ROOT_MODE_MEMORY_ALLOCATO

When we set a hook, we first check whether the current core already has a
pre-allocated buffer. If it doesn’t, we allocate one using ExAllocatePoolWithTag.

if (GuestState[LogicalCoreIndex].PreAllocatedMemoryDetails.PreAllocatedB
{
    PreAllocBuff = ExAllocatePoolWithTag(NonPagedPool, sizeof(VMM_EPT_DYNAMI

    if (!PreAllocBuff)
    {
        LogError("Insufficient memory for pre-allocated buffer");
        return FALSE;
    }

    // Zero out the memory
    RtlZeroMemory(PreAllocBuff, sizeof(VMM_EPT_DYNAMIC_SPLIT));

    // Save the pre-allocated buffer
    GuestState[LogicalCoreIndex].PreAllocatedMemoryDetails.PreAllocatedBuffe
}

Now there are two different cases. If we have already configured the VMCS with EPT
and we’re already running in the hypervisor, we have to ask VMX root mode to set the
hook for us (Setting hook after Vmlaunch). Otherwise, we can modify the EPT table in
a regular function, as we haven’t executed VMLAUNCH (with EPT) yet (Setting hook
before Vmlaunch).

By “with EPT,” I mean whether our hypervisor is already using this EPT table. For
example, you might configure the VMCS without an EPTP, execute VMLAUNCH, and only
then decide to create an EPT table. In that case, we don’t need VMX root mode to
modify the EPT table; we can change it from VMX non-root mode, as the table isn’t in
use yet.

Setting hook before Vmlaunch

I prefer to do everything in one function so that EptVmxRootModePageHook can be
used from both VMX root mode and non-root mode. Still, you shouldn’t call this
function directly, as it needs a preparation phase; call EptPageHook instead.

All we have to do is call EptVmxRootModePageHook with a HasLaunched flag that
indicates whether our EPT is already in use in our VMX operation or not.

if (EptVmxRootModePageHook(TargetFunc, HasLaunched) == TRUE) {
    LogInfo("[*] Hook applied (VM has not launched)");
    return TRUE;
}

I’ll describe EptVmxRootModePageHook later, in the section Applying the Hook.

Setting hook after Vmlaunch

If we’ve already used this EPT in our VMX operation, then we need to ask VMX root
mode to modify the EPT table for us; in other words, we have to call
EptVmxRootModePageHook from VMX root mode, so it needs a VMCALL.
There is an additional step here. As I told you, each logical core has its own set of
EPT-related caches, so we have to invalidate the cached EPT translations on all cores
immediately. Of course, this broadcast has to be started from VMX non-root mode, as
we want to use NT APIs.

To call EptVmxRootModePageHook from VMX root mode, we use a VMCALL with
VMCALL_EXEC_HOOK_PAGE and send the function’s virtual address (TargetFunc) as
the first parameter.

if (HasLaunched)
{
    if (AsmVmxVmcall(VMCALL_EXEC_HOOK_PAGE, TargetFunc, NULL, NULL, NULL) ==
    {
        LogInfo("Hook applied from VMX Root Mode");

        // Now we have to notify all the core to invalidate their EPT
        HvNotifyAllToInvalidateEpt();

        return TRUE;
    }
}

In the VMCALL handler, we just call EptVmxRootModePageHook.

case VMCALL_EXEC_HOOK_PAGE:
{
    HookResult = EptVmxRootModePageHook(OptionalParam1, TRUE);

    if (HookResult)
    {
        VmcallStatus = STATUS_SUCCESS;
    }
    else
    {
        VmcallStatus = STATUS_UNSUCCESSFUL;
    }
    break;
}

Now let’s get to the invalidation part.

HvNotifyAllToInvalidateEpt uses KeIpiGenericCall, which runs
HvInvalidateEptByVmcall on all cores.

/* Notify all core to invalidate their EPT */
VOID HvNotifyAllToInvalidateEpt()
{
    // Let's notify them all
    KeIpiGenericCall(HvInvalidateEptByVmcall, EptState->EptPointer.Flags);
}

The invalidation itself has to happen in VMX root mode (the INVEPT instruction is
only valid in VMX root mode), so HvInvalidateEptByVmcall uses a VMCALL with
VMCALL_INVEPT_ALL_CONTEXT or VMCALL_INVEPT_SINGLE_CONTEXT to ask VMX root
mode to perform it.

/* Invalidate EPT using Vmcall (should be called from Vmx non root mode)
VOID HvInvalidateEptByVmcall(UINT64 Context)
{
    if (Context == 0)
    {
        // We have to invalidate all contexts
        AsmVmxVmcall(VMCALL_INVEPT_ALL_CONTEXT, NULL, NULL, NULL);
    }
    else
    {
        // We have to invalidate a single context (the EPTP passed in Context)
        AsmVmxVmcall(VMCALL_INVEPT_SINGLE_CONTEXT, Context, NULL, NULL);
    }
}

The VMCALL handler uses InveptSingleContext and InveptAllContexts to invalidate the
contexts; we’ll talk about invalidation in detail later in this part (Invalidating
Translations Derived from EPT (INVEPT)).

case VMCALL_INVEPT_SINGLE_CONTEXT:
{
    InveptSingleContext(OptionalParam1);
    VmcallStatus = STATUS_SUCCESS;
    break;
}
case VMCALL_INVEPT_ALL_CONTEXT:
{
    InveptAllContexts();
    VmcallStatus = STATUS_SUCCESS;
    break;
}

Finding a Page’s entry in EPT Tables

Let’s see how we can find the entries for a physical address in PML1, PML2, PML3, and
PML4.

Finding PML4, PML3, PML2 entries

We want to find the PML2 entry; to find it, we first have to find the PML4 and PML3
entries.

We mapped the physical addresses in order, so every physical address is laid out the
same way. Therefore, we only need a few definitions to extract each table’s entry
index from an address.

Here are the definitions:

// Index of the 1st paging structure (4096 byte)
#define ADDRMASK_EPT_PML1_INDEX(_VAR_) ((_VAR_ & 0x1FF000ULL) >> 12)

// Index of the 2nd paging structure (2MB)
#define ADDRMASK_EPT_PML2_INDEX(_VAR_) ((_VAR_ & 0x3FE00000ULL) >> 21)

// Index of the 3rd paging structure (1GB)
#define ADDRMASK_EPT_PML3_INDEX(_VAR_) ((_VAR_ & 0x7FC0000000ULL) >> 30)

// Index of the 4th paging structure (512GB)
#define ADDRMASK_EPT_PML4_INDEX(_VAR_) ((_VAR_ & 0xFF8000000000ULL) >> 39
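The macros can be checked with two inputs: the NtCreateFile page from the EPT Violation section (0x3B8000), and an address that has index 1 at every level. The macros below repeat the masks and shifts above, with the argument parenthesized for macro hygiene. `ept_address_from_indices` is a hypothetical inverse helper.

```c
#include <stdint.h>

/* Same masks and shifts as above: 9 index bits per level. */
#define ADDRMASK_EPT_PML1_INDEX(_VAR_) (((_VAR_) & 0x1FF000ULL) >> 12)
#define ADDRMASK_EPT_PML2_INDEX(_VAR_) (((_VAR_) & 0x3FE00000ULL) >> 21)
#define ADDRMASK_EPT_PML3_INDEX(_VAR_) (((_VAR_) & 0x7FC0000000ULL) >> 30)
#define ADDRMASK_EPT_PML4_INDEX(_VAR_) (((_VAR_) & 0xFF8000000000ULL) >> 39)

/* Rebuild the 4 KB-aligned physical address from the four indices. */
uint64_t ept_address_from_indices(uint64_t pml4, uint64_t pml3, uint64_t pml2, uint64_t pml1)
{
    return (pml4 << 39) | (pml3 << 30) | (pml2 << 21) | (pml1 << 12);
}
```

0x3B8000 splits into PML4 0, PML3 0, PML2 1, and PML1 0x1B8, because 1 × 2 MB + 0x1B8 × 4 KB = 0x3B8000. Any address below 512 GB has PML4 index 0, which is why EptGetPml2Entry rejects an address whose PML4 index is nonzero.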

After finding the indexes, we have to get the virtual address of the corresponding
entry so that we can modify the page table; with paging enabled, our code can’t
access physical addresses directly.

The following code first finds the indexes and then returns the virtual address of the
matching entry in the EPT page table.

/* Get the PML2 entry for this physical address. */
PEPT_PML2_ENTRY EptGetPml2Entry(PVMM_EPT_PAGE_TABLE EptPageTable, SIZE_T
{
    SIZE_T Directory, DirectoryPointer, PML4Entry;
    PEPT_PML2_ENTRY PML2;

    Directory = ADDRMASK_EPT_PML2_INDEX(PhysicalAddress);
    DirectoryPointer = ADDRMASK_EPT_PML3_INDEX(PhysicalAddress);
    PML4Entry = ADDRMASK_EPT_PML4_INDEX(PhysicalAddress);

    // Addresses above 512GB are invalid because it is > physical address bu
    if (PML4Entry > 0)
    {
        return NULL;
    }

    PML2 = &EptPageTable->PML2[DirectoryPointer][Directory];
    return PML2;
}

Finding PML1 entry

For PML1, we use the same approach. First, we find the PML2 entry as above. Then we
check whether that PML2 entry has been split. If it hasn’t, there is no PML1 for it, and
the address is translated with only three levels (ending at a 2 MB page).

Finally, as we stored the physical addresses contiguously, we can find the index using
ADDRMASK_EPT_PML1_INDEX (as defined above) and then return the virtual address
of that page entry.

/* Get the PML1 entry for this physical address if the page is split. Ret
PEPT_PML1_ENTRY EptGetPml1Entry(PVMM_EPT_PAGE_TABLE EptPageTable, SIZE_T
{
    SIZE_T Directory, DirectoryPointer, PML4Entry;
    PEPT_PML2_ENTRY PML2;
    PEPT_PML1_ENTRY PML1;
    PEPT_PML2_POINTER PML2Pointer;

    Directory = ADDRMASK_EPT_PML2_INDEX(PhysicalAddress);
    DirectoryPointer = ADDRMASK_EPT_PML3_INDEX(PhysicalAddress);
    PML4Entry = ADDRMASK_EPT_PML4_INDEX(PhysicalAddress);

    // Addresses above 512GB are invalid because it is > physical address bu
    if (PML4Entry > 0)
    {
        return NULL;
    }

    PML2 = &EptPageTable->PML2[DirectoryPointer][Directory];

    // Check to ensure the page is split
    if (PML2->LargePage)
    {
        return NULL;
    }

    // Conversion to get the right PageFrameNumber.
    // These pointers occupy the same place in the table and are directly co
    PML2Pointer = (PEPT_PML2_POINTER)PML2;

    // If it is, translate to the PML1 pointer
    PML1 = (PEPT_PML1_ENTRY)PhysicalAddressToVirtualAddress((PVOID)(PML2Poin

    if (!PML1)
    {
        return NULL;
    }

    // Index into PML1 for that address
    PML1 = &PML1[ADDRMASK_EPT_PML1_INDEX(PhysicalAddress)];

    return PML1;
}

Splitting 2 MB Pages to 4 KB Pages

As you know, so far we have used three levels of paging (PML4, PML3, PML2), and our
granularity is 2 MB. Pages with 2 MB granularity are not adequate for monitoring
purposes, because we might get lots of unrelated violations caused by accesses to
irrelevant areas of the same 2 MB page.

To fix these kinds of problems, we use PML1 and 4 KB granularity.

This is where we might need an additional buffer, and as we’re in VMX root mode,
we’ll use our previously allocated buffers.

First, we get the actual entry from PML2 and check whether it already points to a
4 KB table. If it was split previously, there is nothing to do, and we can use it.

// Find the PML2 entry that's currently used
TargetEntry = EptGetPml2Entry(EptPageTable, PhysicalAddress);
if (!TargetEntry)
{
    LogError("An invalid physical address passed");
    return FALSE;
}

// If this large page is not marked a large page, that means it's a poin
// That page is therefore already split.
if (!TargetEntry->LargePage)
{
    return TRUE;
}

If not, we set PreAllocatedMemoryDetails’s PreAllocatedBuffer to NULL, because we
are about to consume this buffer; the next time, the pre-allocator allocates a new
buffer for this purpose.

// Free previous buffer
GuestState[CoreIndex].PreAllocatedMemoryDetails.PreAllocatedBuffer = NULL

Then, we should fill the PML1 with an RWX template and then split our 2 MB page
into 4 KB chunks (compute 4 KB physical addresses and fill the
PageFrameNumber).

// Point back to the entry in the dynamic split for easy reference for w
NewSplit->Entry = TargetEntry;

// Make a template for RWX
EntryTemplate.Flags = 0;
EntryTemplate.ReadAccess = 1;
EntryTemplate.WriteAccess = 1;
EntryTemplate.ExecuteAccess = 1;

// Copy the template into all the PML1 entries
__stosq((SIZE_T*)&NewSplit->PML1[0], EntryTemplate.Flags, VMM_EPT_PML1E_

// Set the page frame numbers for identity mapping.
for (EntryIndex = 0; EntryIndex < VMM_EPT_PML1E_COUNT; EntryIndex++)
{
    // Convert the 2MB page frame number to the 4096 page entry number plus
    NewSplit->PML1[EntryIndex].PageFrameNumber = ((TargetEntry->PageFrameNum
}
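The PFN conversion in the loop above can be illustrated on its own. As in the identity map, we assume that the PML2 PageFrameNumber counts 2 MB frames and the PML1 PageFrameNumber counts 4 KB frames. Under that assumption, entry i of the new PML1 table maps the i-th 4 KB page inside the original 2 MB page. `split_pml1_pfn` is a hypothetical name.

```c
#include <stdint.h>

#define SIZE_2MB  (2ULL * 1024 * 1024)
#define PAGE_SIZE 4096ULL

/* 4 KB page frame number stored in PML1[entry_index] after splitting a
   2 MB page whose PML2 PageFrameNumber (in 2 MB units) is pml2_pfn. */
uint64_t split_pml1_pfn(uint64_t pml2_pfn, uint64_t entry_index)
{
    /* First 4 KB frame of the 2 MB page plus the entry's offset into it;
       equivalent to pml2_pfn * 512 + entry_index. */
    return (pml2_pfn * SIZE_2MB) / PAGE_SIZE + entry_index;
}
```

For example, splitting the 2 MB page at 0x200000 (PFN 1) gives entry 0x1B8 the frame 0x3B8, which is physical address 0x3B8000. The identity mapping is therefore preserved at the finer granularity.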

Finally, create a new PML2 entry (with LargePage = 0) that points to the new PML1
table, and replace the previous PML2 entry with it.

Also, keep track of the allocated memory so that it can be deallocated when we want
to run VMXOFF.

// Allocate a new pointer which will replace the 2MB entry with a pointe
NewPointer.Flags = 0;
NewPointer.WriteAccess = 1;
NewPointer.ReadAccess = 1;
NewPointer.ExecuteAccess = 1;
NewPointer.PageFrameNumber = (SIZE_T)VirtualAddressToPhysicalAddress(&Ne

// Add our allocation to the linked list of dynamic splits for later dea
InsertHeadList(&EptPageTable->DynamicSplitList, &NewSplit->DynamicSplitL

// Now, replace the entry in the page table with our new split pointer.
RtlCopyMemory(TargetEntry, &NewPointer, sizeof(NewPointer));

The following function represents the full code for splitting 2 MB pages into 4 KB
pages.

/* Split 2MB (LargePage) into 4kb pages */
BOOLEAN EptSplitLargePage(PVMM_EPT_PAGE_TABLE EptPageTable, PVOID PreAllo
{
    PVMM_EPT_DYNAMIC_SPLIT NewSplit;
    EPT_PML1_ENTRY EntryTemplate;
    SIZE_T EntryIndex;
    PEPT_PML2_ENTRY TargetEntry;
    EPT_PML2_POINTER NewPointer;

    // Find the PML2 entry that's currently used
    TargetEntry = EptGetPml2Entry(EptPageTable, PhysicalAddress);
    if (!TargetEntry)
    {
        LogError("An invalid physical address passed");
        return FALSE;
    }

    // If this large page is not marked a large page, that means it's a poin
    // That page is therefore already split.
    if (!TargetEntry->LargePage)
    {
        return TRUE;
    }

    // Free previous buffer
    GuestState[CoreIndex].PreAllocatedMemoryDetails.PreAllocatedBuffer = NUL

    // Allocate the PML1 entries
    NewSplit = (PVMM_EPT_DYNAMIC_SPLIT)PreAllocatedBuffer;
    if (!NewSplit)
    {
        LogError("Failed to allocate dynamic split memory");
        return FALSE;
    }
    RtlZeroMemory(NewSplit, sizeof(VMM_EPT_DYNAMIC_SPLIT));

    // Point back to the entry in the dynamic split for easy reference for w
    NewSplit->Entry = TargetEntry;

    // Make a template for RWX
    EntryTemplate.Flags = 0;
    EntryTemplate.ReadAccess = 1;
    EntryTemplate.WriteAccess = 1;
    EntryTemplate.ExecuteAccess = 1;

    // Copy the template into all the PML1 entries
    __stosq((SIZE_T*)&NewSplit->PML1[0], EntryTemplate.Flags, VMM_EPT_PML1E_

    // Set the page frame numbers for identity mapping.
    for (EntryIndex = 0; EntryIndex < VMM_EPT_PML1E_COUNT; EntryIndex++)
    {
        // Convert the 2MB page frame number to the 4096 page entry number plus
        NewSplit->PML1[EntryIndex].PageFrameNumber = ((TargetEntry->PageFrameNum
    }

    // Allocate a new pointer which will replace the 2MB entry with a pointe
    NewPointer.Flags = 0;
    NewPointer.WriteAccess = 1;
    NewPointer.ReadAccess = 1;
    NewPointer.ExecuteAccess = 1;
    NewPointer.PageFrameNumber = (SIZE_T)VirtualAddressToPhysicalAddress(&Ne

    // Add our allocation to the linked list of dynamic splits for later dea
    InsertHeadList(&EptPageTable->DynamicSplitList, &NewSplit->DynamicSplitL

    // Now, replace the entry in the page table with our new split pointer.
    RtlCopyMemory(TargetEntry, &NewPointer, sizeof(NewPointer));

    return TRUE;
}

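To sanity-check the page-frame arithmetic in the loop above outside the kernel, here is a minimal user-mode sketch. It assumes, as the loop does, that the PML2 entry's PageFrameNumber counts 2 MB frames; split_large_pfn and the SIM_ constants are hypothetical names, not part of the driver.

```c
#include <stdint.h>

#define SIM_PAGE_SIZE    0x1000ULL    /* 4 KB */
#define SIM_SIZE_2_MB    0x200000ULL  /* 2 MB */
#define SIM_PML1E_COUNT  512          /* 2 MB / 4 KB entries per split */

/* Fill pfns[] with the 4 KB page frame numbers that identity-map one 2 MB large page.
   Hypothetical helper: mirrors the loop in EptSplitLargePage. */
void split_large_pfn(uint64_t large_pfn, uint64_t pfns[SIM_PML1E_COUNT])
{
    /* Base of the 2 MB region, expressed in 4 KB frames. */
    uint64_t first_small_pfn = (large_pfn * SIM_SIZE_2_MB) / SIM_PAGE_SIZE;

    for (int i = 0; i < SIM_PML1E_COUNT; i++)
        pfns[i] = first_small_pfn + (uint64_t)i;
}
```

For a large page with PFN 3, the 512 small entries cover 4 KB frames 1536 through 2047, i.e. exactly the same 2 MB of physical memory, so the split does not change what the guest sees.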
Applying the Hook

EptVmxRootModePageHook is one of the most important parts of our EPT implementation.

First, we refuse to run this function in VMX root-mode when the pre-allocated buffer isn't available, since we can't allocate new memory from VMX root-mode.

// Check whether we are in VMX Root Mode or Not
LogicalCoreIndex = KeGetCurrentProcessorIndex();

if (GuestState[LogicalCoreIndex].IsOnVmxRootMode && !GuestState[LogicalCoreIndex].PreAllocatedMemoryDetails.PreAllocatedBuffer)
{
    return FALSE;
}

Then we align the target address down to the start of its page, because page-table entries describe whole, page-aligned pages, and translate it to a physical address.

VirtualTarget = PAGE_ALIGN(TargetFunc);

PhysicalAddress = (SIZE_T)VirtualAddressToPhysicalAddress(VirtualTarget);

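PAGE_ALIGN only clears the offset bits below the page size. A user-mode sketch of the same arithmetic, assuming 4 KB pages; page_align and page_offset are hypothetical helpers, not WDK functions.

```c
#include <stdint.h>

#define SIM_PAGE_SIZE 0x1000ULL  /* assumed 4 KB pages */

/* Same effect as PAGE_ALIGN on x64: clear the low 12 bits. */
uint64_t page_align(uint64_t address)
{
    return address & ~(SIM_PAGE_SIZE - 1);
}

/* Byte offset of the address inside its 4 KB page. */
uint64_t page_offset(uint64_t address)
{
    return address & (SIM_PAGE_SIZE - 1);
}
```

Because address translation preserves the page offset, aligning the virtual address first and then translating it yields the same physical page as translating first and aligning afterwards.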
Next, we check the page's granularity and split it if it's a large page (see the section Splitting 2 MB Pages to 4 KB Pages for details).

// Set target buffer
TargetBuffer = GuestState[LogicalCoreIndex].PreAllocatedMemoryDetails.PreAllocatedBuffer;

if (!EptSplitLargePage(EptState->EptPageTable, TargetBuffer, PhysicalAddress, LogicalCoreIndex))
{
    LogError("Could not split page for the address : 0x%llx", PhysicalAddress);
    return FALSE;
}

Then we find the PML1 entry of the requested page; since the page has already been split into 4 KB pages, a PML1 entry is available.

// Pointer to the page entry in the page table.
TargetPage = EptGetPml1Entry(EptState->EptPageTable, PhysicalAddress);

// Ensure the target is valid.
if (!TargetPage)
{
    LogError("Failed to get PML1 entry of the target address");
    return FALSE;
}

// Save the original permissions of the page
OriginalEntry = *TargetPage;

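EptGetPml1Entry isn't shown in this section. Presumably it indexes each of the four EPT tables with the standard 9-bit fields of the guest-physical address (the 4-level EPT layout in the Intel SDM). A sketch of that index arithmetic only; ept_indices and ept_split_address are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    unsigned pml4, pml3, pml2, pml1;  /* index into each 512-entry EPT table */
    unsigned offset;                  /* byte offset inside the 4 KB page */
} ept_indices;

/* Split a guest-physical address the way a 4-level EPT walk consumes it. */
ept_indices ept_split_address(uint64_t gpa)
{
    ept_indices idx;
    idx.pml4   = (unsigned)((gpa >> 39) & 0x1FF);  /* bits 47:39 */
    idx.pml3   = (unsigned)((gpa >> 30) & 0x1FF);  /* bits 38:30 */
    idx.pml2   = (unsigned)((gpa >> 21) & 0x1FF);  /* bits 29:21, a 2 MB page if PML2 is large */
    idx.pml1   = (unsigned)((gpa >> 12) & 0x1FF);  /* bits 20:12, only meaningful after a split */
    idx.offset = (unsigned)(gpa & 0xFFF);          /* bits 11:0 */
    return idx;
}
```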
Now we change the attributes of the PML1 entry. This is the most interesting part of the function: for example, you could remove write access to a 4 KB page; in our case, we disable instruction execution (fetch) from the target page.

/*
 * Lastly, mark the entry in the table as no-execute. This will cause the next
 * instruction fetched from this page to cause an EPT violation exit, which
 * allows us to handle the hook.
 */
OriginalEntry.ReadAccess = 1;
OriginalEntry.WriteAccess = 1;
OriginalEntry.ExecuteAccess = 0;

// Apply the hook to EPT
TargetPage->Flags = OriginalEntry.Flags;

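In the Intel SDM layout, the ReadAccess, WriteAccess and ExecuteAccess bit-fields are bits 0, 1 and 2 of an EPT entry, so the hook above boils down to a mask operation on Flags. A sketch with raw masks, assuming that layout; ept_make_exec_hook is a hypothetical helper.

```c
#include <stdint.h>

/* Low permission bits of an EPT paging-structure entry (Intel SDM layout). */
#define EPT_READ    (1ULL << 0)
#define EPT_WRITE   (1ULL << 1)
#define EPT_EXECUTE (1ULL << 2)

/* Return the entry value the hook writes: readable, writable, not executable. */
uint64_t ept_make_exec_hook(uint64_t entry)
{
    entry |= EPT_READ | EPT_WRITE;
    entry &= ~EPT_EXECUTE;
    return entry;
}
```

The page frame number in bits 12 and up is left untouched, so the entry still maps the same physical page; only instruction fetches from it now cause EPT violations.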

If the hypervisor has already been launched (HasLaunched), the cached EPT-derived translations (TLB) have to be invalidated so that the new permissions take effect.

// Invalidate the entry in the TLB caches so it will not conflict with the updated paging structures.
if (HasLaunched)
{
    // Uncomment in order to invalidate all the contexts
    // LogInfo("INVEPT Results : 0x%x\n", InveptAllContexts());
    Descriptor.EptPointer = EptState->EptPointer.Flags;
    Descriptor.Reserved = 0;
    AsmInvept(1, &Descriptor);    // 1 = single-context invalidation
}

Done! The hook is applied.

Handling hooked pages’ vm-exits

First, we align the guest physical address (remember from the EPT violation handler that we read GUEST_PHYSICAL_ADDRESS from the VMCS). We do this because our EPT lookup works on page-aligned physical addresses, and we don't want to search the tables for an unaligned one.

PhysicalAddress = (SIZE_T)PAGE_ALIGN(GuestPhysicalAddr);

Now, as described above, we find the PML1 entry for this physical address. We don't look for a PML2 entry: if we reached this point, the 2 MB page was most likely split into 4 KB pages when the hook was applied, so we have a PML1 entry instead of a PML2 large-page entry.

TargetPage = EptGetPml1Entry(EptState->EptPageTable, PhysicalAddress);

// Ensure the target is valid.
if (!TargetPage)
{
    LogError("Failed to get PML1 entry for target address");
    return FALSE;
}

Finally, we check whether the violation was caused by an execute access (based on the exit qualification) to a page whose Execute Access is 0. If so, we simply make the page's PML1 entry executable and invalidate the cache so that the modification takes effect.

Don't forget to tell our vm-exit handler not to skip the current instruction (i.e., not to add the instruction length to the guest RIP), so that the instruction, which didn't execute, runs one more time.

// An instruction fetch hit a page we marked non-executable, so this is our hook firing.
// Restore execute access so that the instruction can run.
if (!ViolationQualification.EptExecutable && ViolationQualification.ExecuteAccess)
{
    TargetPage->ExecuteAccess = 1;

    // InveptAllContexts();
    INVEPT_DESCRIPTOR Descriptor;

    Descriptor.EptPointer = EptState->EptPointer.Flags;
    Descriptor.Reserved = 0;
    AsmInvept(1, &Descriptor);

    // Redo the instruction
    GuestState[KeGetCurrentProcessorNumber()].IncrementRip = FALSE;

    LogInfo("Set the Execute Access of a page (PFN = 0x%llx) to 1", TargetPage->PageFrameNumber);

    return TRUE;
}

All in all, we have the following handler.

/* Check if this exit is due to a violation caused by a currently hooked page. Returns FALSE
 * if the violation was not due to a page hook.
 *
 * If the memory access attempt was RW and the page was marked executable, the page is swapped with
 * the original page.
 *
 * If the memory access attempt was execute and the page was marked not executable, the page is swapped with
 * the hooked page.
 */
BOOLEAN EptHandlePageHookExit(VMX_EXIT_QUALIFICATION_EPT_VIOLATION ViolationQualification, UINT64 GuestPhysicalAddr)
{
    SIZE_T PhysicalAddress;
    PVOID VirtualTarget;

    PEPT_PML1_ENTRY TargetPage;

    /* Align the guest physical address to its page boundary, because the EPT
       entries we look up below describe whole 4 KB pages.
    */
    PhysicalAddress = (SIZE_T)PAGE_ALIGN(GuestPhysicalAddr);

    if (!PhysicalAddress)
    {
        LogError("Target address could not be mapped to physical memory");
        return FALSE;
    }

    TargetPage = EptGetPml1Entry(EptState->EptPageTable, PhysicalAddress);

    // Ensure the target is valid.
    if (!TargetPage)
    {
        LogError("Failed to get PML1 entry for target address");
        return FALSE;
    }

    // An instruction fetch hit a page we marked non-executable, so this is our hook firing.
    // Restore execute access so that the instruction can run.
    if (!ViolationQualification.EptExecutable && ViolationQualification.ExecuteAccess)
    {
        TargetPage->ExecuteAccess = 1;

        // InveptAllContexts();
        INVEPT_DESCRIPTOR Descriptor;

        Descriptor.EptPointer = EptState->EptPointer.Flags;
        Descriptor.Reserved = 0;
        AsmInvept(1, &Descriptor);

        // Redo the instruction
        GuestState[KeGetCurrentProcessorNumber()].IncrementRip = FALSE;

        LogInfo("Set the Execute Access of a page (PFN = 0x%llx) to 1", TargetPage->PageFrameNumber);

        return TRUE;
    }

    LogError("Invalid page swapping logic in hooked page");

    return FALSE;
}

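The ViolationQualification bit-fields come from the EPT-violation exit qualification defined in the Intel SDM: bits 0 to 2 describe the attempted access and bits 3 to 5 the permissions the translation actually had. A user-mode sketch of the check the handler performs, assuming raw qualification values; the QUAL_ names and is_execute_hook_violation are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT-violation exit qualification bits (Intel SDM). */
#define QUAL_READ_ACCESS    (1ULL << 0)  /* the access was a data read */
#define QUAL_WRITE_ACCESS   (1ULL << 1)  /* the access was a data write */
#define QUAL_EXEC_ACCESS    (1ULL << 2)  /* the access was an instruction fetch */
#define QUAL_EPT_READABLE   (1ULL << 3)  /* the guest-physical address was readable */
#define QUAL_EPT_WRITEABLE  (1ULL << 4)  /* the guest-physical address was writeable */
#define QUAL_EPT_EXECUTABLE (1ULL << 5)  /* the guest-physical address was executable */

/* True when an instruction fetch hit a page whose EPT entry forbids execution,
   which is the case the handler above treats as "our execute hook fired". */
bool is_execute_hook_violation(uint64_t qualification)
{
    return (qualification & QUAL_EXEC_ACCESS) != 0 &&
           (qualification & QUAL_EPT_EXECUTABLE) == 0;
}
```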
Invalidating Translations Derived from EPT (INVEPT)

Now that we have implemented EPT, there is another problem: it's the software's responsibility to invalidate the processor's cached translations. For example, suppose we get an EPT violation for execute access to a particular page and no longer want EPT violations for that page, so we set its Execute Access to 1. Now we have to tell the processor that we changed something in our page tables; otherwise it may keep using the cached, non-executable translation. Are you confused? Let me explain it one more time.

Imagine we access guest physical address 0x1000 and it gets translated to host physical address 0x1000 (based on the 1:1 mapping). The next time we access 0x1000, the CPU doesn't walk the EPT tables again but uses its cached translation instead, which is faster. Now let's say we change an EPT entry to point to a different EPT PD, or change the attributes (Read, Write, Execute) in one of the EPT tables. We now have to tell the processor that its cache is invalid, and that's exactly what INVEPT does.

There is a catch: we have to tell each logical core separately that it needs to invalidate its EPT cache. In other words, each core has to execute INVEPT in its own VMX root-mode. We'll solve this problem later in this part.

There are two types of TLB Invalidation for hypervisors.

VMX-specific TLB-management instructions:

INVEPT – Invalidate cached Extended Page Table (EPT) mappings in the processor to synchronize address translation in virtual machines with memory-resident EPT pages.

INVVPID – Invalidate cached mappings of address translation based on the Virtual Processor ID (VPID).

We’ll talk about INVVPID in detail in part 8.

So if you don't perform INVEPT after changing EPT structures, you risk the CPU reusing old translations.

Any change to an EPT structure requires INVEPT, but switching EPTs (or VMCSs) doesn't, because cached translations are "tagged" with the EPTP they were derived from.

Now we have two terms here, Single-Context and All-Context.

typedef enum _INVEPT_TYPE
{
    SINGLE_CONTEXT = 0x00000001,
    ALL_CONTEXTS = 0x00000002
} INVEPT_TYPE;

We also have an assembly function that executes INVEPT and reports whether it succeeded.

; Error codes :
    VMX_ERROR_CODE_SUCCESS              = 0
    VMX_ERROR_CODE_FAILED_WITH_STATUS   = 1
    VMX_ERROR_CODE_FAILED               = 2

AsmInvept PROC PUBLIC

    invept  rcx, oword ptr [rdx]
    jz @jz
    jc @jc
    xor     rax, rax
    ret

    @jz:
    mov     rax, VMX_ERROR_CODE_FAILED_WITH_STATUS
    ret

    @jc:
    mov     rax, VMX_ERROR_CODE_FAILED
    ret

AsmInvept ENDP

In the code above, RCX holds the INVEPT type (single-context or all-contexts) and RDX points to the INVEPT descriptor; these are the first two arguments under the Microsoft x64 calling convention.

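VMX instructions report failure through RFLAGS: per the Intel SDM, VMfailValid sets ZF (and stores an error number in the current VMCS), while VMfailInvalid sets CF. AsmInvept turns that into a return value; the following sketch mirrors its branch order in C. The enum reuses the assembly constant names for readability, and vmx_status_from_flags is a hypothetical helper.

```c
#include <stdbool.h>

/* Values returned by AsmInvept. */
enum vmx_error_code {
    VMX_ERROR_CODE_SUCCESS            = 0, /* ZF = 0 and CF = 0: VMsucceed */
    VMX_ERROR_CODE_FAILED_WITH_STATUS = 1, /* ZF = 1: VMfailValid, error number in the VMCS */
    VMX_ERROR_CODE_FAILED             = 2  /* CF = 1: VMfailInvalid, no current VMCS */
};

/* Same order as the assembly: test ZF first (jz), then CF (jc). */
enum vmx_error_code vmx_status_from_flags(bool zf, bool cf)
{
    if (zf)
        return VMX_ERROR_CODE_FAILED_WITH_STATUS;
    if (cf)
        return VMX_ERROR_CODE_FAILED;
    return VMX_ERROR_CODE_SUCCESS;
}
```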
The following structure is the descriptor for INVEPT as described in Intel SDM.

typedef struct _INVEPT_DESC
{
    EPTP    EptPointer;
    UINT64  Reserveds;
} INVEPT_DESC, * PINVEPT_DESC;

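The EPTP stored in the descriptor has its own layout in the Intel SDM: bits 2:0 hold the EPT memory type, bits 5:3 the page-walk length minus one, and the upper bits the physical address of the PML4 table; the descriptor's upper 64 bits must be zero. A sketch under those assumptions, with hypothetical names sim_invept_desc and make_eptp (bit 6, the accessed/dirty enable, is left clear).

```c
#include <stdint.h>

/* 128-bit INVEPT descriptor: bits 63:0 hold the EPTP, bits 127:64 must be zero. */
typedef struct {
    uint64_t eptp;
    uint64_t reserved;
} sim_invept_desc;

_Static_assert(sizeof(sim_invept_desc) == 16, "INVEPT descriptor must be 128 bits");

/* Build an EPTP value from the PML4 table's physical address. */
uint64_t make_eptp(uint64_t pml4_physical, unsigned memory_type, unsigned walk_length)
{
    uint64_t eptp = pml4_physical & ~0xFFFULL;          /* PML4 physical address, bits 12 and up */
    eptp |= (uint64_t)(memory_type & 0x7);              /* bits 2:0, 6 = write-back */
    eptp |= (uint64_t)((walk_length - 1) & 0x7) << 3;   /* bits 5:3, page-walk length minus 1 */
    return eptp;                                        /* bit 6 (accessed/dirty) left 0 */
}
```

For a PML4 at physical 0x1000 with write-back memory type (6) and a 4-level walk, make_eptp returns 0x101E.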

We’ll use our assembly function in another function called Invept.

/* Invoke the Invept instruction */
unsigned char Invept(UINT32 Type, INVEPT_DESC* Descriptor)
{
    // Declared at function scope so it is still alive when AsmInvept reads it
    INVEPT_DESC ZeroDescriptor = { 0 };

    if (!Descriptor)
    {
        Descriptor = &ZeroDescriptor;
    }

    return AsmInvept(Type, Descriptor);
}

It's time to see what the so-called "All-Context" and "Single-Context" types mean.

Invalidating All Contexts

All-Context means that you invalidate all EPT-derived translations (for every VM).

/* Invalidates all contexts in ept cache table */
unsigned char InveptAllContexts()
{
    return Invept(ALL_CONTEXTS, NULL);
}

Note: by "every VM", I mean every VM on a particular logical core; each core can have multiple VMCSs and EPT tables and switch between them. It doesn't affect the EPT caches of other cores.

Invalidating Single Context

Single-Context means that you invalidate all EPT-derived translations based on a single EPTP (in short: for a single VM on a logical core).

/* Invalidates a single context in ept cache table */
unsigned char InveptSingleContext(UINT64 EptPointer)
{
    INVEPT_DESC Descriptor = { EptPointer, 0 };
    return Invept(SINGLE_CONTEXT, &Descriptor);
}

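To make the difference between the two invalidation types concrete, here is a toy model of one logical core's cache of EPT-derived translations, each tagged with the EPTP it came from. It is an illustration only (real TLBs are not software-visible), and all sim_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_TLB_SLOTS 8

/* One cached EPT-derived translation, tagged with the EPTP it came from. */
typedef struct {
    bool     valid;
    uint64_t eptp;      /* tag: which EPT hierarchy produced it */
    uint64_t gpa_page;  /* guest-physical page */
    uint8_t  perms;     /* cached R/W/X bits */
} sim_tlb_entry;

typedef struct { sim_tlb_entry slot[SIM_TLB_SLOTS]; } sim_tlb;  /* one logical core */

void sim_fill(sim_tlb *t, int i, uint64_t eptp, uint64_t gpa_page, uint8_t perms)
{
    t->slot[i] = (sim_tlb_entry){ true, eptp, gpa_page, perms };
}

/* Single-context INVEPT: drop only translations tagged with this EPTP. */
void sim_invept_single(sim_tlb *t, uint64_t eptp)
{
    for (int i = 0; i < SIM_TLB_SLOTS; i++)
        if (t->slot[i].valid && t->slot[i].eptp == eptp)
            t->slot[i].valid = false;
}

/* All-contexts INVEPT: drop every EPT-derived translation on this core. */
void sim_invept_all(sim_tlb *t)
{
    for (int i = 0; i < SIM_TLB_SLOTS; i++)
        t->slot[i].valid = false;
}

int sim_valid_count(const sim_tlb *t)
{
    int n = 0;
    for (int i = 0; i < SIM_TLB_SLOTS; i++)
        n += t->slot[i].valid;
    return n;
}
```

In this model, switching to another EPTP needs no invalidation because lookups would only match entries carrying the new tag, whereas modifying the tables behind an EPTP does, because its old entries would still match.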
Broadcasting Invept to all logical cores simultaneously

Let's say you have two cores and one EPTP. At some point you change the EPT on core one, so at that point you have to invalidate the EPT caches on all cores. As mentioned in the previous section, we have to notify all cores to invalidate their EPT caches using something like KeIpiGenericCall. The problem is that you can't call KeIpiGenericCall from the VM-exit handler: you shouldn't call any NT APIs in a VM-exit, and calling this one there is likely to cause a deadlock.

We can get around this by modifying APIC and creating our custom IPI call routine.
We’ll come across APIC virtualization in the future parts. Still, for now, if we want to
change EPT for all cores, then we can call KeIpiGenericCall from regular kernel-
mode (not vmx root-mode) and in that callback we perform Vmcall to tell our
processor to invalidate its cache in vmx root-mode.

We have to do this because if we don't invalidate EPT immediately, we might miss some EPT violations: each logical core would keep a different view of memory.

If you remember from the above sections (EptPageHook), we check whether the core is already in VMX operation (i.e., VMLAUNCH has executed). If so, we use Vmcall to ask the processor to modify the EPT table from VMX root-mode. Right after returning from Vmcall, we call HvNotifyAllToInvalidateEpt to tell all the cores to invalidate their EPT caches (remember, at this point we're no longer in VMX root-mode but in VMX non-root mode, so we can use NT APIs as in a regular kernel function).

if (HasLaunched)
{
    if (AsmVmxVmcall(VMCALL_EXEC_HOOK_PAGE, TargetFunc, NULL, NULL, NULL) == STATUS_SUCCESS)
    {
        LogInfo("Hook applied from VMX Root Mode");

        // Now we have to notify all the cores to invalidate their EPT
        HvNotifyAllToInvalidateEpt();

        return TRUE;
    }
}

HvNotifyAllToInvalidateEpt, in turn, uses KeIpiGenericCall to broadcast HvInvalidateEptByVmcall to all the logical cores, passing our current EPTP as the argument.

/* Notify all cores to invalidate their EPT */
VOID HvNotifyAllToInvalidateEpt()
{
    // Let's notify them all
    KeIpiGenericCall(HvInvalidateEptByVmcall, EptState->EptPointer.Flags);
}

HvInvalidateEptByVmcall decides whether the caller needs an all-contexts or a single-context invalidation and, based on that, issues the Vmcall with the appropriate Vmcall number. Note that our hypervisor doesn't have multiple EPTPs, so it's always a single-context Vmcall.

/* Invalidate EPT using Vmcall (should be called from Vmx non root mode) */
VOID HvInvalidateEptByVmcall(UINT64 Context)
{
    if (Context == 0)
    {
        // We have to invalidate all contexts
        AsmVmxVmcall(VMCALL_INVEPT_ALL_CONTEXT, NULL, NULL, NULL);
    }
    else
    {
        // We have to invalidate a single context (the EPTP passed in Context)
        AsmVmxVmcall(VMCALL_INVEPT_SINGLE_CONTEXT, Context, NULL, NULL);
    }
}

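The broadcast pattern can be modeled in user mode: one worker runs once per logical core with the same context, and a zero context selects all-contexts invalidation as in HvInvalidateEptByVmcall. This is a sequential stand-in (the real KeIpiGenericCall runs the worker on all processors simultaneously at IPI_LEVEL), and every sim_ name is hypothetical.

```c
#include <stdint.h>

#define SIM_CORES 4

/* Per-core record of which INVEPT flavour last ran (0 = none, 1 = single, 2 = all). */
static int g_last_invept[SIM_CORES];

/* Stand-in for HvInvalidateEptByVmcall: runs on one core with the broadcast context. */
static void sim_invalidate_on_core(int core, uint64_t context)
{
    g_last_invept[core] = (context == 0) ? 2 : 1;  /* 0 means "all contexts" */
}

/* Sequential stand-in for KeIpiGenericCall: every core runs the worker once. */
void sim_broadcast(void (*worker)(int, uint64_t), uint64_t context)
{
    for (int core = 0; core < SIM_CORES; core++)
        worker(core, context);
}

/* Stand-in for HvNotifyAllToInvalidateEpt with a given EPTP. */
void sim_notify_all(uint64_t eptp)
{
    sim_broadcast(sim_invalidate_on_core, eptp);
}
```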
Finally, the Vmcall handler calls InveptSingleContext or InveptAllContexts, based on the Vmcall number, in VMX root-mode.

case VMCALL_INVEPT_SINGLE_CONTEXT:
{
    InveptSingleContext(OptionalParam1);
    VmcallStatus = STATUS_SUCCESS;
    break;
}
case VMCALL_INVEPT_ALL_CONTEXT:
{
    InveptAllContexts();
    VmcallStatus = STATUS_SUCCESS;
    break;
}

One last thing: you can't execute INVEPT in VMX non-root mode; it causes a VM-exit with EXIT_REASON_INVEPT (0x32) and has no effect.

That's all for INVEPT.

Fixing Previous Design Issues


The rest of this part is nothing new. We want to improve our hypervisor by fixing some issues from the previous parts, supporting some new features, and resolving some deadlocks and synchronization problems in our earlier code.

Supporting more than 64 logical cores

Previous versions of Hypervisor From Scratch couldn't support more than 64 logical cores (for example, 32 cores with two logical cores each). This is because we used KeSetSystemAffinityThread, which takes a KAFFINITY argument: a 64-bit mask with one bit per logical core.

We used KeSetSystemAffinityThread when we broadcast Vmptrld, Vmclear, VMCS setup (Vmwrite), Vmlaunch, and Vmxoff to all cores.

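Why 64? A KAFFINITY on x64 is a 64-bit mask, so a single mask has no bit for logical processor 64 or above; Windows places additional processors in processor groups. A sketch of the limit in which the group/number split assumes every group is completely filled with 64 processors (real groups can be smaller, so kernel code should use KeGetProcessorNumberFromIndex instead); all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A KAFFINITY-style mask on x64: one bit per processor, 64 bits wide. */
typedef uint64_t sim_affinity;

/* Returns false when the processor index cannot be expressed in a single mask. */
bool sim_single_mask_for(unsigned processor_index, sim_affinity *mask)
{
    if (processor_index >= 64)
        return false;  /* a shift by 64 or more would be undefined behaviour */
    *mask = (sim_affinity)1 << processor_index;
    return true;
}

/* Simplified group/number split, ASSUMING every processor group holds 64 processors. */
void sim_group_and_number(unsigned processor_index, unsigned *group, unsigned *number)
{
    *group = processor_index / 64;
    *number = processor_index % 64;
}
```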
The best approach to run code on all logical cores is to let Windows execute it on each core simultaneously through its APIs. This involves raising the IRQL on each core.

We have different options here. First, we can use KeGenericCallDpc, an undocumented function that schedules CPU-specific DPCs on all CPUs.

The declaration of KeGenericCallDpc is as follows:

VOID
KeGenericCallDpc(
    _In_ PKDEFERRED_ROUTINE Routine,
    _In_opt_ PVOID Context
);

The first argument is the address of the target function, which we want to execute
on each core, and context is an optional parameter to this function.

In the target function, we call KeSignalCallDpcSynchronize and KeSignalCallDpcDone to avoid synchronization problems, so that all the cores finish at the same time.

KeSignalCallDpcSynchronize waits for all DPCs to synchronize at that point (where we call KeSignalCallDpcSynchronize).

LOGICAL
KeSignalCallDpcSynchronize(
    _In_ PVOID SystemArgument2
);

Finally, KeSignalCallDpcDone marks the DPC as being complete.

VOID
KeSignalCallDpcDone(
    _In_ PVOID SystemArgument1
);

The above two functions have to be executed as the last step (when everything
completes) in the target function.

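KeSignalCallDpcSynchronize behaves like a barrier across the per-core DPCs. The sketch below models that with POSIX threads standing in for cores and pthread_barrier_wait standing in for KeSignalCallDpcSynchronize; it is an analogy, not the kernel mechanism, and all names are hypothetical. After the barrier, every "core" can rely on all the others having finished their per-core work.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>

#define SIM_CORES 4

static pthread_barrier_t g_sync;         /* stands in for KeSignalCallDpcSynchronize */
static atomic_int g_terminated = 0;      /* cores that finished their per-core work */
static atomic_int g_saw_all_done = 0;    /* cores that saw everyone finished after the barrier */

static void *sim_dpc_routine(void *arg)
{
    (void)arg;
    atomic_fetch_add(&g_terminated, 1);  /* per-core work, e.g. VmxTerminate() */
    pthread_barrier_wait(&g_sync);       /* nobody proceeds until all cores arrive */
    if (atomic_load(&g_terminated) == SIM_CORES)
        atomic_fetch_add(&g_saw_all_done, 1);
    return NULL;                         /* returning stands in for KeSignalCallDpcDone */
}

/* Stand-in for KeGenericCallDpc: run the routine once per simulated core and wait. */
int sim_generic_call_dpc(void)
{
    pthread_t threads[SIM_CORES];

    atomic_store(&g_terminated, 0);
    atomic_store(&g_saw_all_done, 0);
    pthread_barrier_init(&g_sync, NULL, SIM_CORES);

    for (int i = 0; i < SIM_CORES; i++)
        pthread_create(&threads[i], NULL, sim_dpc_routine, NULL);
    for (int i = 0; i < SIM_CORES; i++)
        pthread_join(threads[i], NULL);

    pthread_barrier_destroy(&g_sync);
    return atomic_load(&g_saw_all_done);
}
```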
Another option is KeIpiGenericCall, a documented routine that causes the specified function to run on all processors simultaneously. I used the first approach (KeGenericCallDpc) in Hypervisor From Scratch and applied these updates to both the initialization phase and the Vmxoff phase.

Synchronization problem in exiting VMX

As we now support more than 64 logical cores using DPCs, and most of the functions are executed simultaneously, some of our previously designed routines have problems. For example, in the previous parts, I used gGuestRSP and gGuestRIP to return to the former state. Sharing one global variable among all cores causes errors: one core (core 1) might save its RIP and RSP, then another core (core 2) overwrites these variables with its own values. When core 1 tries to restore its state, it gets core 2's state instead, and you'll see a BSOD 😀.

In order to solve this problem, we have to store a per-core structure which saves the
Guest RIP and Guest RSP. The following structure is used for this purpose.

typedef struct _VMX_VMXOFF_STATE
{
    BOOLEAN IsVmxoffExecuted;   // Shows whether the VMXOFF executed or not
    UINT64  GuestRip;           // Rip address of guest to return
    UINT64  GuestRsp;           // Rsp address of guest to return

} VMX_VMXOFF_STATE, * PVMX_VMXOFF_STATE;

We add the above structure to VIRTUAL_MACHINE_STATE, as it's a per-core structure.

typedef struct _VIRTUAL_MACHINE_STATE
{
    ...
    VMX_VMXOFF_STATE VmxoffState;   // Shows the vmxoff state of the guest
    ...
} VIRTUAL_MACHINE_STATE, * PVIRTUAL_MACHINE_STATE;

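The race described above can be replayed deterministically. In this sketch (hypothetical names, with the interleaving written out by hand rather than produced by real cores), core 0 saves its state, core 1 saves its state, and then core 0 reads its state back under both designs.

```c
#include <stdint.h>

#define SIM_CORES 2

typedef struct { uint64_t guest_rip, guest_rsp; } sim_vmxoff_state;

static sim_vmxoff_state g_shared;               /* old design: one global for all cores */
static sim_vmxoff_state g_per_core[SIM_CORES];  /* new design: one slot per core */

static void save_shared(uint64_t rip, uint64_t rsp)
{
    g_shared.guest_rip = rip;
    g_shared.guest_rsp = rsp;
}

static void save_per_core(int core, uint64_t rip, uint64_t rsp)
{
    g_per_core[core].guest_rip = rip;
    g_per_core[core].guest_rsp = rsp;
}

/* Core 0 saves, core 1 saves, then core 0 restores.
   Reports the RIP core 0 would resume at under each design. */
void sim_interleave(uint64_t *shared_rip_core0, uint64_t *per_core_rip_core0)
{
    save_shared(0x1000, 0x8000);       /* core 0 */
    save_per_core(0, 0x1000, 0x8000);
    save_shared(0x2000, 0x9000);       /* core 1 overwrites the global */
    save_per_core(1, 0x2000, 0x9000);

    *shared_rip_core0 = g_shared.guest_rip;
    *per_core_rip_core0 = g_per_core[0].guest_rip;
}
```

With the shared global, core 0 would resume at core 1's RIP (0x2000); with the per-core slot it correctly resumes at 0x1000.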
We need to broadcast Vmxoff to all of the logical cores. This is done by HvTerminateVmx: this function is called once, broadcasts HvDpcBroadcastTerminateGuest to all logical cores, and then de-allocates (frees) all the EPT-related tables and pre-allocated buffers.

/* Terminate Vmx on all logical cores. */
VOID HvTerminateVmx()
{
    // Broadcast to terminate Vmx
    KeGenericCallDpc(HvDpcBroadcastTerminateGuest, 0x0);

    /* De-allocate global variables */

    // Free each split
    FOR_EACH_LIST_ENTRY(EptState->EptPageTable, DynamicSplitList, VMM_EPT_DYNAMIC_SPLIT, Split)
        ExFreePoolWithTag(Split, POOLTAG);
    FOR_EACH_LIST_ENTRY_END();

    // Free Identity Page Table
    MmFreeContiguousMemory(EptState->EptPageTable);

    // Free GuestState
    ExFreePoolWithTag(GuestState, POOLTAG);

    // Free EptState
    ExFreePoolWithTag(EptState, POOLTAG);
}

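The dynamic splits form an intrusive doubly linked list (LIST_ENTRY), and freeing the nodes of such a list is only safe if the next link is read before the current node is freed. The FOR_EACH_LIST_ENTRY macro isn't shown here, so this user-mode sketch spells the pattern out with hypothetical types that mimic LIST_ENTRY, InsertHeadList and CONTAINING_RECORD.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mimics LIST_ENTRY: a circular doubly linked list with a sentinel head. */
typedef struct list_entry { struct list_entry *flink, *blink; } list_entry;

typedef struct dynamic_split {
    unsigned long long pml1_placeholder;  /* stands in for the PML1 array */
    list_entry link;                      /* like DynamicSplitList */
} dynamic_split;

/* Mimics CONTAINING_RECORD: recover the node from its embedded link. */
#define CONTAINING(ptr, type, field) ((type *)((char *)(ptr) - offsetof(type, field)))

void list_init(list_entry *head) { head->flink = head->blink = head; }

/* Mimics InsertHeadList. */
void insert_head(list_entry *head, list_entry *e)
{
    e->flink = head->flink;
    e->blink = head;
    head->flink->blink = e;
    head->flink = e;
}

/* Free every node; returns how many were freed. */
int free_all_splits(list_entry *head)
{
    int freed = 0;
    list_entry *e = head->flink;
    while (e != head) {
        list_entry *next = e->flink;      /* read the link BEFORE freeing the node */
        free(CONTAINING(e, dynamic_split, link));
        e = next;
        freed++;
    }
    list_init(head);                      /* the list is now empty */
    return freed;
}
```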
HvDpcBroadcastTerminateGuest is responsible for synchronizing the DPCs and calling a VMX function named VmxTerminate.

/* The broadcast function which terminates the guest. */
VOID HvDpcBroadcastTerminateGuest(struct _KDPC* Dpc, PVOID DeferredContext, PVOID SystemArgument1, PVOID SystemArgument2)
{
    // Terminate Vmx using Vmcall
    if (!VmxTerminate())
    {
        LogError("There was an error terminating Vmx");
    }

    // Wait for all DPCs to synchronize at this point
    KeSignalCallDpcSynchronize(SystemArgument2);

    // Mark the DPC as being complete
    KeSignalCallDpcDone(SystemArgument1);
}

VmxTerminate de-allocates per-core regions like the Vmxon region, Vmcs region, Vmm stack, and MSR bitmap. Since we implemented our Vmcall mechanism, we can use a Vmcall to request VMXOFF from VMX root-mode (instead of using the CPUID handler as in the previous version). So VmxTerminate executes AsmVmxVmcall with VMCALL_VMXOFF on each core, and each core runs VMXOFF separately.

/* Broadcast to terminate VMX on all logical cores */
BOOLEAN VmxTerminate()
{
    int CurrentCoreIndex;
    NTSTATUS Status;

    // Get the current core index
    CurrentCoreIndex = KeGetCurrentProcessorNumber();

    LogInfo("\tTerminating VMX on logical core %d", CurrentCoreIndex);

    // Execute Vmcall to turn off vmx from Vmx root mode
    Status = AsmVmxVmcall(VMCALL_VMXOFF, NULL, NULL, NULL);

    // Free the destination memory
    MmFreeContiguousMemory(GuestState[CurrentCoreIndex].VmxonRegionVirtualAddress);
    MmFreeContiguousMemory(GuestState[CurrentCoreIndex].VmcsRegionVirtualAddress);
    ExFreePoolWithTag(GuestState[CurrentCoreIndex].VmmStack, POOLTAG);
    ExFreePoolWithTag(GuestState[CurrentCoreIndex].MsrBitmapVirtualAddress, POOLTAG);

    if (Status == STATUS_SUCCESS)
    {
        return TRUE;
    }

    return FALSE;
}

Our Vmcall handler calls VmxVmxoff, and as this function is executed in VMX root-mode, it's allowed to run VMXOFF. This function also saves GuestRip and GuestRsp into the per-core VMX_VMXOFF_STATE structure; this is where we solve the problem, as we no longer use a shared global variable. It also sets IsVmxoffExecuted, which indicates whether the logical core is still in VMX operation or has left it by executing VMXOFF.

VmxVmxoff is implemented like this:

/* Prepare and execute Vmxoff instruction */
VOID VmxVmxoff()
{
    int CurrentProcessorIndex;
    UINT64 GuestRSP;    // Save a pointer to guest rsp for times that we want to return to the previous guest state
    UINT64 GuestRIP;    // Save a pointer to guest rip for times that we want to return to the previous guest state
    UINT64 GuestCr3;
    UINT64 ExitInstructionLength;

    // Initialize the variables
    ExitInstructionLength = 0;
    GuestRIP = 0;
    GuestRSP = 0;

    CurrentProcessorIndex = KeGetCurrentProcessorNumber();

    /*
    According to SimpleVisor :
        Our callback routine may have interrupted an arbitrary user process,
        and therefore not a thread running with a system-wide page directory.
        Therefore if we return back to the original caller after turning off
        VMX, it will keep our current "host" CR3 value which we set on entry
        to the PML4 of the SYSTEM process. We want to return back with the
        correct value of the "guest" CR3, so that the currently executing
        process continues to run with its expected address space mappings.
    */

    __vmx_vmread(GUEST_CR3, &GuestCr3);
    __writecr3(GuestCr3);

    // Read guest rsp and rip
    __vmx_vmread(GUEST_RIP, &GuestRIP);
    __vmx_vmread(GUEST_RSP, &GuestRSP);

    // Read instruction length and skip the VMCALL instruction
    __vmx_vmread(VM_EXIT_INSTRUCTION_LEN, &ExitInstructionLength);
    GuestRIP += ExitInstructionLength;

    // Set the previous register states
    GuestState[CurrentProcessorIndex].VmxoffState.GuestRip = GuestRIP;
    GuestState[CurrentProcessorIndex].VmxoffState.GuestRsp = GuestRSP;

    // Notify the Vmexit handler that VMX already turned off
    GuestState[CurrentProcessorIndex].VmxoffState.IsVmxoffExecuted = TRUE;

    // Execute Vmxoff
    __vmx_off();
}

When we return to the vm-exit handler, we check whether we left VMX operation.

if (GuestState[CurrentProcessorIndex].VmxoffState.IsVmxoffExecuted)
{
    return TRUE;
}

We also define two other functions, HvReturnStackPointerForVmxoff and HvReturnInstructionPointerForVmxoff, which find the logical core index and return the corresponding stack pointer and RIP to return to.

HvReturnStackPointerForVmxoff is:

/* Returns the stack pointer, to change in the case of Vmxoff */
UINT64 HvReturnStackPointerForVmxoff()
{
    return GuestState[KeGetCurrentProcessorNumber()].VmxoffState.GuestRsp;
}

And HvReturnInstructionPointerForVmxoff is:

/* Returns the instruction pointer, to change in the case of Vmxoff */
UINT64 HvReturnInstructionPointerForVmxoff()
{
    return GuestState[KeGetCurrentProcessorNumber()].VmxoffState.GuestRip;
}

Eventually, when we detect that we left VMX operation, instead of executing VMRESUME we run AsmVmxoffHandler. This function calls HvReturnStackPointerForVmxoff and HvReturnInstructionPointerForVmxoff, pushes the return RIP onto the guest's stack, and stores the adjusted guest RSP in the stack slot right after the saved general-purpose registers. Thus, when we restore the general-purpose registers, we can pop RSP from the stack, return to the previous address (ret), and continue normal execution.

AsmVmxoffHandler PROC

    sub rsp, 020h       ; shadow space
    call HvReturnStackPointerForVmxoff
    add rsp, 020h       ; remove for shadow space

    mov [rsp+088h], rax ; now, rax contains rsp

    sub rsp, 020h       ; shadow space
    call HvReturnInstructionPointerForVmxoff
    add rsp, 020h       ; remove for shadow space

    mov rdx, rsp        ; save current rsp

    mov rbx, [rsp+088h] ; read rsp again

    mov rsp, rbx

    push rax            ; push the return address onto the guest stack,
                        ; which rsp now points to

    mov rsp, rdx        ; restore previous rsp

    sub rbx, 08h        ; we pushed something, so subtract 8 from the previous guest rsp
                        ; (rbx already contains that rsp)
    mov [rsp+088h], rbx ; move the new pointer to the current stack

RestoreState:

    pop rax
    pop rcx
    pop rdx
    pop rbx
    pop rbp             ; rsp
    pop rbp
    pop rsi
    pop rdi
    pop r8
    pop r9
    pop r10
    pop r11
    pop r12
    pop r13
    pop r14
    pop r15

    popfq

    pop rsp             ; restore rsp
    ret                 ; jump back to where we called Vmcall

AsmVmxoffHandler ENDP

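The stack juggling in AsmVmxoffHandler can be replayed on a simulated stack. The sketch assumes, as the assembly does, that the saved-RSP slot sits 0x88 bytes above the handler's RSP (16 general-purpose registers plus RFLAGS); memory is a small qword array addressed by byte offset, and all names are hypothetical.

```c
#include <stdint.h>

#define SIM_MEM_QWORDS  64
#define SLOT_RSP_OFFSET 0x88  /* 16 GPRs (0x80 bytes) + RFLAGS (8 bytes) */

typedef struct { uint64_t mem[SIM_MEM_QWORDS]; } sim_memory;

static uint64_t *qword_at(sim_memory *m, uint64_t addr) { return &m->mem[addr / 8]; }

/* Replays the handler and RestoreState; returns the final RSP and stores the final RIP. */
uint64_t sim_vmxoff_return(sim_memory *m, uint64_t host_rsp, uint64_t guest_rsp,
                           uint64_t guest_rip, uint64_t *rip_out)
{
    /* Handler: push guest RIP onto the guest stack ... */
    *qword_at(m, guest_rsp - 8) = guest_rip;
    /* ... and store guest_rsp - 8 in the saved-RSP slot. */
    *qword_at(m, host_rsp + SLOT_RSP_OFFSET) = guest_rsp - 8;

    /* RestoreState: 16 pops + popfq advance RSP by 0x88. */
    uint64_t rsp = host_rsp + SLOT_RSP_OFFSET;

    /* pop rsp: RSP is loaded from the slot. */
    rsp = *qword_at(m, rsp);

    /* ret: pop the return address. */
    *rip_out = *qword_at(m, rsp);
    rsp += 8;
    return rsp;
}
```

The final RIP is the saved guest RIP and the final RSP is the original guest RSP, which is exactly the state the guest had right after the VMCALL.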
As you can see, we no longer have the problem of using a global variable among all
the cores.

The issues relating to the Meltdown mitigation

As you know, EXIT_REASON_CR_ACCESS is one of the reasons that might cause a VM-exit (especially if the control-register exiting controls in your VMCS are set to 1). Hypervisors usually save all the general-purpose registers every time a VM-exit occurs and restore them before the next VMRESUME.

In the previous versions of our driver, we ignored RSP and saved some garbage in its place, because the guest's RSP is already saved in the GUEST_RSP field of the VMCS and is loaded automatically after VMRESUME; besides, our current RSP in the handler is invalid for the guest (it's the host RSP).

After the Meltdown mitigation, Windows uses MOV CR3, RSP, and since we saved garbage instead of RSP, handling that instruction changes CR3 to an invalid value. The guest then silently crashes with a TRIPLE FAULT VM-exit, without giving you the exact error.

To fix this issue, we add the following code to HvHandleControlRegisterAccess, so that whenever a control-register access VM-exit uses RSP as its operand, we substitute the correct guest RSP from the VMCS.

/* Because it's RSP and we didn't save RSP correctly (because of the pushes), read the real value from GUEST_RSP */
if (CrExitQualification->Fields.Register == 4)
{
    __vmx_vmread(GUEST_RSP, &GuestRsp);
    *RegPtr = GuestRsp;
}

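The Register field checked above is bits 11:8 of the control-register-access exit qualification, and the Intel SDM numbers the registers RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, then R8 to R15, which also matches the layout of the saved registers on the stack; so 4 means RSP. A decoding sketch with hypothetical names:

```c
#include <stdint.h>

/* Fields of the exit qualification for control-register accesses (Intel SDM). */
typedef struct {
    unsigned cr_number;    /* bits 3:0 */
    unsigned access_type;  /* bits 5:4: 0 = MOV to CR, 1 = MOV from CR, 2 = CLTS, 3 = LMSW */
    unsigned gpr;          /* bits 11:8: 0 = RAX, 1 = RCX, 2 = RDX, 3 = RBX, 4 = RSP, ... */
} cr_access_qual;

cr_access_qual decode_cr_access(uint64_t qualification)
{
    cr_access_qual q;
    q.cr_number   = (unsigned)(qualification & 0xF);
    q.access_type = (unsigned)((qualification >> 4) & 0x3);
    q.gpr         = (unsigned)((qualification >> 8) & 0xF);
    return q;
}
```

For example, a qualification of 0x403 decodes to MOV CR3, RSP: exactly the case the fix above handles.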
Alex mentioned this issue previously; for more information, you can read this article.

Some tips for debugging hypervisors

Always test your hypervisor on a uni-core system first. If it works there, test it on a multi-core system; if something works on uni-core but not on multi-core, you know it's a synchronization problem.

Don't call NT functions in VMX root mode. Most NT functions are not suitable to run at a high IRQL, so using them leads to weird behavior, crashes, or a halted system.

For more information, I really recommend reading Hyperplatform's User Document (4.4. Coding Tips).

Let’s Test it!

Let's see how we can test our hypervisor.

How to test?

To test our new hypervisor, we have two scenarios. The following code shows how we test it; the test code is available in Ept.c and HypervisorRoutines.c.

In the first scenario, we test the page hook (EptPageHook) before executing VMLAUNCH: EPT is already initialized, and we put the hook in place before entering VMX operation (the test code is in Ept.c).

///////////////////////// Example Test /////////////////////////
EptPageHook(ExAllocatePoolWithTag, FALSE);
///////////////////////////////////////////////////////////////

The above function puts a hook on the execution of the page containing a function (in this case ExAllocatePoolWithTag).

In the second scenario, we test both VMCALL and EptPageHook after our hypervisor is loaded and we're in VMX non-root mode (the test code is in HypervisorRoutines.c).

//  Check if everything is ok then return true otherwise false
if (AsmVmxVmcall(VMCALL_TEST, 0x22, 0x333, 0x4444) == STATUS_SUCCESS)
{
    ///////////////// Test Hook after Vmx is launched /////////////////
    EptPageHook(ExAllocatePoolWithTag, TRUE);
    ///////////////////////////////////////////////////////////////////
    return TRUE;
}
else
{
    return FALSE;
}

As you can see, it first tests the Vmcall using VMCALL_TEST and then puts the hook on a function (in this case ExAllocatePoolWithTag).

Demo

First, we load our hypervisor driver.

For the first scenario, you can see that we are successfully notified about the execution of ExAllocatePoolWithTag after VMLAUNCH executed, that Guest Rip is equal to the address of ExAllocatePoolWithTag, and that EptHandleEptViolation is responsible for handling EPT violations.

In the second testing scenario, you can see that our VMCALL is successfully executed (green line) and we are notified about the execution of a page. But wait: we put our Execute Access hook on ExAllocatePoolWithTag, yet Guest Rip is equal to ExFreePool. Why?

It turns out that ExAllocatePoolWithTag and ExFreePool are on the same page, and ExFreePool is executed before ExAllocatePoolWithTag, so the EPT violation is triggered by the execution of ExFreePool.

These results show why the EPT violation handler must check the Guest RIP. We’ll talk about this in the next part.
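A handler can tell the two cases apart by first checking that the violation is an instruction fetch on the hooked page (bit 2 of the EPT-violation exit qualification, together with the faulting guest-physical address) and then comparing the Guest RIP with the hooked function's address. The sketch below models only that decision; EPT_VIOLATION_INFO, ClassifyExecuteHook and the HOOK_* values are hypothetical names, and in a real handler the three inputs would be read from the VMCS. Restoring access and resuming past a neighbouring function are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK_4K               (~0xFFFULL)
#define EPT_QUAL_INSTRUCTION_FETCH (1ULL << 2) /* EPT-violation exit qualification, bit 2 */

typedef enum { HOOK_NOT_OURS, HOOK_SAME_PAGE_NEIGHBOR, HOOK_TARGET_HIT } HOOK_RESULT;

/* Inputs a real handler would read from the VMCS. */
typedef struct {
    uint64_t ExitQualification;    /* why the access faulted */
    uint64_t GuestPhysicalAddress; /* faulting guest-physical address */
    uint64_t GuestRip;             /* guest RIP at the time of the exit */
} EPT_VIOLATION_INFO;

/* Classify one EPT violation against a single execute hook.
   HookedVa / HookedPa: virtual and physical address of the hooked function. */
static HOOK_RESULT ClassifyExecuteHook(const EPT_VIOLATION_INFO *Info,
                                       uint64_t HookedVa, uint64_t HookedPa)
{
    if (!(Info->ExitQualification & EPT_QUAL_INSTRUCTION_FETCH))
        return HOOK_NOT_OURS;      /* a read or write, not an execution */

    if ((Info->GuestPhysicalAddress & PAGE_MASK_4K) != (HookedPa & PAGE_MASK_4K))
        return HOOK_NOT_OURS;      /* some other page */

    /* Same page: only an exact RIP match means the hooked function itself ran. */
    return Info->GuestRip == HookedVa ? HOOK_TARGET_HIT : HOOK_SAME_PAGE_NEIGHBOR;
}
```

HOOK_SAME_PAGE_NEIGHBOR corresponds to the ExFreePool case from the second scenario: the page matches, but the RIP does not.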

Finally, the following picture shows whether our hook was applied successfully.

Discussion

This section answers questions about EPT; we discuss different approaches and their pros and cons, so it will be actively updated. Thanks to Petr for answering these questions.

1. Why are there limitations on calling NT functions in VMX root mode?

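It is because of paging and high IRQL. In VMX root mode we are effectively running at a high IRQL, and at high IRQL the page fault needed to bring a paged-out page (for example, from paged pool) back into memory cannot be serviced. Any NT function that touches pageable memory is therefore unsafe to call.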

The hypervisor can use a completely different address space from the NT kernel; I believe this is what regular hypervisors like Hyper-V and XEN do. They don’t use “identity EPT mapping”, so VA 0x10000 in VMX root mode does not point to the same physical memory as VA 0x10000 in VMX non-root mode.

For example, let’s pick an NT function that can be called at high IRQL (MmGetPhysicalAddress). Imagine this function is at virtual address 0x1234. That virtual address points to the function only in VMX non-root mode, in the ntoskrnl address space; in a hypervisor with its own address space, 0x1234 could map to something else entirely.

The real question should be: “Why can I call some NT functions in VMX root mode at all?” The answer is that you set HOST_CR3 in the VMCS to the CR3 of the NT System process, so the hypervisor in VMX root mode shares the same memory view as VMX non-root mode.
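To make the address-space argument concrete, here is a toy model, not real x86 paging, in which a CR3 value simply selects a flat table mapping virtual page numbers to physical frames. If HOST_CR3 selects the same table as the System process (the self-virtualizing setup used in this tutorial), VA 0x10000 means the same thing in both modes; if the hypervisor has its own table, as a Hyper-V or XEN style design would, the same VA can point at different memory or at nothing. All names and frame numbers are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGES      32          /* covers VAs 0x0 .. 0x1FFFF */
#define TOY_NOT_MAPPED UINT64_MAX

/* Toy "page table": index = virtual page number, value = physical frame number. */
typedef struct {
    uint64_t Frame[TOY_PAGES];
} TOY_PAGE_TABLE;

/* Fill a table in which every page is unmapped except Vpn -> Frame. */
static void ToyInit(TOY_PAGE_TABLE *Table, uint64_t Vpn, uint64_t Frame)
{
    assert(Vpn < TOY_PAGES);
    for (size_t i = 0; i < TOY_PAGES; i++)
        Table->Frame[i] = TOY_NOT_MAPPED;
    Table->Frame[Vpn] = Frame;
}

/* Translate a virtual address through the table selected by a (toy) CR3. */
static uint64_t ToyTranslate(const TOY_PAGE_TABLE *Cr3, uint64_t Va)
{
    uint64_t Vpn = Va >> TOY_PAGE_SHIFT;

    if (Vpn >= TOY_PAGES || Cr3->Frame[Vpn] == TOY_NOT_MAPPED)
        return TOY_NOT_MAPPED;
    return (Cr3->Frame[Vpn] << TOY_PAGE_SHIFT) |
           (Va & ((1ULL << TOY_PAGE_SHIFT) - 1));
}
```

For example, after ToyInit(&System, 0x10, 0x500) and ToyInit(&Host, 0x10, 0x900), ToyTranslate returns 0x500000 for VA 0x10000 through the first table and 0x900000 through the second, while VA 0x11000 is unmapped in both. Pointing HOST_CR3 at the System table is what makes the two views identical.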


It is important to know this, although in practice, for self-virtualizing hypervisors (like HyperPlatform/hvpp), you don’t need to care: as I said, your HOST_CR3 is the same as NT’s CR3, so you can touch whatever memory you want.

If you happen to work on Hyper-V or XEN, you don’t have the same luxury: the hypervisor’s address space is not mapped at all in the virtualized OS (that’s much of the point of virtualization).

2. Why shouldn’t we modify EPT in VMX non-root mode?

In an ideal world, no hypervisor memory should be visible from the virtualized OS (for example, you can’t see XEN internals from the virtualized OS).

In HyperPlatform/hvpp, you can see the hypervisor’s memory. Why? This time it’s not because of HOST_CR3 but because of the identity EPT mapping: you set up the EPT tables so that the virtualized OS can see even the memory of the hypervisor itself.
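An identity EPT mapping maps every guest-physical frame n to host-physical frame n, including the frames that hold the hypervisor's own code and data. The flat toy model below (real EPT is a multi-level structure) contrasts that with a mapping that simply leaves the hypervisor's frames out; TOY_EPT, the frame numbers and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FRAMES     16
#define NOT_PRESENT    UINT64_MAX
#define HV_FIRST_FRAME 12  /* hypothetical: frames 12..15 hold the hypervisor */

/* Flat toy "EPT": index = guest-physical frame, value = host-physical frame. */
typedef struct { uint64_t HostFrame[TOY_FRAMES]; } TOY_EPT;

static bool IsHypervisorFrame(uint64_t Frame) { return Frame >= HV_FIRST_FRAME; }

/* Identity mapping, as in this tutorial: GPA frame n -> HPA frame n for every frame. */
static void BuildIdentityEpt(TOY_EPT *Ept)
{
    for (uint64_t n = 0; n < TOY_FRAMES; n++)
        Ept->HostFrame[n] = n;
}

/* Isolated mapping: identical, except the hypervisor's frames are not present. */
static void BuildIsolatedEpt(TOY_EPT *Ept)
{
    for (uint64_t n = 0; n < TOY_FRAMES; n++)
        Ept->HostFrame[n] = IsHypervisorFrame(n) ? NOT_PRESENT : n;
}

/* Can the guest reach host frame HostFrame through any guest-physical frame? */
static bool GuestCanSee(const TOY_EPT *Ept, uint64_t HostFrame)
{
    for (uint64_t n = 0; n < TOY_FRAMES; n++)
        if (Ept->HostFrame[n] == HostFrame)
            return true;
    return false;
}
```

With frames 12 to 15 reserved for the hypervisor, GuestCanSee(&Identity, 13) is true, while GuestCanSee(&Isolated, 13) is false and ordinary frames such as 3 stay visible in both. The EPT tables themselves would sit in such hidden frames in the isolated design.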

My point is that in an ideal world you shouldn’t even see the EPT structures from within VMX non-root mode. Think of it this way: can you modify regular page tables from user mode?

The answer is: it depends. In reality? No. Why? Because the page tables live in kernel memory that is inaccessible from user mode; that’s the whole point of memory protection. Could you set up the page tables so that user mode could modify them? Yes, but that doesn’t mean you should. This is largely a security concern.

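There’s an even more important reason: caches. The processor caches translations derived from the EPT tables, so a change made to those tables is not guaranteed to take effect until the cached translations are invalidated with INVEPT, and INVEPT causes a VM exit when executed in VMX non-root mode.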

You might have tried it, and it may have worked most of the time in your case, but that doesn’t mean it’s the correct approach.
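The cached translations are flushed with the INVEPT instruction, which takes an invalidation type (1 for single-context, 2 for all-context) and a 128-bit descriptor whose low 64 bits hold the EPT pointer (EPTP) and whose high 64 bits must be zero. Since INVEPT causes a VM exit in VMX non-root mode, only the hypervisor can perform this step. The sketch below builds the EPTP and the descriptor, assuming a four-level, write-back EPT; the names are hypothetical, and the instruction itself would be issued from a small assembly stub (not shown) in VMX root mode.

```c
#include <assert.h>
#include <stdint.h>

/* INVEPT invalidation types. */
enum { INVEPT_SINGLE_CONTEXT = 1, INVEPT_ALL_CONTEXT = 2 };

/* 128-bit INVEPT descriptor: bits 63:0 = EPTP, bits 127:64 reserved (must be zero). */
typedef struct {
    uint64_t EptPointer;
    uint64_t Reserved;
} INVEPT_DESCRIPTOR;

/* EPTP: bits 2:0 memory type (6 = write-back), bits 5:3 page-walk length
   minus 1 (3 = four levels), bits 12 and up = physical address of the PML4. */
static uint64_t MakeEptp(uint64_t Pml4PhysicalAddress)
{
    return (Pml4PhysicalAddress & ~0xFFFULL) | (3ULL << 3) | 6ULL;
}

/* Descriptor for a single-context invalidation of one EPT hierarchy. */
static INVEPT_DESCRIPTOR MakeInveptDescriptor(uint64_t Eptp)
{
    INVEPT_DESCRIPTOR Descriptor = { Eptp, 0 };
    return Descriptor;
}
```

For a PML4 table at physical address 0x12345000, MakeEptp returns 0x1234501E, and that value goes into the descriptor passed with INVEPT_SINGLE_CONTEXT.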

3. What are the advantages of having a separate EPT table for each processor?

When you change the EPT structures and want that change to be synced across CPUs, you have to send an IPI (KeIpiGenericCall) from within VMX root mode to flush the caches on all CPUs.

In an ideal world, you would call KeIpiGenericCall from VMX root mode, but you can’t: you’ll quickly end up in a deadlock. You’d need to implement your own IPI mechanism and correctly configure the APIC for VMX root mode.

This can be done, but it is non-trivial to implement.

When each CPU has its own EPT, you don’t need IPIs; each core manages its own EPT.

The tables won’t be 100% in sync all the time, but if the EPT handler logic is the same on every core and doesn’t change over time, it doesn’t matter.
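One way to picture the per-core approach: a requested EPT change is recorded for every core, and each core applies it to its own copy of the tables the next time it runs in VMX root mode, then invalidates only its own cached translations. No core writes another core's tables, so no IPI is needed, and the cores are briefly out of sync until each one has handled the request. This is a single-threaded simulation with hypothetical names, not the project's code; a real driver would issue INVEPT on the current core and would need proper synchronization around the pending flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CORE_COUNT  4
#define EPT_ENTRIES 8                 /* toy table size */
#define EPT_EXECUTE (1ULL << 2)       /* execute-access bit of an EPT entry */

/* Each core owns a private copy of the (toy) EPT plus a pending-change slot. */
typedef struct {
    uint64_t Entry[EPT_ENTRIES];
    bool     HookPending;
    unsigned PendingIndex;
    unsigned LocalInvalidations;      /* stands in for INVEPT on this core only */
} PER_CORE_EPT;

static PER_CORE_EPT g_Ept[CORE_COUNT];

static void InitAllCores(void)
{
    for (unsigned c = 0; c < CORE_COUNT; c++) {
        for (unsigned i = 0; i < EPT_ENTRIES; i++)
            g_Ept[c].Entry[i] = ((uint64_t)i << 12) | 0x7; /* RWX identity */
        g_Ept[c].HookPending = false;
        g_Ept[c].PendingIndex = 0;
        g_Ept[c].LocalInvalidations = 0;
    }
}

/* Record the request for every core; no table is modified yet, so no IPI. */
static void RequestExecuteHook(unsigned Index)
{
    for (unsigned c = 0; c < CORE_COUNT; c++) {
        g_Ept[c].PendingIndex = Index;
        g_Ept[c].HookPending = true;
    }
}

/* Runs on core Core in VMX root mode: it edits only its own tables. */
static void OnVmExit(unsigned Core)
{
    PER_CORE_EPT *Mine = &g_Ept[Core];

    if (!Mine->HookPending)
        return;
    Mine->Entry[Mine->PendingIndex] &= ~EPT_EXECUTE;
    Mine->HookPending = false;
    Mine->LocalInvalidations++;       /* real code: INVEPT on this core */
}
```

For example, after RequestExecuteHook(3) and OnVmExit(0), core 0 has already removed execute access from entry 3 while core 1 still allows it, until core 1 handles its own exit. Because every core runs the same handler logic, all copies converge.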

Conclusion

We’ve come to the end of this part. I believe EPT is the most important feature available to researchers, security programs, and game hackers, as it gives a unique ability to monitor the operating system and user-mode applications. In the next part, we’ll use EPT to implement hidden hook mechanisms, which are commonly used among hypervisors. We’ll also improve our hypervisor by using WPP Tracing instead of DbgPrint, adding event injection and a mechanism to talk from VMX root mode to VMX non-root mode, and finally we’ll see how to use the Virtual Processor Identifier (VPID). Feel free to use the comments below to ask questions or ask for clarification.

See you guys in the next part.

The 8th part is available here.

References

[1] Memory type range register – (https://en.wikipedia.org/wiki/Memory_type_range_register)
[2] KVA Shadow: Mitigating Meltdown on Windows – (https://msrc-blog.microsoft.com/2018/03/23/kva-shadow-mitigating-meltdown-on-windows/)
[3] How to Implement a software-based SMEP (Supervisor Mode Execution Protection) with Virtualization/Hypervisor Technology – (http://hypervsir.blogspot.com/2014/11/how-to-implement-software-based.html)
[4] Vol 3A – Chapter 11 – (11.11.3 Example Base and Mask Calculations) – (https://software.intel.com/en-us/articles/intel-sdm)
[5] x86 Paging Tutorial – (https://cirosantilli.com/x86-paging)
[6] OSDev notes 2: Memory management – (http://ethv.net/workshops/osdev/notes/notes-2)
[7] Vol 3A – Chapter 11 – (11.11 MEMORY TYPE RANGE REGISTERS (MTRRS)) – (https://software.intel.com/en-us/articles/intel-sdm)
[8] Vol 3A – Chapter 11 – (11.12 PAGE ATTRIBUTE TABLE (PAT)) – (https://software.intel.com/en-us/articles/intel-sdm)
[9] HyperPlatform User Document – (https://tandasat.github.io/HyperPlatform/userdocument/)
[10] Vol 3C – Chapter 34 – (34.15.2 SMM VM Exits) – (https://software.intel.com/en-us/articles/intel-sdm)
[11] Vol 3C – Chapter 34 – (34.15.6 Activating the Dual-Monitor Treatment) – (https://software.intel.com/en-us/articles/intel-sdm)
[12] Windows Hotpatching: A Walkthrough – (https://jpassing.com/2011/05/03/windows-hotpatching-a-walkthrough/)
[13] Vol 3C – Chapter 28 – (28.2.3.1 EPT Misconfigurations) – (https://software.intel.com/en-us/articles/intel-sdm)
[14] Vol 3C – Chapter 28 – (28.2.3.2 EPT Violations) – (https://software.intel.com/en-us/articles/intel-sdm)
[15] R.I.P ROP: CET Internals in Windows 20H1 – (http://windows-internals.com/cet-on-windows)
[16] Inside Windows Page Frame Number (PFN) Part 1 – (https://rayanfam.com/topics/inside-windows-page-frame-number-part1)
[17] Inside Windows Page Frame Number (PFN) Part 2 – (https://rayanfam.com/topics/inside-windows-page-frame-number-part2)
[18] Why we can access memory from non paged pool at or above DISPATCH LEVEL – (https://stackoverflow.com/questions/18764211/why-we-can-access-memory-from-non-paged-pool-at-or-above-dispatch-level)


Published in CPU, Hypervisor and Tutorials


LiWan

Your tutorial is awesome, and I look forward to Part 8 being released soon. If you can, I would be happy to donate to you. Thank you very much!

Sina Karvandi

Thank you! Part 8 will be released soon, and I’m not willing to accept donations.

Lord Noteworthy

Hello Sina, I appreciate the huge efforts you made to come up with this article.

A few gotchas:

– In this section we discuss the different question(s) and approaches about EPT) …
– (,) when the CPU cache was moved inside the CPU, the CPUs implemented fixed-range MTRRs.
– To fix these kinda (of) problems …
– In the target function, we call KeSignalCallDpcSynchronize and (KeSignalCallDpcSynchronize) -> should be KeSignalCallDpcDone.

Sina Karvandi

Thanks for reading, fixed.


The contents of this blog are licensed to the public under a Creative Commons Attribution 4.0 license.
