Issue
I'm playing around with VMX on XUbuntu 16.04, but I'm running into some issues with setting the VMXE bit of CR4. The issue is that by the time my exit function is called, the bit is no longer set.
vmmod.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/types.h>

#define AUTHOR "me"
#define DESC "Test"

extern u64 read_cr4(void);
extern void write_cr4(u64 val);

static bool IsVMXEEnabled(void)
{
    return (read_cr4() >> 13) & 1;
}

static void SetVMXEEnabled(void* _val)
{
    bool val = *(bool*)_val;
    u64 mask = (1 << 13);
    u64 cr4 = read_cr4();

    if (val)
        cr4 |= mask;
    else
        cr4 &= (~mask);

    write_cr4(cr4);
}

static void LogVMXEState(void* info)
{
    (void) info;
    printk(KERN_INFO "CR4: %08LX\n", read_cr4());
}

static int __init init_(void)
{
    printk(KERN_INFO "===================================\n");
    if (IsVMXEEnabled())
        printk(KERN_INFO "VMXE Is Enabled\n");
    else
    {
        bool new_vmxe_state = true;
        printk(KERN_INFO "Enabling VMXE\n");
        on_each_cpu(SetVMXEEnabled, &new_vmxe_state, 1);
        if (IsVMXEEnabled())
        {
            printk(KERN_INFO "VMXE Has Been Enabled\n");
            on_each_cpu(LogVMXEState, NULL, 1);
        }
        else
        {
            printk(KERN_INFO "VMXE Could Not Be Enabled\n");
            return -1;
        }
    }
    return 0;
}

static void __exit exit_(void)
{
    printk(KERN_INFO "----------------------------------------\n");
    on_each_cpu(LogVMXEState, NULL, 1);
    if (IsVMXEEnabled())
    {
        bool new_val = false;
        printk(KERN_INFO "Disabling VMXE\n");
        on_each_cpu(SetVMXEEnabled, &new_val, 1);
        if (!IsVMXEEnabled())
            printk(KERN_INFO "VMXE Has Been Disabled\n");
        else
            printk(KERN_INFO "Couldn't disabled VMXE...\n");
    }
    else
        printk(KERN_INFO "VMXE Wasn't enabled?\n");
    printk(KERN_INFO "===================================\n");
}

MODULE_LICENSE("GPL");
MODULE_AUTHOR(AUTHOR);
MODULE_DESCRIPTION(DESC);

module_init(init_);
module_exit(exit_);
vmasm.S
.intel_syntax noprefix
.text

.global read_cr4
read_cr4:
    mov rax, cr4
    ret

.global write_cr4
write_cr4:
    mov cr4, rdi
    ret
Makefile
obj-m += testmod.o
testmod-objs := vmmod.o vmasm.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
Testing
$> sudo insmod testmod.ko && sudo rmmod testmod
Output
[ 607.459248] ===================================
[ 607.459256] Enabling VMXE
[ 607.459302] VMXE Has Been Enabled
[ 607.459311] CR4: 000426E0
[ 607.459315] CR4: 000426E0
[ 607.459318] CR4: 000426E0
[ 607.459321] CR4: 000426F0
[ 607.459334] CR4: 000426E0
[ 607.459336] CR4: 000426E0
[ 607.459338] CR4: 000426E0
[ 607.459373] CR4: 000426E0
[ 607.473007] ----------------------------------------
[ 607.473025] CR4: 000406E0
[ 607.473065] CR4: 000406E0
[ 607.473068] CR4: 000406F0
[ 607.473072] CR4: 000406E0
[ 607.473074] CR4: 000406E0
[ 607.473078] CR4: 000406E0
[ 607.473080] CR4: 000406E0
[ 607.473103] CR4: 000406E0
[ 607.473121] VMXE Wasn't enabled?
[ 607.473129] ===================================
The output clearly shows that Bit 13 (VMXE) of CR4 is enabled after the module load function, but during the module unload function, it's no longer set.
Is there a kernel module that would periodically reset VMXE? I have kvm.ko and kvm_intel.ko unloaded when running this code, and the Intel emulation BIOS settings have been enabled, and the CPU supports VMX.
As per (Modifying control register in kernel module), I tried adding on_each_cpu to set VMXE on each CPU core, but it didn't help.
Any Ideas?
Thanks!
Solution
The Linux kernel is not deliberately clearing CR4.VMXE. Rather, Linux caches the value of CR4 and uses the cache instead of reading the register, perhaps for performance reasons. Since you didn't change that cache, the next time the kernel changes any bit in CR4 it writes back the cached value, in which VMXE is still zero, and so the bit is cleared. If your driver had established a VMXON region, you would instead have seen a kernel panic when the kernel inadvertently cleared CR4.VMXE with an active VMXON region.
There isn't anything that I'm aware of that periodically resets CR4 bits. However, TLB shootdowns are somewhat common, and if any of the pages being invalidated are global, the only way to do that is to clear CR4.PGE. I don't know why global pages would be frequently invalidated, but I know a coworker of mine had to debug an issue that started around the 4.4.0 series kernels caused by CR4.PGE being cleared, so it definitely happens with some frequency.
The proper way to enable CR4 feature bits is the way the kernel itself does it, e.g. in arch/x86/kernel/cpu/common.c:
static __always_inline void setup_smep(struct cpuinfo_x86 *c)
{
    if (cpu_has(c, X86_FEATURE_SMEP))
        cr4_set_bits(X86_CR4_SMEP);
}
This ends up calling this function:
void cr4_update_irqsoff(unsigned long set, unsigned long clear)
{
    unsigned long newval, cr4 = this_cpu_read(cpu_tlbstate.cr4);

    lockdep_assert_irqs_disabled();

    newval = (cr4 & ~clear) | set;
    if (newval != cr4) {
        this_cpu_write(cpu_tlbstate.cr4, newval);
        __write_cr4(newval);
    }
}
Notice that it doesn't call __read_cr4() but rather this_cpu_read(cpu_tlbstate.cr4). This is the cache that must be updated if you want the kernel to stop disabling CR4.VMXE.
Answered By - icecreamsword Answer Checked By - Candace Johnson (WPSolving Volunteer)