Saturday, February 26, 2022

[SOLVED] How does the Linux kernel temporarily disable x86 SMAP in copy_from_user?

Issue

I want to know how the Linux kernel disables x86 SMAP when executing the copy_from_user() function. I looked through the source code but could not find where this happens.

Supervisor Mode Access Prevention (SMAP, https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention) is a security feature of x86 CPUs that prevents the kernel from accessing user-space memory unintentionally, which helps to fend off various exploits.


Solution

As documented in the Wikipedia page that you linked:

SMAP is enabled when memory paging is active and the SMAP bit in the CR4 control register is set. SMAP can be temporarily disabled for explicit memory accesses by setting the EFLAGS.AC (Alignment Check) flag. The stac (Set AC Flag) and clac (Clear AC Flag) instructions can be used to easily set or clear the flag.

The Linux kernel does exactly this to temporarily disable SMAP: it uses stac to set EFLAGS.AC before copying the data, and then uses clac to clear EFLAGS.AC when done.

The AC flag has existed since the 486 as the alignment check flag for user-space loads and stores; SMAP overloads the meaning of that flag bit. stac/clac are new with SMAP and are only allowed in kernel mode (CPL=0); they fault in user space (and on CPUs without SMAP, also in kernel mode).
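To make the interaction between CR4.SMAP and EFLAGS.AC concrete, here is a minimal user-space model in C. It is only a sketch: the `model_*` names are hypothetical and not kernel functions, the bit positions (AC is bit 18 of EFLAGS, SMAP is bit 21 of CR4) come from the x86 architecture rather than from the text above, and the model covers only explicit supervisor accesses to user pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_AC (1u << 18) /* Alignment Check flag, reused by SMAP as an override */
#define CR4_SMAP  (1u << 21) /* SMAP enable bit in CR4 */

/* Model of stac: set AC and leave every other flag untouched. */
static inline uint32_t model_stac(uint32_t eflags)
{
    return eflags | EFLAGS_AC;
}

/* Model of clac: clear AC and leave every other flag untouched. */
static inline uint32_t model_clac(uint32_t eflags)
{
    return eflags & ~EFLAGS_AC;
}

/* True if an explicit kernel access to a user page would fault under SMAP. */
static inline bool model_smap_blocks(uint64_t cr4, uint32_t eflags)
{
    return (cr4 & CR4_SMAP) != 0 && (eflags & EFLAGS_AC) == 0;
}

/* Hypothetical demo: the stac / access / clac bracket described above. */
static void model_smap_demo(void)
{
    uint32_t eflags = 0;

    assert(model_smap_blocks(CR4_SMAP, eflags));  /* SMAP on, AC clear: blocked */
    eflags = model_stac(eflags);
    assert(!model_smap_blocks(CR4_SMAP, eflags)); /* AC set: access allowed */
    eflags = model_clac(eflags);
    assert(model_smap_blocks(CR4_SMAP, eflags));  /* protection restored */
}
```

Calling `model_smap_demo()` from a `main()` exercises the sequence. The real instructions cannot be tried from user space, since they fault there.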


In theory it's pretty simple, but in practice the Linux kernel codebase is a jungle of functions, macros, inline assembly templates, etc. To find out exactly how this is done we can look at the source code, starting from copy_from_user():

  1. When copy_from_user() is called, it makes a quick check to see if the memory range is valid, then calls _copy_from_user()...

  2. ... which does another couple of checks and then calls raw_copy_from_user()...

  3. ... which, before doing the actual copy, calls __uaccess_begin_nospec()...

  4. ... which is just a macro that expands to stac(); barrier_nospec().

  5. Focusing on stac(), which is a simple inline function, we have:

     alternative("", __ASM_STAC, X86_FEATURE_SMAP);
    

The alternative() macro is a pretty complicated macro for selecting alternatives for an instruction at kernel boot time, based on CPU support. You can check the source file in which it is defined for a bit more information. In this case it is used to decide whether the kernel needs to use the stac instruction or not, based on CPU support (old x86 CPUs do not have SMAP available, and therefore don't have the instruction: on those CPUs this just becomes a no-op).

Looking at the __ASM_STAC macro we see:

#define __ASM_STAC  ".byte 0x0f,0x01,0xcb"

This is the byte encoding of the stac instruction. It is defined with the .byte directive instead of the mnemonic so that the code still builds with old toolchains whose version of binutils doesn't know about the instruction.
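The effect of the patching can be pictured with a small user-space model in C. This is a sketch under assumptions, not the kernel's implementation: `model_patch_site` is a hypothetical helper, and the 3-byte padding NOP used here (0x0f,0x1f,0x00) is one valid x86 encoding chosen for illustration, since the kernel picks its own NOP bytes. The STAC bytes are the ones from __ASM_STAC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN 3

/* Assumed padding: a 3-byte NOP, "nopl (%rax)". */
static const uint8_t NOP3[SITE_LEN] = { 0x0f, 0x1f, 0x00 };
/* Same bytes as the __ASM_STAC definition. */
static const uint8_t STAC[SITE_LEN] = { 0x0f, 0x01, 0xcb };

/* Hypothetical helper: overwrite the site only if the CPU feature is present. */
static void model_patch_site(uint8_t site[SITE_LEN], bool cpu_has_smap)
{
    if (cpu_has_smap)
        memcpy(site, STAC, SITE_LEN);
    /* Otherwise the padding NOP is left in place. */
}

/* Hypothetical demo covering both outcomes. */
static void model_patch_demo(void)
{
    uint8_t site[SITE_LEN];

    memcpy(site, NOP3, SITE_LEN);
    model_patch_site(site, false);
    assert(memcmp(site, NOP3, SITE_LEN) == 0); /* no SMAP: still a NOP */

    model_patch_site(site, true);
    assert(memcmp(site, STAC, SITE_LEN) == 0); /* SMAP: site now holds stac */
}
```

Because both encodings are three bytes long, the replacement fits the site exactly and no surrounding code has to move.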

At boot, the kernel uses the cpuid instruction to check for X86_FEATURE_SMAP (bit 20 of ebx when cpuid is executed with eax=7, ecx=0 to get the extended features). If SMAP is available, the kernel rewrites the machine code at that site so that the instruction becomes stac; otherwise it leaves the no-op in place.
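The same CPUID bit can be read from user space. The sketch below assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid_count`; the function names `ebx_reports_smap` and `cpu_reports_smap` are hypothetical. On non-x86 builds the query simply reports false. Note that this only tells you whether the CPU supports SMAP, not whether the running kernel has enabled it in CR4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header (assumed toolchain) */
#endif

#define CPUID_7_0_EBX_SMAP (1u << 20) /* leaf 7, subleaf 0, EBX bit 20 */

/* Pure bit test, usable on any platform. */
static bool ebx_reports_smap(uint32_t ebx)
{
    return (ebx & CPUID_7_0_EBX_SMAP) != 0;
}

/* Query the running CPU; false on non-x86 or if leaf 7 is unavailable. */
static bool cpu_reports_smap(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ebx_reports_smap(ebx);
#else
    return false;
#endif
}

/* Hypothetical demo: check the bit test, then run the live query. */
static void cpuid_smap_demo(void)
{
    assert(ebx_reports_smap(1u << 20));
    assert(!ebx_reports_smap(0));
    (void)cpu_reports_smap(); /* result depends on the machine */
}
```

On Linux, the `smap` flag in /proc/cpuinfo reports the same capability.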

Once done with all of this madness (which really all just boils down to a single instruction), the actual copy from user memory is performed, and the __uaccess_end() macro is then used to re-enable SMAP. This macro uses alternative() in the same way as the one we just saw, and ends up executing clac (or a nop).



Answered By - Marco Bonelli
Answer Checked By - Dawn Plyler (WPSolving Volunteer)