Wednesday, November 17, 2021

[SOLVED] x86 cr3 and linux swapper_pg_dir

Issue

In Linux source code (version 2.6.18):

movl $swapper_pg_dir-__PAGE_OFFSET,%eax
movl %eax,%cr3
movl %cr0,%eax
orl $0x80000000,%eax
movl %eax,%cr0          /* ..and set paging (PG) bit */
ljmp $__BOOT_CS,$1f     /* Clear prefetch and normalize %eip */

And also the load_cr3(pgdir) and write_cr3(x) macros:

#define load_cr3(pgdir) write_cr3(__pa(pgdir))

#define write_cr3(x) \
__asm__ __volatile__("movl %0,%%cr3": :"r" (x))

It seems that the whole cr3 control register stores the address of the page directory. However, the Intel IA-32 developer's manual tells a different story. This is what the Intel manual says:

name      bits 0..11         bits 12..31
cr3       flags              address of page directory
PDE       flags              address of page table
PTE       flags              address of 4 KB page frame

The manual says that only the 20 most significant bits of cr3 store the address of the page directory, not the whole register. That is reasonable: the page directory is exactly 4 KB and page-aligned, so the 12 least significant bits of its address are always zero.

Isn't that a little strange? The Linux code just assigns the address of the page directory to cr3, rather than the 20 most significant bits of swapper_pg_dir. What exactly does the cr3 register store: the address, or the format that the Intel manual describes?

The Intel manual is available here: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html


Solution

For 32-bit paging, it is mandatory that the address of the page directory is a multiple of 4096, i.e. its 12 LSB are zero. However, the instruction that writes cr3 always loads a full 32-bit value, not 20 bits. When cr3 is loaded, its upper 20 bits are used as the page directory address, and the lower 12 bits are interpreted as flags (for example, the PWT and PCD cache-control bits) which may affect paging behaviour in newer processor versions. The "safe" setting for these flags is zero, and that's precisely what Linux does: it loads cr3 with a 32-bit value whose 12 LSB happen to be zero, because that value is the physical address of a page directory aligned on a 4096-byte boundary. So both descriptions agree: the address and the manual's format are the same 32-bit value.
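
To make the bit layout concrete, here is a minimal sketch in C of how a 32-bit cr3 value splits into the two fields described above. The helper names (cr3_base, cr3_flags, is_page_aligned) and the mask macros are made up for illustration and are not taken from the kernel source; the sketch only manipulates plain integers and never touches the real register.

```c
#include <stdint.h>

/* Illustrative constants, not kernel definitions. */
#define PAGE_SIZE      4096u
#define CR3_FLAG_MASK  (PAGE_SIZE - 1u)   /* bits 0..11:  flags          */
#define CR3_BASE_MASK  (~CR3_FLAG_MASK)   /* bits 12..31: page directory */

/* Hypothetical helper: physical base address of the page directory. */
static uint32_t cr3_base(uint32_t cr3)
{
    return cr3 & CR3_BASE_MASK;
}

/* Hypothetical helper: the low 12 flag bits. */
static uint32_t cr3_flags(uint32_t cr3)
{
    return cr3 & CR3_FLAG_MASK;
}

/* A 4096-byte aligned address has all 12 low bits clear. */
static int is_page_aligned(uint32_t addr)
{
    return (addr & CR3_FLAG_MASK) == 0;
}
```

For an assumed page directory at physical address 0x00448000 (a made-up value), cr3_base returns 0x00448000 and cr3_flags returns 0: writing the aligned address to cr3 is the same as writing the 20-bit base with all flags cleared.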



Answered By - Thomas Pornin