Wednesday, May 25, 2022

[SOLVED] Is the Linux paging model an abstraction?

Issue

I'm currently reading Understanding the Linux Kernel, Third Edition, and I'm on chapter 2, about memory addressing. At first the book covers paging in 32-bit mode, 32-bit PAE, and PSE (we are talking about x86 here), and more specifically the anatomy of a linear address: which bits are for which tables, offsets, etc. I started to get confused about the paging model in Linux. At one moment the book was talking about directory, table, and offset bits of a linear address (PDPT table for PAE) and next I was thrown into the world of "Linux" paging. Now with the Linux paging they talk about Global, Upper, and Middle tables with table and offset? I don't see how the x86 MMU paging relates to this new Linux model at all. If the MMU is in charge of translating (paging) addresses, why does the kernel need this paging model as well? It just seems like the kernel should just leave it up to the MMU. If anyone could elaborate on why the kernel has this, that would be great!

I understand that the MMU has to translate addresses based on the tables that the kernel manages. So the MMU is in charge of address translation (for memory accesses made by a process running on a core), but the kernel is not! So why do we have these Global, Upper, and Middle tables with the table and offset stuff?

Or maybe this Linux paging model is more abstract than I'm making it out to be! Maybe the idea of this kernel page table is not really a table but a set of kernel macros specifying the properties of the many levels of page directories/tables that the kernel has to maintain! For example, the PGD_SIZE, PUD_SIZE and the other SIZE macros (along with the SHIFT and MASK macros) specify the different properties of the levels of paging. Based on these macros (of course there are other macros) the kernel can generate the correct page tables in memory? This Linux paging model can adjust the SHIFT macros based on the specific architecture (more specifically the bit layout of linear addresses on the specific architecture)?


Solution

At one moment the book was talking about directory, table, and offset bits of a linear address (PDPT table for PAE) and next I was thrown into the world of "Linux" paging. Now with the Linux paging they talk about Global, Upper, and Middle tables with table and offset?

Yeah... this can get pretty confusing pretty quickly. When the Intel manual talks about paging, page table entries at different levels are called:

  • PML5E = Page Map Level 5 Entry
  • PML4E = Page Map Level 4 Entry
  • PDPTE = Page Directory Pointer Table Entry
  • PDE = Page Directory Entry
  • PTE = Page Table Entry

As you can deduce from the above names, x86 supports up to 5 levels of page tables (on modern processors). Different paging models can be used on any given processor based on its capabilities. For example, 32-bit paging without PAE uses 2-level page tables: we only have PDEs and PTEs, and CR3 points to a page directory. More page table levels can be set up and used, and in that case we begin talking about PDPTEs (for 3-level paging), PML4Es (4-level paging) and PML5Es (5-level paging).
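To make the bit layout concrete, here is a minimal Python sketch that splits a linear address into per-level indices for the x86 paging modes mentioned above. The index widths (10/10 for plain 32-bit paging, 2/9/9 for PAE, 9 bits per level for 4- and 5-level paging, all with a 12-bit offset for 4 KiB pages) are the standard x86 layouts. The `LAYOUTS` table and the `split_linear_address` helper are illustrative names, not anything taken from the kernel or the Intel manual.

```python
# Index widths per level, root first, assuming 4 KiB pages on x86.
# Each field is named after the entry that the index selects.
LAYOUTS = {
    "32-bit (2-level)": (("PDE", 10), ("PTE", 10)),
    "32-bit PAE (3-level)": (("PDPTE", 2), ("PDE", 9), ("PTE", 9)),
    "4-level": (("PML4E", 9), ("PDPTE", 9), ("PDE", 9), ("PTE", 9)),
    "5-level": (("PML5E", 9), ("PML4E", 9), ("PDPTE", 9), ("PDE", 9), ("PTE", 9)),
}
PAGE_OFFSET_BITS = 12  # 4 KiB pages


def split_linear_address(addr, layout):
    """Return the table index used at each level, plus the page offset."""
    fields = LAYOUTS[layout]
    # Start just above the topmost index field and move down one field at a time.
    shift = PAGE_OFFSET_BITS + sum(bits for _, bits in fields)
    result = {}
    for name, bits in fields:
        shift -= bits
        result[name] = (addr >> shift) & ((1 << bits) - 1)
    result["offset"] = addr & ((1 << PAGE_OFFSET_BITS) - 1)
    return result


if __name__ == "__main__":
    for layout in LAYOUTS:
        print(layout, split_linear_address(0xC0101234, layout))
```

For example, `split_linear_address(0xC0101234, "32-bit (2-level)")` yields index 768 in the page directory, index 257 in the page table, and offset 0x234. The MMU performs this split in hardware; the kernel only needs the same arithmetic when it builds or inspects the tables.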

Now of course page tables are pretty much a ubiquitous concept amongst most CPU architectures, not only x86. However, each architecture has its own specific way of naming page table entries at different levels. Linux supports a lot of different architectures, so it gives page table entries some "generic" names:

  • pgd_t = Page Global Directory entry
  • p4d_t = Page level-4 Directory entry
  • pud_t = Page Upper Directory entry
  • pmd_t = Page Middle Directory entry
  • pte_t = Page Table Entry

Now this is where it could get confusing. There is a major distinction to be made between Intel's and Linux's naming convention. Regardless of the number of levels of page tables, for Linux the Page Global Directory always represents the root (so pgd_t always represents highest-level entries):

  • For 5-level paging, the correspondence between Intel names and Linux names is just plain and simple: pgd_t:PML5E, p4d_t:PML4E, pud_t:PDPTE, pmd_t:PDE, pte_t:PTE.
  • For 4-level paging, we have pgd_t:PML4E, and p4d_t is unnecessary, so it essentially becomes an alias for pgd_t (before 5-level paging support, there simply was no p4d_t at all).
  • Analogously, for 3-level paging we have pgd_t:PDPTE, and for 2-level pgd_t:PDE.
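The correspondence in the list above can be written down as a small lookup. The following Python sketch is only an illustration: the function name `linux_to_intel` is made up, and it assumes that when fewer than 5 levels are in use, the lowest levels keep their names (`pmd_t`, `pte_t`) while the unused middle levels are folded away. The list above only states the `pgd_t` mapping explicitly; the folding of the other levels is my understanding of how the kernel handles it.

```python
LINUX_NAMES = ["pgd_t", "p4d_t", "pud_t", "pmd_t", "pte_t"]
INTEL_NAMES = ["PML5E", "PML4E", "PDPTE", "PDE", "PTE"]


def linux_to_intel(levels):
    """Map each Linux entry type to the Intel entry it stands for.

    The result is ordered root first; folded (unused) levels map to None.
    """
    if not 2 <= levels <= len(INTEL_NAMES):
        raise ValueError("x86 uses between 2 and 5 page table levels")
    # Hardware entries actually in use, root first.
    intel = INTEL_NAMES[-levels:]
    # pgd_t is always the root; the remaining levels take the lowest names.
    used_linux = ["pgd_t"] + LINUX_NAMES[len(LINUX_NAMES) - levels + 1:]
    pairs = dict(zip(used_linux, intel))
    return {name: pairs.get(name) for name in LINUX_NAMES}


if __name__ == "__main__":
    for levels in (5, 4, 3, 2):
        print(levels, linux_to_intel(levels))
```

With `levels=4` this gives `pgd_t` for PML4E and nothing for `p4d_t`, and with `levels=2` only `pgd_t` (PDE) and `pte_t` (PTE) remain, which matches the idea that the root is always called the Page Global Directory.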

If the MMU is in charge of translating (paging) addresses, why does the kernel need this paging model as well? It just seems like the kernel should just leave it up to the MMU.

Well, yeah, the kernel does leave the work to the MMU. What we are talking about here are just naming conventions. Linux does not use another paging model on top of the one already used by the MMU; it just has a different, more generic way of naming things. That's it.


Maybe the idea of this kernel page table is not really a table but a set of kernel macros specifying the properties of the many levels of page directories/tables that the kernel has to maintain!

[...]

Based on these macros (of course there are other macros) the kernel can generate the correct page tables in memory? This Linux paging model can adjust the SHIFT macros based on the specific architecture (more specifically the bit layout of linear addresses on the specific architecture)?

Yes, yes and yes. You got it. That's exactly the rationale behind this naming abstraction. The kernel only calls things one way, but adjusts macros (e.g. PMD_SHIFT), types (e.g. pmd_t), and logic based on the underlying architecture.
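As a rough illustration of how the SHIFT, SIZE and MASK values relate, here is a Python sketch that derives SIZE and MASK from a per-configuration shift, the same way the kernel macros are conventionally defined (SIZE is `1 << SHIFT`, MASK is `~(SIZE - 1)`). The shift values are the ones I would expect for x86 with 4 KiB pages and are stated as assumptions; the `SHIFTS` table, `level_macros` and `region_start` are hypothetical names used only for this example.

```python
# Assumed shift values for x86 with 4 KiB pages; only existing levels are listed.
SHIFTS = {
    "x86-32 (2-level)": {"PAGE": 12, "PGDIR": 22},
    "x86-32 PAE (3-level)": {"PAGE": 12, "PMD": 21, "PGDIR": 30},
    "x86-64 (4-level)": {"PAGE": 12, "PMD": 21, "PUD": 30, "PGDIR": 39},
}
# Width of an unsigned long in each configuration, needed to model ~ in Python.
WORD_BITS = {
    "x86-32 (2-level)": 32,
    "x86-32 PAE (3-level)": 32,
    "x86-64 (4-level)": 64,
}


def level_macros(config, level):
    """Return (SHIFT, SIZE, MASK) for one level of one configuration."""
    shift = SHIFTS[config][level]
    size = 1 << shift                                     # bytes mapped by one entry
    mask = ~(size - 1) & ((1 << WORD_BITS[config]) - 1)   # clears the low bits
    return shift, size, mask


def region_start(config, level, addr):
    """Round addr down to the start of the region one entry at this level maps."""
    return addr & level_macros(config, level)[2]


if __name__ == "__main__":
    for config, levels in SHIFTS.items():
        for level in levels:
            shift, size, mask = level_macros(config, level)
            print(f"{config}: {level}_SHIFT={shift} {level}_SIZE={size:#x} {level}_MASK={mask:#x}")
```

Under these assumptions one PMD entry covers 2 MiB in the PAE and 64-bit configurations, while one PGD entry covers 4 MiB in plain 32-bit paging. The calling code stays the same and only the table of shifts changes, which is the point of the generic naming.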



Answered By - Marco Bonelli
Answer Checked By - Marie Seifert (WPSolving Admin)