Issue
I'm trying to understand how Linux uses PCIDs (aka ASIDs) on Intel architecture. While investigating the Linux kernel's source code and patches, I found the following define with this comment:
/*
* 6 because 6 should be plenty and struct tlb_state will fit in two cache
* lines.
*/
#define TLB_NR_DYN_ASIDS 6
This seems to say that Linux uses only 6 PCID values, but then what about this comment:
/*
* The x86 feature is called PCID (Process Context IDentifier). It is similar
* to what is traditionally called ASID on the RISC processors.
*
* We don't use the traditional ASID implementation, where each process/mm gets
* its own ASID and flush/restart when we run out of ASID space.
*
* Instead we have a small per-cpu array of ASIDs and cache the last few mm's
* that came by on this CPU, allowing cheaper switch_mm between processes on
* this CPU.
*
* We end up with different spaces for different things. To avoid confusion we
* use different names for each of them:
*
* ASID - [0, TLB_NR_DYN_ASIDS-1]
* the canonical identifier for an mm
*
* kPCID - [1, TLB_NR_DYN_ASIDS]
* the value we write into the PCID part of CR3; corresponds to the
* ASID+1, because PCID 0 is special.
*
* uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
* for KPTI each mm has two address spaces and thus needs two
* PCID values, but we can still do with a single ASID denomination
* for each mm. Corresponds to kPCID + 2048.
*
*/
As it is said in the previous comment, I suppose that Linux uses only 6 values for PCIDs, so in brackets we see just single values (not arrays). So ASID here can only be 0 and 5, kPCID can only be 1 and 6, and uPCID can only be 2049 and 2048 + 6 = 2054, right?
At this point I have a few questions:
- Why are there only 6 values for PCIDs? (Why is 6 plenty?)
- Why will the tlb_state structure fit in two cache lines if we choose 6 PCIDs?
- Why does Linux use exactly these values for ASID, kPCID, and uPCID (I'm referring to the second comment)?
Solution
"As it is said in the previous comment, I suppose that Linux uses only 6 values for PCIDs, so in brackets we see just single values (not arrays)."
No, this is wrong: those are ranges. [0, TLB_NR_DYN_ASIDS-1] means from 0 to TLB_NR_DYN_ASIDS-1 inclusive. Keep reading for more details.
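Concretely, with TLB_NR_DYN_ASIDS = 6, each of the three ranges contains six values:
ASID:  0, 1, 2, 3, 4, 5
kPCID: 1, 2, 3, 4, 5, 6
uPCID: 2049, 2050, 2051, 2052, 2053, 2054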
There are a few things to consider:
1. The difference between ASID (Address Space IDentifier) and PCID (Process-Context IDentifier) is just nomenclature: Linux calls this feature ASID across all architectures, while Intel calls its implementation PCID. Linux ASIDs start at 0; Intel's PCIDs start at 1, because 0 is special and means "no PCID".
2. On x86 processors that support the feature, PCIDs are 12-bit values, so technically 4095 different PCIDs are possible (1 through 4095, as 0 is special).
3. Due to Kernel Page-Table Isolation, Linux nonetheless needs two different PCIDs per task. The distinction between kPCID and uPCID is made for this reason: each task effectively has two different virtual address spaces whose address translations need to be cached separately, and thus use different PCIDs (see the sketch after this list). So we are down to 2047 usable pairs of PCIDs (plus one last unpaired value that would simply go unused).
4. Any normal system can easily exceed 2047 tasks on a single CPU, so no matter how many bits you use, you will never have enough PCIDs for all existing tasks. On systems with a lot of CPUs you will not even have enough PCIDs for all active tasks.
5. Due to 4, you cannot implement PCID support as a simple assignment of a unique value to each existing/active task (e.g. as is done for PIDs). Multiple tasks will need to "share" the same PCID sooner or later (not at the same time, but at different points in time). The logic to manage PCIDs therefore needs to be different.
6. The choice made by Linux developers was to use PCIDs to optimize accesses to the most recently used mms (struct mm_struct). This was implemented using a per-CPU array (cpu_tlbstate.ctxs) that is scanned linearly on each mm switch, so even relatively small values of TLB_NR_DYN_ASIDS could easily hurt performance instead of improving it (a simplified sketch of this lookup is shown after the pahole output below). Apparently, 6 was a good number to choose, as it provided a decent performance improvement. This means that only the 6 most recently used mms will use non-zero PCIDs (well, technically the 6 most recently used user/kernel mm pairs).
You can see this reasoning explained more concisely in the commit message of the patch that implemented PCID support.
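To make the mapping from point 3 concrete, here is a minimal sketch of how an ASID is turned into a kPCID or uPCID. This is not the kernel's actual code (the kernel implements the mapping in the kern_pcid() and user_pcid() helpers in arch/x86/mm/tlb.c, with extra sanity checks); it is just the arithmetic described in the kernel comment quoted above:
/* Sketch only, not the kernel's actual code. */
#define TLB_NR_DYN_ASIDS  6
#define PTI_USER_PCID_BIT 11                  /* setting bit 11 adds 2048 */

/* kPCID: the value written into the PCID bits of CR3 for the kernel
 * address space. ASIDs start at 0, but PCID 0 is special ("no PCID"),
 * hence the +1. */
static unsigned short kern_pcid(unsigned short asid)
{
        return asid + 1;                              /* 0..5 -> 1..6 */
}

/* uPCID: the PCID for the same mm's KPTI user address space. */
static unsigned short user_pcid(unsigned short asid)
{
        return kern_pcid(asid) | (1U << PTI_USER_PCID_BIT); /* -> 2049..2054 */
}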
"Why will the tlb_state structure fit in two cache lines if we choose 6 PCIDs?"
Well, that's just simple math:
struct tlb_state {
struct mm_struct * loaded_mm; /* 0 8 */
union {
struct mm_struct * last_user_mm; /* 8 8 */
long unsigned int last_user_mm_spec; /* 8 8 */
}; /* 8 8 */
u16 loaded_mm_asid; /* 16 2 */
u16 next_asid; /* 18 2 */
bool invalidate_other; /* 20 1 */
/* XXX 1 byte hole, try to pack */
short unsigned int user_pcid_flush_mask; /* 22 2 */
long unsigned int cr4; /* 24 8 */
struct tlb_context ctxs[6]; /* 32 96 */
/* size: 128, cachelines: 2, members: 8 */
/* sum members: 127, holes: 1, sum holes: 1 */
};
(information extracted through pahole from a kernel image with debug symbols)
The ctxs array of struct tlb_context is used to keep track of ASIDs and holds TLB_NR_DYN_ASIDS (6) entries. Each struct tlb_context is 16 bytes (two u64 fields: ctx_id and tlb_gen), so the array occupies 6 × 16 = 96 bytes; together with the 32 bytes of members preceding it, that is exactly 128 bytes, i.e. two 64-byte cache lines.
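As an illustration of how the ctxs array is used, here is a simplified, hypothetical sketch of the linear scan performed on each mm switch. The kernel's real logic lives in choose_new_asid() in arch/x86/mm/tlb.c and is more involved (it also compares TLB generations to decide whether a flush is needed); this sketch only shows the basic idea:
/* Hypothetical sketch, not the kernel's actual code. */
#define TLB_NR_DYN_ASIDS 6

struct tlb_context {
        unsigned long long ctx_id;   /* unique ID of the mm cached in this slot */
        unsigned long long tlb_gen;  /* TLB generation (ignored in this sketch) */
};

/* In the kernel these live in the per-CPU cpu_tlbstate. */
static struct tlb_context ctxs[TLB_NR_DYN_ASIDS];
static unsigned short next_asid;

static unsigned short choose_asid(unsigned long long ctx_id, int *need_flush)
{
        unsigned short asid;

        /* Is this mm one of the (at most) 6 most recently used ones? */
        for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
                if (ctxs[asid].ctx_id == ctx_id) {
                        *need_flush = 0;   /* cached translations are reusable */
                        return asid;
                }
        }

        /* Not cached: recycle the next slot round-robin; its stale TLB
         * entries must be flushed before the PCID is reused. */
        asid = next_asid;
        next_asid = (next_asid + 1) % TLB_NR_DYN_ASIDS;
        ctxs[asid].ctx_id = ctx_id;
        *need_flush = 1;
        return asid;
}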
Answered By - Marco Bonelli
Answer Checked By - Mildred Charles (WPSolving Admin)