Issue
To use memory mapped I/O, we need to first call request_mem_region.
struct resource *request_mem_region(
unsigned long start,
unsigned long len,
char *name);
Then, as kernel is running in virtual address space, we need to map physical addresses to virtual address space by running ioremap function.
void *ioremap(unsigned long phys_addr, unsigned long size);
Then why can't we access the return value directly.
From Linux Device Drivers Book
Once equipped with ioremap (and iounmap), a device driver can access any I/O memory address, whether or not it is directly mapped to virtual address space. Remember, though, that the addresses returned from ioremap should not be dereferenced directly; instead, accessor functions provided by the kernel should be used.
Can anyone explain the reason behind this or the advantage with accessor functions like ioread32
or iowrite8()
?
Solution
You need ioread8
/ iowrite8
or whatever to at least cast to volatile*
to make sure optimization still results in exactly 1 access (not 0 or more than 1). In fact they do more than that, handling endianness (They also handle endianness, accessing device memory as little-endian. Or ioread32be
for big-endian) and some compile-time reordering memory-barrier semantics that Linux chooses to include in these functions. And even a runtime barrier after reads, because of DMA. Use the _rep
version to copy a chunk from device memory with only one barrier.
In C, data races are UB (Undefined Behaviour). This means the compiler is allowed to assume that memory accessed through a non-volatile
pointer doesn't change between accesses. And that if (x) y = *ptr;
can be transformed into tmp = *ptr;
if (x) y = tmp;
i.e. compile-time speculative loads, if *ptr
is known to not fault. (Related: Who's afraid of a big bad optimizing compiler? re: why the Linux kernel need volatile
for rolling its own atomics.)
MMIO registers may have side effects even for reading so you must stop the compiler from doing loads that aren't in the source, and must force it to do all the loads that are in the source exactly once.
Same deal for stores. (Compilers aren't allowed to invent writes even to non-volatile objects, but they can remove dead stores. e.g. *ioreg = 1; *ioreg = 2;
would typically compile the same as *ioreg = 2;
The first store gets removed as "dead" because it's not considered to have a visible side effect.
C volatile
semantics are ideal for MMIO, but Linux wraps more stuff around them than just volatile.
From a quick look after googling ioread8
and poking around in https://elixir.bootlin.com/linux/latest/source/lib/iomap.c#L11 we see that Linux I/O addresses can encode IO address space (port I/O, aka PIO; in
/ out
instructions on x86) vs. memory address space (normal load/store to special addresses). And ioread*
functions actually check that and dispatch accordingly.
/* * Read/write from/to an (offsettable) iomem cookie. It might be a PIO * access or a MMIO access, these functions don't care. The info is * encoded in the hardware mapping set up by the mapping functions * (or the cookie itself, depending on implementation and hw). * * The generic routines don't assume any hardware mappings, and just * encode the PIO/MMIO as part of the cookie. They coldly assume that * the MMIO IO mappings are not in the low address range. * * Architectures for which this is not true can't use this generic * implementation and should do their own copy. */
For example implementation, here's ioread16
. (IO_COND is a macro that checks the address against a predefined constant: low addresses are PIO addresses).
unsigned int ioread16(void __iomem *addr) { IO_COND(addr, return inw(port), return readw(addr)); return 0xffff; }
What would break if you just cast the ioremap
result to volatile uint32_t*
?
e.g. if you used READ_ONCE
/ WRITE_ONCE
which just cast to volatile unsigned char*
or whatever, and are used for atomic access to shared variables. (In Linux's hand-rolled volatile + inline asm implementation of atomics which it uses instead of C11 _Atomic
).
That might actually work on some little-endian ISAs like x86 if compile-time reordering wasn't a problem, but others need more barriers. If you look at the definition of readl
(which ioread32
uses for MMIO, as opposed to inl
for PIO), it uses barriers around a dereference of a volatile
pointer.
(This and the macros this uses are defined in the same io.h
as this, or you can navigate using the LXR links: every identifier is a hyperlink.)
static inline u32 readl(const volatile void __iomem *addr) {
u32 val;
__io_br();
val = __le32_to_cpu(__raw_readl(addr));
__io_ar(val);
return val;
}
The generic __raw_readl
is just the volatile dereference; some ISAs may provide their own.
__io_ar()
uses rmb()
or barrier()
After Read. /* prevent prefetching of coherent DMA data ahead of a dma-complete */
. The Before Read barrier is just barrier()
- blocking compile-time reordering without asm instructions.
Old answer to the wrong question: the text below answers why you need to call ioremap
.
Because it's a physical address and kernel memory isn't identity-mapped (virt = phys) to physical addresses.
And returning a virtual address isn't an option: not all systems have enough virtual address space to even direct-map all of physical address space as a contiguous range of virtual addresses. (But when there is enough space, Linux does do this, e.g. x86-64 Linux's virtual address-space layout is documented in x86_64/mm.txt
Notably 32-bit x86 kernels on systems with more than 1 or 2GB of RAM (depending on how the kernel is configured: 2:2 or 1:3 kernel:user split of virtual address space). With PAE for 36-bit physical address space, a 32-bit x86 kernel can use much more physical memory than it can map at once. (This is pretty horrible and makes life difficult for a kernel: some random blog reposed Linus Torvald's comments about how PAE really really sucks.)
Other ISAs may have this too, and IDK what Alpha does about IO memory when byte accesses are needed; maybe the region of physical address space that maps word loads/stores to byte loads/stores is handled earlier so you request the right physical address. (http://www.tldp.org/HOWTO/Alpha-HOWTO-8.html)
But 32-bit x86 PAE is obviously an ISA that Linux cares a lot about, even quite early in the history of Linux.
Answered By - Peter Cordes Answer Checked By - Terry (WPSolving Volunteer)