Issue
I am writing a program to read and write a file at the same time. More specifically, all write operations are appending new data to the end of the file and all read operations are reading random positions of the file.
I am thinking of creating a memory-mapped file (using mmap) to get efficient reads while writing via append (mode a in open). However, I don't think this will work because a memory-mapped file cannot change in size*, unless I munmap and then mmap it again.
While "munmap and then mmap the file again" works, it has many downsides. Not only do I need to perform 2 syscalls after every write (or before every read), which hurts performance, but the base address returned by the next mmap call after munmap could also differ from the previous one. Since I am planning to have other in-memory data structures that store pointers to specific offsets in this memory-mapped file, that would be very inconvenient.
Are there more elegant and efficient ways of doing this? The program will be mostly running on Linux (but solutions with portability to other POSIX systems are preferred). I have read through the following posts, but none of them seems to give a definitive answer.
How to portably extend a file accessed using mmap()
Can the OS automatically grow an mmap backed file?
My intuition is to use mmap to "reserve" the file with a size large enough to accommodate its growth, say a few hundred GiB (a very reasonable assumption in my use case), and then somehow reflect changes in the file size in this mapped memory without invalidating it with munmap. However, I am aware that accessing data beyond the real file boundary could result in a bus error, and the documentation isn't clear about whether changes in file size will be reflected.
*I am not 100% sure about this, but I couldn't find any source describing an elegant way to change the size of a memory-mapped file.
Solution
After some experimentation, I found a way to make it work.
First, mmap the file with PROT_NONE and a large enough size. On 64-bit systems it can be as large as 1L << 46 (64 TiB); the example below reserves 1L << 40 (1 TiB). This does NOT consume physical memory* (at least on Linux). It only consumes address space (virtual memory) in this process.
void* ptr = mmap(NULL, (1L << 40), PROT_NONE, MAP_SHARED, fd, 0);
Then, give read (and/or write) permission to the part of the mapping that lies within the file length using mprotect. Note that the size needs to be aligned to the page size (which can be obtained with sysconf(_SC_PAGESIZE), usually 4096).
mprotect(ptr, aligned_size, PROT_READ | PROT_WRITE);
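However, be careful near the end of the file. If the file size is not page-aligned, the bytes between the end of the file and the end of its last page read as zeros. Accessing a page that lies entirely beyond the end of the file, even one with PROT_READ permission, triggers a bus error (SIGBUS), as documented in the mmap manual.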
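The two steps above can be sketched in C roughly as follows. This is a minimal illustration rather than a definitive implementation: the helper names round_up_to_page, reserve_mapping and expose_prefix are hypothetical, the reservation size is left to the caller, and only PROT_READ is granted.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round n up to the next multiple of the system page size. */
size_t round_up_to_page(size_t n) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Reserve `reserve` bytes of address space backed by fd, with no access
 * rights yet. Returns NULL on failure. */
void *reserve_mapping(int fd, size_t reserve) {
    void *p = mmap(NULL, reserve, PROT_NONE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make the pages covering the first file_len bytes readable.
 * Returns 0 on success, -1 on failure (errno set by mprotect). */
int expose_prefix(void *base, size_t file_len) {
    if (file_len == 0)
        return 0; /* nothing to expose yet */
    return mprotect(base, round_up_to_page(file_len), PROT_READ);
}
```

A caller would typically fstat the descriptor, pass st_size to expose_prefix, and only dereference offsets below st_size.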
Then you can use either the file descriptor fd or the mapped memory to read and write the file. Remember to use fsync or msync to persist the data after writing to it. Mapped pages with PROT_READ permission should reflect the latest file content (if you write to it)**. Pages newly made accessible with mprotect will also show the up-to-date content.
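As a rough sketch of how appending and widening the readable window might fit together, the code below writes through the file descriptor and then calls mprotect on the newly covered pages only. The struct grow_map and both functions are hypothetical names introduced for illustration, and the sketch assumes a single writer, an fd opened for reading and writing, and a page-aligned reservation size. It does not call fsync, so persistence is left to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one growing mapping. */
struct grow_map {
    int fd;          /* file opened for reading and writing */
    char *base;      /* start of the PROT_NONE reservation; never changes */
    size_t reserved; /* bytes of address space reserved */
    size_t exposed;  /* page-aligned number of bytes currently readable */
};

/* Reserve address space for the file without granting access. */
int grow_map_init(struct grow_map *m, int fd, size_t reserved) {
    void *p = mmap(NULL, reserved, PROT_NONE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    m->fd = fd;
    m->base = p;
    m->reserved = reserved;
    m->exposed = 0;
    return 0;
}

/* Append len bytes, then make the pages covering them readable.
 * Returns the file offset of the appended data, or -1 on error. */
off_t grow_map_append(struct grow_map *m, const void *buf, size_t len) {
    off_t start = lseek(m->fd, 0, SEEK_END);
    if (start < 0)
        return -1;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t end = (size_t)start + len;
    size_t want = (end + page - 1) / page * page;
    if (want > m->reserved)
        return -1; /* the file would outgrow the reservation */

    const char *p = buf;
    for (size_t left = len; left > 0;) {
        ssize_t n = write(m->fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything; retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    if (want > m->exposed) {
        /* Only the newly covered pages need their protection changed. */
        if (mprotect(m->base + m->exposed, want - m->exposed, PROT_READ) != 0)
            return -1;
        m->exposed = want;
    }
    return start;
}
```

Because base never changes, other data structures can keep raw pointers such as m.base + offset across appends, which is the property the question asked for.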
Depending on the application, you might want to use ftruncate to make the file size a multiple of the system page size for the best performance. You might also want to use madvise with MADV_SEQUENTIAL to improve performance when reading those pages in order (MADV_RANDOM is the corresponding hint for random access).
*This behavior is not mentioned in the mmap manual. However, since PROT_NONE implies those pages are not accessible in any way, it's trivial for an OS implementation to avoid allocating any physical memory for them at all.
**This behavior, where a memory region mapped before a file write reflects the new content once the write is completed (fsync or msync), is also not mentioned in the manual (or at least I did not see it). But it seems to be the case at least on recent Linux kernels (4.x onward).
Answered By - lewisxy Answer Checked By - Dawn Plyler (WPSolving Volunteer)