Issue
I'm making many small random writes to an mmapped file. To ensure consistency, I call msync from time to time, but I don't want to keep track of every single small write I made. In the current Linux kernel implementation, is there a performance penalty for calling msync on the whole file? For example, what if the file is 100GB but I only changed a total of 10MB? Does the kernel loop over every page in the range passed to msync to find the dirty pages to flush, or are those kept on some sort of linked list or other structure?
Solution
TL;DR: no, there is no such penalty. The kernel structures that hold the needed information are designed to make the operation efficient regardless of range size.
Pages of mappable objects are kept in a radix tree. The Linux kernel implementation of radix trees has an additional feature: entries can be tagged with up to 3 different marks, and marked entries can be found and iterated over much faster. The actual data structure used is called "XArray"; you can find more information about it in this LWN article or in Documentation/core-api/xarray.rst.
Dirty pages carry a special mark (PAGECACHE_TAG_DIRTY), which allows them to be found quickly when writeback is needed (e.g. msync, fsync, etc.). Furthermore, XArrays provide an O(1) mechanism to check whether any entry exists with a given mark, so for pages it can be quickly determined whether writeback is needed at all, even before looking for dirty pages.
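To make the idea of marks more concrete, here is a toy userspace model in C. It is not kernel code and not how the XArray is implemented; it is a two-level index I made up for illustration (all names such as toy_index, toy_mark_dirty and toy_writeback are hypothetical). It only shows the principle: a mark set on an entry is propagated to its parent, so an "is anything dirty?" check looks at a single word, and writeback skips clean subtrees without looking inside them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS  64   /* pages tracked per leaf, one bit each */
#define LEAVES 64   /* leaves under the single root node */

/* Toy two-level index covering SLOTS * LEAVES pages. */
struct toy_index {
    uint64_t root_dirty;          /* bit i set: leaf i holds a dirty page */
    uint64_t leaf_dirty[LEAVES];  /* bit j set: page i * SLOTS + j is dirty */
};

/* Mark one page dirty and propagate the mark up to the root. */
void toy_mark_dirty(struct toy_index *t, size_t page)
{
    assert(page < (size_t)SLOTS * LEAVES);
    t->leaf_dirty[page / SLOTS] |= UINT64_C(1) << (page % SLOTS);
    t->root_dirty |= UINT64_C(1) << (page / SLOTS);
}

/* Constant-time check: is any page in the whole index dirty? */
bool toy_any_dirty(const struct toy_index *t)
{
    return t->root_dirty != 0;
}

/* "Write back" (here: just clear) every dirty page in [first, last].
 * Returns the number of pages cleaned; *leaves_visited counts how many
 * leaves had to be examined. Leaves without the root mark are skipped. */
size_t toy_writeback(struct toy_index *t, size_t first, size_t last,
                     size_t *leaves_visited)
{
    size_t flushed = 0;

    assert(first <= last && last < (size_t)SLOTS * LEAVES);
    *leaves_visited = 0;

    for (size_t leaf = first / SLOTS; leaf <= last / SLOTS; leaf++) {
        if (!(t->root_dirty & (UINT64_C(1) << leaf)))
            continue;  /* clean subtree: never looked into */
        (*leaves_visited)++;
        for (size_t slot = 0; slot < SLOTS; slot++) {
            size_t page = leaf * SLOTS + slot;
            uint64_t bit = UINT64_C(1) << slot;
            if (page < first || page > last || !(t->leaf_dirty[leaf] & bit))
                continue;
            t->leaf_dirty[leaf] &= ~bit;
            flushed++;
        }
        if (t->leaf_dirty[leaf] == 0)  /* last dirty page gone: clear parent mark */
            t->root_dirty &= ~(UINT64_C(1) << leaf);
    }
    return flushed;
}
```

In this model, marking three pages that fall into two leaves and then writing back the full range examines only those two leaves out of 64. In the kernel, the corresponding operations are documented in Documentation/core-api/xarray.rst (for instance xa_marked() and xa_for_each_marked()).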
In conclusion, you should not incur a noticeable performance penalty when calling msync on the entire mapping as opposed to only the smaller range of pages that were actually modified.
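For reference, the usage pattern from the question could look roughly like the sketch below. The helper name, its parameters and the fixed random seed are my own assumptions for illustration; the only point is that a single msync call covers the whole mapping and no per-write bookkeeping is kept. It assumes len is greater than zero.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create/resize `path` to `len` bytes, map it shared,
 * perform `nwrites` one-byte writes at pseudo-random offsets, then flush
 * the whole mapping with one msync() call.
 * Returns 0 on success, -1 on error. */
int scattered_writes_then_msync(const char *path, size_t len, int nwrites)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    srand(42);  /* fixed seed, only to make the demo repeatable */
    for (int i = 0; i < nwrites; i++)
        map[(size_t)rand() % len] = 0xAB;

    /* One call over the entire range; the kernel finds the dirty pages. */
    int rc = msync(map, len, MS_SYNC);

    munmap(map, len);
    return rc;
}
```

A call such as scattered_writes_then_msync("/tmp/msync_demo.bin", 1 << 20, 100) dirties at most 100 pages of a 1 MiB mapping and then syncs the full range in one go.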
Answered By - Marco Bonelli