Issue
According to the man page of getdents:
d_off is the distance from the start of the directory to the start of the next linux_dirent. d_reclen is the size of this entire linux_dirent.
So I would expect that if the first entry has d_reclen n, its d_off would also be n (and for the i-th entry, d_off would be the sum of the d_reclen values of all entries from 0 to i, inclusive).
However, in that same man page, a nicely printed table with the entries of an example directory looks like this:
--------------- nread=120 ---------------
inode#    file type  d_reclen  d_off   d_name
       2  directory    16         12  .
       2  directory    16         24  ..
      11  directory    24         44  lost+found
      12  regular      16         56  a
  228929  directory    16         68  sub
   16353  directory    16         80  sub2
  130817  directory    16       4096  sub3
The d_off fields of the entries do not seem to follow the rule I expected.
If the first entry has size 16, surely the offset from the start to the second entry would be 16, but apparently it's actually 12.
So what don't I understand about the d_off field of linux_dirent64?
Solution
It's explained vaguely in the manual page, but as you can probably see by compiling and running the example program, your assumption does not hold.
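As a quick check that needs no compiler, the arithmetic can be redone on the man page's own example rows. This is only a sketch: the rows are copied by hand from the table quoted in the question, and the names `rows`, `NREAD`, and `running` are made up for illustration.

```python
from itertools import accumulate

# (d_reclen, d_off, d_name) rows copied from the man page's example output
rows = [
    (16, 12, "."),
    (16, 24, ".."),
    (24, 44, "lost+found"),
    (16, 56, "a"),
    (16, 68, "sub"),
    (16, 80, "sub2"),
    (16, 4096, "sub3"),
]
NREAD = 120  # the "nread" value printed above the table

# Running sum of d_reclen: what d_off would be if it were a byte offset
# into the returned buffer, as the question assumed.
running = list(accumulate(reclen for reclen, _, _ in rows))

for (reclen, d_off, name), total in zip(rows, running):
    print(f"{name:<12} sum(d_reclen)={total:<4} d_off={d_off}")
```

The running sums never coincide with d_off, but the last one equals nread: the d_reclen values tile the buffer exactly, while d_off follows a rule of its own.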
The manual page for readdir(3) gives a bit more insight:
d_off The value returned in d_off is the same as would be returned by
calling telldir(3) at the current position in the directory
stream. Be aware that despite its type and name, the d_off field
is seldom any kind of directory offset on modern filesystems.
Applications should treat this field as an opaque value, making no
assumptions about its contents; see also telldir(3).
The key part is "the d_off field is seldom any kind of directory offset on modern filesystems". The d_off field is a value for internal use by the underlying filesystem, and its meaning is implementation-specific. It does not necessarily have any correlation with d_reclen, nor does it need to represent an actual byte offset within the directory. Whatever software you write, do not rely on the value of d_off; treat it as an opaque identifier.
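In practice this means a buffer returned by getdents64 is walked using d_reclen alone, and d_off is only carried along. The sketch below shows that on a synthetic buffer rather than a real system call. It assumes the linux_dirent64 layout (d_ino, d_off, d_reclen, d_type, then the NUL-terminated name) and 8-byte record padding, which matches the 24-byte records in the Btrfs output further down. The helpers `pack_dirent` and `walk` are hypothetical names, not part of any library.

```python
import struct

# Fixed-size header of struct linux_dirent64:
#   d_ino (u64), d_off (s64), d_reclen (u16), d_type (u8)
# followed by the NUL-terminated d_name. "=" disables padding: 19 bytes.
HEADER = struct.Struct("=QqHB")
DT_DIR, DT_REG = 4, 8  # d_type values for directories and regular files

def pack_dirent(ino, d_off, d_type, name):
    """Build one synthetic record, padded to an 8-byte boundary (assumed)."""
    raw_name = name.encode() + b"\0"
    reclen = (HEADER.size + len(raw_name) + 7) & ~7
    record = HEADER.pack(ino, d_off, reclen, d_type) + raw_name
    return record.ljust(reclen, b"\0")

def walk(buf):
    """Return (d_name, d_reclen, d_off) per record, stepping by d_reclen only."""
    entries, pos = [], 0
    while pos < len(buf):
        _ino, d_off, reclen, _d_type = HEADER.unpack_from(buf, pos)
        name_start = pos + HEADER.size
        name_end = buf.index(b"\0", name_start)
        entries.append((buf[name_start:name_end].decode(), reclen, d_off))
        pos += reclen  # the next record starts here; d_off plays no role
    return entries

# Synthetic buffer using some of the values from the Btrfs listing below.
buf = b"".join([
    pack_dirent(46206659, 1, DT_DIR, "."),
    pack_dirent(214242, 2, DT_DIR, ".."),
    pack_dirent(46206662, 3, DT_REG, "a"),
    pack_dirent(46206667, 2147483647, DT_REG, "f"),
])
entries = walk(buf)
for name, reclen, d_off in entries:
    print(f"{name:<4} d_reclen={reclen} d_off={d_off}")
```

The walk terminates correctly even though the last d_off is far larger than the buffer, because d_off is never used to move through the buffer.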
There may be filesystems where d_off corresponds to an actual offset in bytes between dirents, but this is in general not the case. The field is used more or less like a unique "counter" or "cookie" value to distinguish files inside a directory.
In fact, if you take a look at the values on a Btrfs filesystem, d_off seems to start at 1 for . and 2 for .., increasing by one for each following dirent, with the last one having d_off equal to INT32_MAX. This holds at least for a directory with freshly created files; things will change after deleting, moving, or creating more files.
$ mkdir test
$ cd test
$ touch a b c d e f
$ ls -l
total 0
-rw-r----- 1 marco marco 0 gen 15 01:20 a
-rw-r----- 1 marco marco 0 gen 15 01:20 b
-rw-r----- 1 marco marco 0 gen 15 01:20 c
-rw-r----- 1 marco marco 0 gen 15 01:20 d
-rw-r----- 1 marco marco 0 gen 15 01:20 e
-rw-r----- 1 marco marco 0 gen 15 01:20 f
$ ../test_program
--------------- nread=192 ---------------
inode# file type d_reclen d_off d_name
46206659 directory 24 1 .
214242 directory 24 2 ..
46206662 regular 24 3 a
46206663 regular 24 4 b
46206664 regular 24 5 c
46206665 regular 24 6 d
46206666 regular 24 7 e
46206667 regular 24 2147483647 f
This 2004 Sourceware bug report for Glibc by Dan Tsafrir also contains some insightful explanations about d_off, such as:
In the implementation of getdents(), the d_off field (belonging to the linux kernel's dirent structure) is falsely assumed to contain the byte offset to the next dirent. Note that the linux manual of the readdir system-call states that d_off is the "offset to this dirent" while glibc's getdents treats it as the offset to the next dirent.
In practice, both of the above are wrong/misleading. The d_off field may contain illegal negative values, 0 (should also never happen as the "next" dirent's offset must always be bigger then 0), or positive values that are bigger than the size of the directory-file itself:
We're not sure what the Linux kernel intended to place in this field, but our experience shows that on "real" file systems (that actually reside on some disk) the offset seems to be a simple (not necessarily continuous) counter: e.g. first entry may have d_off=1, second: d_off=2, third: d_off=4096, fourth=d_off=4097 etc. We conjecture this is the serial of the dirent record within the directory (and so, this is indeed the "offset", but counted in records out of which some were already removed).
For file systems that are maintained by the amd automounter (automount, directories) the d_off seems to be arbitrary (and may be negative, zero or beyond the scope of a 32bit integer). We conjecture the amd doesn't assign this field and the received values are simply garbage.
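The conjecture in the quoted report (a serial number per record, with gaps where records were removed) can be illustrated with a toy model. This is purely a sketch of the idea and does not describe how Btrfs, ext4, or any real filesystem assigns d_off. The class `ToyDirectory` and its methods are hypothetical.

```python
class ToyDirectory:
    """Toy directory whose 'd_off' is a per-record serial number (assumption)."""

    def __init__(self):
        self._next_serial = 1
        self._entries = {}  # serial -> name

    def create(self, name):
        # Each new record gets the next serial; serials are never reused.
        self._entries[self._next_serial] = name
        self._next_serial += 1

    def remove(self, name):
        serial = next(s for s, n in self._entries.items() if n == name)
        del self._entries[serial]

    def read(self, cookie=0):
        """Return (name, d_off) for every record whose serial exceeds cookie."""
        return [(self._entries[s], s) for s in sorted(self._entries) if s > cookie]


d = ToyDirectory()
for name in ["a", "b", "c", "d"]:
    d.create(name)
d.remove("b")

full_listing = d.read()         # serial 2 is now a gap
resumed = d.read(cookie=1)      # resume after the record whose d_off was 1
print(full_listing)
print(resumed)
```

In this model a caller that stored the cookie of the last record it saw can resume a listing after deletions, which a byte offset into a rewritten directory could not guarantee. That is one reason to treat the value as opaque and only hand it back to the interface that produced it.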
Answered By - Marco Bonelli