Issue
Packaging a folder on a SUSE Linux Enterprise Server 12 SP3 system using GNU tar 1.30 always gives different md5 checksums although the file contents do not change.
I run tar to package my folder that contains a simple text file:
tar cf package.tar folder
Although the content is exactly the same, the resulting archive has a different md5 (or sha1) checksum on every run:
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
e6383218596fffe118758b46e0edad1d package.tar
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
1c5aa972e5bfa2ec78e63a9b3116e027 package.tar
Because the Linux file system seems to deliver files to tar in a random order, I tried the --sort option, but it does not fix the checksum issue for me. tar's --mtime option does not help either, since the file timestamps are exactly the same.
I appreciate any help on this.
Solution
The archives you provided contain pax extended headers. A quick glance at their structure reveals that they differ in these two fields:
- The process ID of the tar process (it is embedded in the name of the extended header in the ustar header block, and consequently changes the checksum of that ustar header block).
- The atime (access time) in the extended header.
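To see these two fields yourself, something like the following sketch may help. It assumes GNU tar and GNU grep; the scratch directory and the names a.tar and b.tar are made up for illustration. Whether the atime actually changes between runs depends on the mount options of your file system, and the header name only contains a process ID when your tar version's default extended header name includes one, so the output on your system may differ.

```shell
# Work in a scratch directory with a one-file folder (names are illustrative).
cd "$(mktemp -d)"
mkdir folder
echo "hello" > folder/file.txt

# Create two pax archives from identical input.
tar --format=pax -cf a.tar folder
tar --format=pax -cf b.tar folder

# Print the extended header names and the atime records stored in each archive.
# grep -a treats the binary archive as text; -o prints only the matching parts.
for archive in a.tar b.tar; do
    echo "== $archive"
    grep -a -o -E 'PaxHeaders[^/]*|atime=[0-9.]+' "$archive" | sort -u || true
done

# Report whether the two archives are byte-for-byte identical.
if cmp -s a.tar b.tar; then echo "identical"; else echo "different"; fi
```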
One workaround for reproducible archive creation is to force the old Unix ustar format (rather than the pax/posix format):
tar --format=ustar -cf package.tar folder
The other choice is to manually set the extended name and delete the atime while preserving the pax format:
tar --format=pax --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime -cf package.tar folder
Now the md5sum should be the same for both archives.
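Both variants can be checked with a small script along these lines. This is only a sketch, assuming GNU tar and md5sum are available; the helper function check and the archive names first.tar and second.tar are hypothetical, and the test folder is created in a scratch directory rather than using your real data.

```shell
#!/bin/sh
set -eu

# Scratch directory with a one-file folder (names are illustrative).
cd "$(mktemp -d)"
mkdir folder
echo "hello" > folder/file.txt

# check LABEL TAR_OPTIONS...
# Builds the archive twice with the given tar options and compares checksums.
check() {
    label=$1
    shift
    tar "$@" -cf first.tar folder
    tar "$@" -cf second.tar folder
    if [ "$(md5sum < first.tar)" = "$(md5sum < second.tar)" ]; then
        echo "$label: identical"
    else
        echo "$label: different"
    fi
}

# Variant 1: old ustar format, which has no extended headers at all.
check ustar --format=ustar

# Variant 2: pax format with a fixed header name and without atime.
check pax --format=pax --pax-option='exthdr.name=%d/PaxHeaders/%f,delete=atime'
```

If either line reports "different", the input itself probably changed between the two runs (for example its modification time or permissions), which neither option can hide.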
Answered By - DaBler
Answer Checked By - Clifford M. (WPSolving Volunteer)