Tuesday, February 22, 2022

[SOLVED] tar package has different checksum for exactly the same content

Issue

Packaging a folder on a SUSE Linux Enterprise Server 12 SP3 system using GNU tar 1.30 always gives different md5 checksums although the file contents do not change.

I run tar to package my folder that contains a simple text file:

tar cf package.tar folder

Although the content is exactly the same, the resulting tar file always has a different md5 (or sha1) checksum:

$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
e6383218596fffe118758b46e0edad1d  package.tar
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
1c5aa972e5bfa2ec78e63a9b3116e027  package.tar

Because the Linux file system seems to hand files to tar in arbitrary order, I tried the --sort option, but it does not fix the checksum issue for me. tar's --mtime option does not help either, since the modification times are already exactly the same.

I appreciate any help on this.


Solution

The archives you provided contain pax extended headers. A quick glance at their structure reveals that they differ in these two fields:

  1. The process ID of the tar process, which is embedded in the name of the extended header in the ustar header block and therefore also changes the checksum of that header block.
  2. The atime (access time) in the extended header.
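To see these two fields for yourself, here is a minimal sketch, assuming GNU tar and a throwaway directory. The exthdr.name template with %p is set explicitly to mimic the PID-based naming described above, so the effect shows up even if your tar version uses a different default; the file names and variables are made up for illustration.

```shell
#!/bin/sh
# Sketch: make the PID-dependent extended header name visible.
# Assumes GNU tar. The template below is passed explicitly (an assumption
# made for this demo) rather than relying on tar's built-in default.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir folder
echo "hello" > folder/file.txt

pid_template='exthdr.name=%d/PaxHeaders.%p/%f'
tar --format=pax --pax-option="$pid_template" -cf first.tar folder
tar --format=pax --pax-option="$pid_template" -cf second.tar folder

# Header names are stored as plain text, so grep -a can pull them out.
name1=$(grep -a -o 'PaxHeaders\.[0-9][0-9]*' first.tar | sort -u)
name2=$(grep -a -o 'PaxHeaders\.[0-9][0-9]*' second.tar | sort -u)
echo "first.tar:  $name1"
echo "second.tar: $name2"

# Print any atime records found in the extended headers.
grep -a -o 'atime=[0-9.]*' first.tar second.tar || true
```

Each tar invocation runs as a separate process, so the number after "PaxHeaders." should differ between the two archives even though the folder is untouched.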

One workaround for reproducible archive creation is to force the older ustar format, which has no extended headers, instead of the pax (POSIX) format:

tar --format=ustar -cf package.tar folder

The other option is to keep the pax format but set the extended header name explicitly (without the process ID) and delete the atime field:

tar --format=pax --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime -cf package.tar folder

Now the md5sum should be the same for both archives.
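A quick way to check this is to build the archive twice with each workaround and compare the checksums. The following is only a sketch, assuming GNU tar and md5sum are available; the helper compare_runs and the sample folder are hypothetical names used for the demo.

```shell
#!/bin/sh
# Sketch: confirm that both workarounds give identical archives on repeat runs.
# Assumes GNU tar and md5sum; compare_runs is a helper invented for this demo.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir folder
echo "hello" > folder/file.txt

# Build the archive twice with the given tar options and report whether
# the two md5 checksums match.
compare_runs() {
    tar "$@" -cf a.tar folder
    tar "$@" -cf b.tar folder
    sum_a=$(md5sum a.tar | cut -d' ' -f1)
    sum_b=$(md5sum b.tar | cut -d' ' -f1)
    if [ "$sum_a" = "$sum_b" ]; then echo same; else echo different; fi
}

ustar_result=$(compare_runs --format=ustar)
pax_result=$(compare_runs --format=pax \
    --pax-option='exthdr.name=%d/PaxHeaders/%f,delete=atime')

echo "ustar: $ustar_result"
echo "pax:   $pax_result"
```

If either line reports "different" on your system, some other field is varying between runs (for example file order in a larger folder), and options such as --sort=name may be needed in addition.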



Answered By - DaBler
Answer Checked By - Clifford M. (WPSolving Volunteer)