Issue
I have a .tar file that contains many .gz files inside a folder. Each of these .gz files contains a .txt file. Other Stack Overflow questions on this topic are about extracting the files.
I am trying to read the content of each .txt file iteratively without extracting anything, because the .tar is large.
First I read the contents of the .tar file:
import tarfile
tar = tarfile.open("FILE.tar")
tar.getmembers()
Or in Unix:
tar xvf file.tar -O
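In the Python version, getmembers() returns TarInfo objects. These hold each member's header metadata (name, size, type) rather than its data, so listing them extracts nothing. A minimal sketch, assuming the archive path is passed in; the helper name list_gz_members is hypothetical:

```python
import tarfile

def list_gz_members(tar_path):
    """Return (name, size) pairs for the regular .gz members of a tar file.

    getmembers() yields TarInfo objects: header metadata only, no file data,
    so nothing is extracted or decompressed here.
    """
    with tarfile.open(tar_path) as tar:
        return [
            (info.name, info.size)  # size is the compressed .gz size in bytes
            for info in tar.getmembers()
            if info.isfile() and info.name.endswith(".gz")
        ]
```

For example, list_gz_members("file.tar") returns pairs such as ("directory/file1.txt.gz", 0) for the empty files created with touch.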
Then I tried using the tarfile extractfile method, but I'm getting an error: "module 'tarfile' has no attribute 'extractfile'". Besides, I'm not even sure that is the right method.
import gzip
for member in tar.getmembers():
    m = tarfile.extractfile(member)
    file_contents = gzip.GzipFile(fileobj=m).read()
To create an example file that simulates the original:
$ mkdir directory
$ touch directory/file1.txt.gz directory/file2.txt.gz directory/file3.txt.gz
$ tar -c -f file.tar directory
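Note that touch creates empty .gz files, so reading them back yields empty bytes. If you want members that actually contain compressed text, here is a sketch that builds a similar archive from Python instead of the shell. The function name make_sample_tar and the sample text are hypothetical, and unlike tar -c it does not add a separate entry for the directory itself:

```python
import gzip
import io
import tarfile

def make_sample_tar(path, files):
    """Write a .tar at `path` whose members are gzip-compressed text files.

    `files` maps member names (e.g. "directory/file1.txt.gz") to text.
    """
    with tarfile.open(path, mode="w") as tar:
        for name, text in files.items():
            payload = gzip.compress(text.encode("utf-8"))
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # addfile() copies exactly info.size bytes
            tar.addfile(info, io.BytesIO(payload))
```

For example, make_sample_tar("file.tar", {"directory/file1.txt.gz": "hello\n"}) writes a one-member archive.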
This is the final version that worked for me after applying Mark Adler's suggestion:
import gzip
import tarfile
tar = tarfile.open("file.tar")
members = tar.getmembers()
# Here I append the results in a list, because I wasn't able to
# parse the tarfile type returned by .getmembers():
tar_name = []
for elem in members:
    tar_name.append(elem.name)
# Then I changed tarfile.extractfile to tar.extractfile as suggested:
for member in tar_name:
    # I'm using this because I have other non-gzs in the directory
    if member.endswith(".gz"):
        m = tar.extractfile(member)
        file_contents = gzip.GzipFile(fileobj=m).read()
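The name-collecting loop can be shortened with TarFile.getnames(), which returns the member names as plain strings, and the decompressed data can be kept per member instead of being overwritten on each pass. A sketch with the hypothetical helper name read_gz_members; it still scans the whole archive up front like getmembers() does, and it holds every decompressed file in memory, so it suits archives whose contents fit in RAM:

```python
import gzip
import tarfile

def read_gz_members(tar_path):
    """Return {member name: decompressed bytes} for every .gz member."""
    contents = {}
    with tarfile.open(tar_path) as tar:
        # getnames() gives the member names as strings, like the tar_name list
        for name in tar.getnames():
            if not name.endswith(".gz"):
                continue  # skip the non-gz members
            m = tar.extractfile(name)
            if m is None:  # not a regular file (e.g. a directory named *.gz)
                continue
            with gzip.GzipFile(fileobj=m) as gz:
                contents[name] = gz.read()
    return contents
```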
Solution
You need to use tar.extractfile(member) instead of tarfile.extractfile(member). tarfile is the module, and doesn't know about the tar file you opened. tar is the TarFile object, which references the .tar file you opened.
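The difference is easy to check: the module has no extractfile function, while the TarFile class, whose instances tarfile.open() returns, does:

```python
import tarfile

# tarfile is a module; it has no module-level extractfile() function,
# which is why tarfile.extractfile(member) raises AttributeError.
print(hasattr(tarfile, "extractfile"))          # False

# extractfile() is a method of tarfile.TarFile, the class of the object
# that tarfile.open() returns, so it must be called on that object.
print(hasattr(tarfile.TarFile, "extractfile"))  # True
```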
To do it right, use next() instead of getmembers() or getnames(), so that you don't have to read the entire tar file twice:
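import gzip
import sys
import tarfile
with tarfile.open(sys.argv[1]) as tar:
    # tar.next() returns the next member, or None at the end of the archive
    # (the := operator needs Python 3.8+)
    while ent := tar.next():
        if ent.name.endswith(".gz"):
            print(gzip.GzipFile(fileobj=tar.extractfile(ent)).read())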
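If the archive is very large, or comes from a non-seekable source such as a pipe, tarfile's stream mode ("r|*") is an alternative to the next() loop; this is a swapped-in technique, not the one in the answer. In stream mode members are processed strictly in order, and each member's data has to be read before moving on to the next. The sketch also reads each .txt line by line through gzip.open(..., "rt") instead of calling .read() on the whole file, and assumes the text is UTF-8; iter_gz_lines is a hypothetical name:

```python
import gzip
import tarfile

def iter_gz_lines(tar_path):
    """Yield (member name, line) for every line of every .gz member.

    Stream mode ("r|*") reads the archive sequentially without seeking back,
    so each member must be fully consumed before the loop advances.
    """
    with tarfile.open(tar_path, mode="r|*") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            raw = tar.extractfile(member)
            # gzip.open accepts an existing binary file object; "rt" decodes text
            with gzip.open(raw, "rt", encoding="utf-8") as text:
                for line in text:
                    yield member.name, line.rstrip("\n")
```

Usage: for name, line in iter_gz_lines("file.tar"): print(name, line). Only one line of decompressed text is held at a time, which is the point of streaming for a large archive.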
Answered By - Mark Adler