Saturday, April 9, 2022

[SOLVED] grep -f on files in a zipped folder

Issue

I am performing a recursive fgrep/grep -f search on a zipped folder in one of my programs, using the following command:

grep -r -i -z -I -f /path/to/pattern/file /home/folder/TestZipFolder.zip
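The -z option in this command tells GNU grep to treat its input and output as records terminated by a NUL byte instead of a newline. That is one likely reason the matches in the output below run together on one line. A minimal sketch of the behavior, using printf to fake NUL-terminated records (the nul_records helper is hypothetical, for illustration only):

```shell
# nul_records (hypothetical helper): emit three NUL-terminated records
# and no newline characters at all.
nul_records() {
  printf 'cat\0Dog one\0Dog two\0'
}

# With -z, grep splits its input on NUL bytes and ends every record it
# prints with a NUL byte, so no newline separates the matches.
# od -c makes the NUL bytes visible as \0.
nul_records | grep -z Dog | od -An -c

# Translating NUL back to newline puts each match on its own line.
nul_records | grep -z Dog | tr '\0' '\n'
```

Piping through tr only fixes the separators. It does not stop grep from reading the raw archive bytes, so it is not a substitute for searching the extracted files.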

Inside the pattern file is the string "Dog" that I am trying to search for.

In the zipped folder there are several text files containing the string "Dog".

The grep -f command successfully finds the string "Dog" in 3 files inside the zipped folder, but it prints all of the output on one line, and strange characters such as "PK" appear at the end of each match (as shown below). When I write the output to a file from my program, further characters such as ^B^T^@ also appear.

Output from the grep -f command:

TestZipFolder/test.txtThis is a file containing the string DogPKtest1.txtDog, is found again in this file.PKTestZipFolder/another.txtDog is written in this file.PK 

How can I get each file where the string "Dog" was found to print on its own line, instead of all the matches being grouped together on one line as they are now? Also, where do the "PK" and other strange characters in the output come from, and how do I prevent them from appearing?
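The "PK" most likely comes from the zip format itself: grep is reading the raw bytes of the archive, not the extracted files. Every member of a zip archive starts with a local file header whose 4-byte signature is 0x50 0x4B 0x03 0x04, i.e. the letters "P" and "K" followed by two control bytes, and the central directory entries near the end of the archive also start with "PK" (0x50 0x4B 0x01 0x02). Control bytes like these are what terminals and editors show in caret notation, such as ^B (0x02), ^T (0x14) and ^@ (NUL). The matching text was probably only visible because small text files are often stored uncompressed; text inside a compressed member would not appear in the raw bytes at all. A small sketch to check the signature of an archive (the zip_signature name is hypothetical):

```shell
# zip_signature (hypothetical helper): print the first four bytes of a
# file as hex. An archive that starts with a member begins with the
# local file header signature 50 4b 03 04 ("P", "K", 0x03, 0x04).
zip_signature() {
  head -c 4 "$1" | od -An -tx1
}
```

For example, zip_signature /home/folder/TestZipFolder.zip should print 50 4b 03 04 for a typical zip archive.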

Desired output

TestZipFolder/test.txt:This is a file containing the string Dog
TestZipFolder/test1.txt:Dog, is found again in this file
TestZipFolder/another.txt:Dog is written in this file

Something along these lines, so that the user can see where in each file the string was found. (You do get output in this format when you run the same grep command on a file that is not a zip file.)


Solution

If you need line-by-line output, it is better to use zipgrep:

zipgrep -s "pattern" TestZipFolder.zip

The -s option suppresses error messages (optional). This command prints every matching line along with the name of the file it was found in. If you want to remove the duplicate names when a file contains more than one match, some further processing is needed using loops/grep, awk or sed.
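One way to do that de-duplication with awk is a small filter that prints each member name once and indents its matching lines underneath. This is a minimal sketch, assuming each line of zipgrep output has the form member:matched line and that member names contain no colon; the group_by_file name is hypothetical:

```shell
# group_by_file (hypothetical helper): read "member:matched line" records
# on stdin and print each member name once, followed by its matches.
# Assumes member names contain no ':' (the first ':' ends the name).
group_by_file() {
  awk '
    {
      i = index($0, ":")
      if (i == 0) { print; next }      # no "name:" prefix, pass through
      name = substr($0, 1, i - 1)
      text = substr($0, i + 1)
      if (name != prev) { print name; prev = name }
      print "    " text
    }
  '
}
```

For example: zipgrep -s Dog TestZipFolder.zip | group_by_file. The filter only merges consecutive lines from the same member, which should be enough here because zipgrep searches the archive one member at a time.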

Actually, zipgrep is a combination of egrep and unzip. Its usage is as follows:

zipgrep [egrep_options] pattern file[.zip] [file(s) ...] [-x xfile(s) ...]

So you can pass any egrep options (such as -i) to it.



Answered By - blackSmith
Answer Checked By - Terry (WPSolving Volunteer)