Issue
Currently, to search for a string (here: 123456789) in the files across all my buckets, I do the following:
gsutil cat -h gs://AAAA/** | grep '123456789' > 20221109.txt
And I get the path of the matching file, so it works. But done this way it searches through all the directories (and I have thousands of directories and thousands of files), so it takes a long time. I want to filter by date using the name of the subdirectory, like:
gsutil cat -h gs://AAAA/*2022-11-09*/** | grep '123456789' > 20221109.txt
But it didn't work, and I have no clue how to solve my problem. I have read a lot of answers on SO, but none of them fit my case.
PS: I can't use find with gsutil, so I'm trying to do it with cat and grep and gsutil in a single command line.
Thanks in advance for your help.
Solution
Finally, I managed to get what I wanted, but it is barely legible. I think it's possible to do better, and I'm open to any improvement. As a reminder, this solution avoids reading every directory of the bucket.
1st Step : get all the paths of the files whose subdirectory name matches my pattern (a date here):
gsutil ls gs://directory1/*2022-11-09*/** > gs_path_files_2022_11_09.txt
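The effect of restricting the wildcard to dated subdirectories can be sketched locally with plain `ls` standing in for `gsutil ls` (the `/tmp/bkt` layout and file names below are made up for illustration):

```shell
# Two subdirectories, only one of which matches the date pattern.
mkdir -p /tmp/bkt/run-2022-11-09-a /tmp/bkt/run-2022-11-10-b
touch /tmp/bkt/run-2022-11-09-a/log.txt /tmp/bkt/run-2022-11-10-b/log.txt

# Only files under subdirectories whose name contains the date are listed,
# so everything else is never touched.
ls -d /tmp/bkt/*2022-11-09*/* > /tmp/gs_path_files_2022_11_09.txt

cat /tmp/gs_path_files_2022_11_09.txt
# prints: /tmp/bkt/run-2022-11-09-a/log.txt
```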
2nd Step : for each listed file, grep for the string and print both the name of the file and the matching line (again in the terminal):
while read -r line; do
  gsutil cat "$line" | awk -v f="$line" '/the_string_i_want_to_match_in_my_file/{print f ": " $0}' >> results.txt
done < gs_path_files_2022_11_09.txt
You will then get, for each match, the name of the file plus the line where the match occurs.
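The same loop can be tried end to end with local files, where plain `cat` stands in for `gsutil cat` (the `/tmp/demo` paths and file contents are made up for this sketch):

```shell
# Two local files, only one containing the target string.
mkdir -p /tmp/demo
printf 'foo\n123456789 found here\nbar\n' > /tmp/demo/a.txt
printf 'nothing here\n' > /tmp/demo/b.txt

# Stand-in for the gsutil ls output from step 1.
ls /tmp/demo/a.txt /tmp/demo/b.txt > /tmp/demo/paths.txt

: > /tmp/demo/results.txt
while read -r line; do
  # locally, plain cat stands in for "gsutil cat"; awk tags each
  # matching line with the file name passed in via -v
  cat "$line" | awk -v f="$line" '/123456789/{print f ": " $0}' >> /tmp/demo/results.txt
done < /tmp/demo/paths.txt

cat /tmp/demo/results.txt
# prints: /tmp/demo/a.txt: 123456789 found here
```

Only the file containing the string contributes a line to results.txt, prefixed with its path, which is exactly what the gsutil version produces for bucket objects.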
Best regards
Answered By - Cass
Answer Checked By - Katrina (WPSolving Volunteer)