Issue
I need to find the reports (.docx files), read them with docx2txt, find the second match of "passed" (excluding "not passed") and save these filenames to a text file. Here is what I tried:
OIFS="$IFS"
IFS=$'\n'
for f in $(find . -wholename '*_done/(*Report*.docx' |grep -v appendix)
do
docx2txt "$f" - | (grep -q -m2 passed || grep -q -v "not passed") || echo $f >> failed
done
IFS="$OIFS"
But this script gives me an empty file. If I replace || with && before echo, all filenames are stored in the file. grep works fine outside the script, and so does docx2txt. What am I doing wrong here?
Solution
There are quite a lot of problems with the grep commands.
grep -q exits successfully as soon as it finds the first match. With -q, the -m2 has no effect: if there is one match, grep exits successfully. It does not check whether there is a second match.
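A minimal sketch of this behaviour (plain sh plus grep; the sample text is made up): the input contains "passed" only once, yet grep -q -m2 still reports success.

```shell
#!/bin/sh
# Only ONE line of this made-up input contains "passed".
sample='step 1 passed
step 2 failed'

# -q makes grep exit with status 0 at the first match, so -m2 changes nothing.
printf '%s\n' "$sample" | grep -q -m2 passed
status=$?
echo "grep -q -m2 exit status: $status"   # 0 even though there is a single match
```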
To check that there are (at least) two matches, count the matches and then use test / [ ] to check the number of found matches. If there is at most one "passed" per line, grep -c is sufficient. If there can be multiple matches per line, you need grep -o ... | wc -l.
-q and -v together mean: is there at least one line that does not contain the pattern? When grep finds such a line, it exits successfully. The only way for this command to fail is an input in which every line contains "not passed" (this includes empty input).
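A small sketch of counting instead of relying on exit status, and of what -q -v actually tests. The sample text is invented; grep -o is supported by GNU and BSD grep but is not in POSIX.

```shell
#!/bin/sh
# Made-up report text: three lines mention "passed", the first one twice.
sample='unit passed, lint passed
integration passed
deploy not passed'

# grep -c counts matching LINES; -m2 stops after two of them.
lines=$(printf '%s\n' "$sample" | grep -c -m2 passed)

# grep -o prints every match on its own line, so wc -l counts matches.
# (The "not passed" is counted too; excluding it is a separate problem.)
hits=$(printf '%s\n' "$sample" | grep -o passed | wc -l)

# The decision is made with test/[ ] on the count, not with grep's exit status.
if [ "$lines" -ge 2 ]; then echo "at least two lines contain passed"; fi
echo "lines=$lines hits=$hits"

# -q -v asks "is there a line WITHOUT the pattern?"; it fails only if every line has it.
if printf '%s\n' "$sample" | grep -q -v 'not passed'; then qv_mixed=0; else qv_mixed=1; fi
if printf 'x not passed\n' | grep -q -v 'not passed'; then qv_all=0; else qv_all=1; fi
echo "qv_mixed=$qv_mixed qv_all=$qv_all"
```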
Matching "passed" but not "not passed" is trickier than one might suspect. If there can be at most one "passed"/"not passed" per line, you can use grep -v 'not passed' | grep passed. Otherwise you need a negative lookbehind, which is only available in Perl-compatible regular expressions (PCRE), not in grep's basic or extended syntax.
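A sketch of the difference, assuming GNU grep built with PCRE support for -P; the sample lines are invented. Line 2 holds both a standalone "passed" and a "not passed", which is exactly the case where the line-based filter goes wrong.

```shell
#!/bin/sh
# Made-up text; line 2 has BOTH a real "passed" and a "not passed".
sample='check A passed
check B passed, check C not passed
check D not passed'

# Line-based filter: fine with at most one passed/not passed per line,
# but it drops line 2 entirely and so misses "check B passed".
by_lines=$(printf '%s\n' "$sample" | grep -v 'not passed' | grep -c passed)

# Negative lookbehind (PCRE, GNU grep -P): "passed" not preceded by "not ".
by_match=$(printf '%s\n' "$sample" | grep -Po '(?<!not )passed' | wc -l)

echo "line filter: $by_lines, lookbehind: $by_match"
```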
In addition, command | (grep ... || grep ...) might not do what you expect. command produces its output only once. After the first grep has read some of this output, that part is gone. The second grep then continues reading where the first grep stopped reading (and because grep reads its input in large blocks, that can be well past the line where it found its match).
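One way around the shared-stdin problem (a common shell idiom, not something taken from the answer) is to run the producer once, keep its output in a variable and give each grep its own copy. In this sketch report_text is a hypothetical stand-in for docx2txt "$f" -.

```shell
#!/bin/sh
# Hypothetical stand-in for `docx2txt "$f" -`: prints a fixed report.
report_text() {
    printf 'step 1 passed\nstep 2 not passed\nstep 3 passed\n'
}

# Pitfall: both greps read the SAME stdin. The second one only gets whatever
# the first left unread (often nothing), so this count is unreliable.
shared=$(report_text | { grep -q 'not passed'; grep -c passed || true; })

# Fix: run the producer once, keep the text, and give each grep a full copy.
text=$(report_text)
passed=$(printf '%s\n' "$text" | grep -v 'not passed' | grep -c passed)
not_passed=$(printf '%s\n' "$text" | grep -c 'not passed')
echo "shared stdin: $shared, own copies: passed=$passed not_passed=$not_passed"
```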
BTW: for … in $(find … | grep -v …) can be turned into a single, safe find command using -not and -exec.
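A sketch of such a find command on a throwaway directory, assuming GNU find (-path is the portable spelling of -wholename). The file names are hypothetical, and the sh -c command only prints each path instead of running docx2txt.

```shell
#!/bin/sh
# Throwaway tree with hypothetical names, including spaces.
work=$(mktemp -d)
mkdir "$work/week1_done"
: > "$work/week1_done/(1) Report final.docx"
: > "$work/week1_done/(1) Report appendix.docx"

# -not replaces `| grep -v appendix`, and -exec hands each path to the command
# as a single argument, so spaces (or even newlines) in names cannot split it.
found=$(find "$work" -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
    -exec sh -c 'printf "would process: %s\n" "$0"' {} \;)
rm -r "$work"
printf '%s\n' "$found"
```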
Solution
If each line contains at most one "passed"/"not passed", use
find . -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
-exec sh -c '[ $(docx2txt "$0" - | grep -v "not passed" | grep -cm2 passed) = 2 ]' {} \; -print
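To try the command above without real reports, this sketch puts a hypothetical docx2txt stub on PATH that just prints a plain-text file, creates two fake reports and saves the matching file names to found.txt, as the question asks (GNU find and grep assumed).

```shell
#!/bin/sh
# Sandbox: a hypothetical stub "docx2txt" that simply prints a plain-text file.
work=$(mktemp -d)
mkdir "$work/bin" "$work/a_done"
printf '#!/bin/sh\ncat "$1"\n' > "$work/bin/docx2txt"
chmod +x "$work/bin/docx2txt"
PATH="$work/bin:$PATH"; export PATH

# Fake "reports": the first has two passing lines, the second only one.
printf 'A passed\nB not passed\nC passed\n' > "$work/a_done/(1) Report.docx"
printf 'A passed\nB not passed\n' > "$work/a_done/(2) Report.docx"

cd "$work" || exit 1
# The answer's command; its -print output is redirected to a text file.
find . -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
    -exec sh -c '[ $(docx2txt "$0" - | grep -v "not passed" | grep -cm2 passed) = 2 ]' {} \; \
    -print > found.txt
result=$(cat found.txt)
cd / && rm -r "$work"
echo "$result"
```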
If there can be multiple "passed"/"not passed" per line, you need GNU grep or pcregrep:
find . -wholename '*_done/(*Report*.docx' -not -wholename '*appendix*' \
-exec sh -c '[ $(docx2txt "$0" - | grep -Pom2 "(?<!not )passed" | wc -l) -ge 2 ]' {} \; -print
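A quick check of how -m2 and -o interact in this variant, assuming GNU grep with PCRE support and made-up sample text. Since -m2 limits matching lines rather than matches, the count can exceed 2, so an at-least-two test has to be written as -ge 2 rather than = 2.

```shell
#!/bin/sh
# Made-up text: line 1 holds three real "passed"; line 2 one real "passed"
# plus a "not passed" that the lookbehind must skip.
sample='A passed, B passed, C passed
D passed, E not passed'

# -m2 stops after two matching LINES, but -o prints every match on them.
hits=$(printf '%s\n' "$sample" | grep -Pom2 '(?<!not )passed' | wc -l)
echo "hits: $hits"

# "At least two matches" is therefore a -ge test, not an equality test.
if [ "$hits" -ge 2 ]; then echo "at least two passed"; fi
```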
Answered By - Socowi Answer Checked By - Senaida (WPSolving Volunteer)