Issue
How can I find a pattern in one file that doesn't match any line of another file?
I'm aware that grep has a -f option, so instead of feeding grep a pattern, I can feed it a file of patterns.
(a.a is my main file)
user@system:~/test# cat a.a
Were Alexander-ZBn1gozZoEM.mp4
Will Ate-vP-2ahd8pHY.mp4
(p.p is my file of patterns)
user@system:~/test# cat p.p
ZBn1gozZoEM
0maL4cQ8zuU
vP-2ahd8pHY
So the command might be something like
somekindofgrep p.p a.a
and it should output 0maL4cQ8zuU, which is the pattern in my file of patterns, p.p, that doesn't match anything in the file a.a.
I am not sure which command to use. Plain grep -f only shows the lines of a.a that do match:
$ grep -f p.p a.a
Were Alexander-ZBn1gozZoEM.mp4
Will Ate-vP-2ahd8pHY.mp4
$
I know that if there were an additional line in a.a not matched by any pattern in p.p, then grep -f p.p a.a wouldn't show it, and grep -v -f p.p a.a would show only that line of a.a.
But I'm interested in finding which pattern in my file of patterns, p.p, doesn't match anything in a.a!
I looked at "Make grep print missing queries", but the asker there wants everything from both files. One of the answers there also mentions -v, but I can't see that applying to my case: -v shows the lines of a file that don't match any pattern, whereas I'm looking for a pattern that doesn't match any line of a file. So using -v or not won't help me.
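The point about -f and -v can be reproduced with a minimal sketch like the following. It recreates the question's sample files in a scratch directory (assuming mktemp -d is available, as it is on Linux and macOS) and adds a made-up third line, "Extra line-abc.mp4", to a.a. Both grep invocations select lines of a.a; neither ever prints a line of p.p, which is why -v alone cannot answer the question.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory.
# The third line of a.a is a made-up extra line that no pattern in p.p matches.
dir=$(mktemp -d)
printf '%s\n' 'Were Alexander-ZBn1gozZoEM.mp4' 'Will Ate-vP-2ahd8pHY.mp4' \
    'Extra line-abc.mp4' > "$dir/a.a"
printf '%s\n' 'ZBn1gozZoEM' '0maL4cQ8zuU' 'vP-2ahd8pHY' > "$dir/p.p"

# -f: lines of a.a that match at least one pattern in p.p
matched=$(grep -f "$dir/p.p" "$dir/a.a")
# -v -f: lines of a.a that match no pattern in p.p
unmatched=$(grep -v -f "$dir/p.p" "$dir/a.a")

printf 'grep -f p.p a.a:\n%s\n\n' "$matched"
printf 'grep -v -f p.p a.a:\n%s\n' "$unmatched"
# Neither output contains 0maL4cQ8zuU: both options select lines of a.a,
# never lines of the pattern file.

rm -r "$dir"
```

The first command prints the two original lines of a.a and the second prints only "Extra line-abc.mp4".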
Solution
Here's a possible solution based on one possible interpretation of what it is you're trying to do (a full-string match of the lines in p.p against the substrings between the first "-" and the last "." in the lines of a.a):
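$ awk '
    NR==FNR {                  # first file (a.a):
        sub(/[^-]*-/,"")       # strip up to and including the first "-"
        sub(/\.[^.]*$/,"")     # strip the last "." and the extension after it
        file1[$0]              # remember the remaining key
        next
    }
    !($0 in file1)             # second file (p.p): print lines that are not a key
' a.a p.p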
0maL4cQ8zuU
The above will work robustly, portably, and efficiently using any awk in any shell on every Unix box. Because it reads each file only once and does a single hash lookup per line of p.p, it'll run orders of magnitude faster than a shell loop that calls grep once per pattern, and faster than the other awk and xargs answers posted to the question. It compares plain strings rather than regexps, so it will work no matter which characters exist in either file, regexp metachars included, and whether or not the search strings from p.p exist as substrings or in other contexts in a.a. It also has zero security concerns no matter what is in the input files, because nothing read from them is ever evaluated as code or passed to a shell.
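The awk script above deliberately matches whole keys. If instead a line of p.p should count as found whenever it appears anywhere in a line of a.a as a plain substring (another possible reading of "doesn't match any line"), a minimal sketch of the same two-file awk idea could use awk's index() instead of the hash lookup. The function name missing_patterns and the scratch-directory demo are hypothetical, chosen for illustration; mktemp -d is assumed to be available.

```shell
#!/bin/sh
# missing_patterns A P (hypothetical helper): print each line of file P that
# does not occur as a literal (fixed-string) substring of any line of file A,
# keeping the order of P. Regexp metacharacters in P are treated literally.
# Assumes A is non-empty: with an empty A, NR == FNR would also be true
# while reading P, and nothing would be printed.
missing_patterns() {
    awk '
        NR == FNR { lines[++n] = $0; next }   # first file: remember every line
        {
            for (i = 1; i <= n; i++)
                if (index(lines[i], $0)) next  # literal substring found: skip it
            print                              # no line of the first file contains it
        }
    ' "$1" "$2"
}

# Demo on the question's sample data, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'Were Alexander-ZBn1gozZoEM.mp4' 'Will Ate-vP-2ahd8pHY.mp4' > "$dir/a.a"
printf '%s\n' 'ZBn1gozZoEM' '0maL4cQ8zuU' 'vP-2ahd8pHY' > "$dir/p.p"
missing_patterns "$dir/a.a" "$dir/p.p"    # prints 0maL4cQ8zuU
rm -r "$dir"
```

Unlike the hash lookup in the answer's script, this compares every pattern with every stored line, so its cost grows with the product of the two file sizes; when the key-based interpretation fits the data, that version is the better choice for large files.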
Answered By - Ed Morton