Tuesday, March 15, 2022

[SOLVED] grep reads patterns from a file but prints more than the whole-word matches

Issue

I have a file, file.txt, and I want to extract the lines that contain the exact words listed in check.txt.

# file.txt
CA1C 2637 green
CA1C-S1 2561 green
CA1C-S2 2371 green

# check.txt
CA1C

I tried

grep -wFf check.txt file.txt

but I'm not getting the desired output: all three lines were printed.

Instead, I'd like to get only the first line,

CA1C 2637 green

I searched and found this post, which is relevant, but it only covers the easy case of matching a single word. How can I change my command so that grep reads its patterns from check.txt and prints only the lines where the whole word matches?

Many thanks!


Solution

The man page for grep says the following about the -w switch:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

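In your case, all three lines start with "CA1C" followed by a non-word constituent character: a space on the first line and a hyphen on the other two. Every match is therefore at the beginning of the line and followed by a non-word constituent character, so -w accepts all three lines.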
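To see this behaviour in isolation, here is a small sketch that feeds the three sample lines from the question to grep on standard input and counts the matches. The variable names sample and matched are only for illustration.

```shell
# The three sample lines from the question.
sample='CA1C 2637 green
CA1C-S1 2561 green
CA1C-S2 2371 green'

# -w treats the hyphen as a word boundary, so "CA1C" matches on every line.
# -c prints the number of matching lines instead of the lines themselves.
matched=$(printf '%s\n' "$sample" | grep -cw 'CA1C')
echo "$matched"
```

The count covers all three lines, which is why -w alone cannot separate CA1C from CA1C-S1.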

I would do this with a loop, reading lines manually from check.txt:

while IFS= read -r line; do grep "^$line " file.txt; done < check.txt
CA1C 2637 green

This loop reads the lines from check.txt, and searches for each one at the start of a line in file.txt, with a following space.

There may be a better way to do this, but I couldn't get -f to actually consider whitespace at the end of a line of the input file.
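One possible alternative, sketched here with awk instead of grep, is to compare the first whitespace-separated field of each line against the words in check.txt. This assumes the word to match is always the first field, as it is in the sample data; the temporary directory and the variable names tmpdir, want and result are only for illustration.

```shell
# Recreate the two files from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'CA1C 2637 green' 'CA1C-S1 2561 green' 'CA1C-S2 2371 green' > "$tmpdir/file.txt"
printf '%s\n' 'CA1C' > "$tmpdir/check.txt"

# While reading the first file (NR==FNR), remember each word from check.txt.
# While reading the second file, print lines whose first field is one of them.
result=$(awk 'NR==FNR { want[$1]; next } $1 in want' "$tmpdir/check.txt" "$tmpdir/file.txt")
echo "$result"

# Clean up the temporary files.
rm -rf "$tmpdir"
```

Because the comparison is an exact string match on the field, regex metacharacters in check.txt are not interpreted, and file.txt is read only once regardless of how many words check.txt contains.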



Answered By - JediWombat
Answer Checked By - Mary Flores (WPSolving Volunteer)