Saturday, October 29, 2022

[SOLVED] How to format the result on several lines when using grep in a for loop

Issue

In several files, I would like to extract the lines (with their line numbers)

  • that contain the ClNonZ pattern
  • and that have the value "REAL" as first attribute (the second field).

For a single file, the line breaks in the output are preserved.

But I have several files, so I use a "for" loop, and then the multiple matches from each file are printed on a single line, without line breaks.

Example:

$ cat foo1.txt
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ

$ cat foo2.txt
A TEST 0.715498938 0 0.265392781 -0.0013 -0.0309 Unkn
B REAL 0.927813163 0 0.053078556 -0.0051 -0.0636 MhZisp
C TEST 0.55626327 0.222929936 0.201698514 0.0053 -0.0438 MhZisp
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
E REAL 0.704883227 0.265392781 0.010615711 0.0476 0.0061 AbbbbZ
F REAL 0.180467091 0.800424628 0 0.0566    0.0103  ClNonZ

grep without a loop: the result is fine, with line breaks:

$  grep -n ClNonZ foo1.txt  | awk '$2 == "REAL" {print $0}'

2:B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
5:E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ

grep in a for loop: bad presentation, the line breaks have disappeared:

$  for file in `ls foo*` ; do line=`grep -n ClNonZ $file | awk '$2 == "REAL" {print $0}' `; if [[ -n "$line" ]]; then  echo $file ; echo $line ; echo " " ; fi ; done

foo1.txt
2:B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ 5:E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
 
foo2.txt
4:D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ 6:F REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ

I tried to use "while" instead of "for" (as explained in http://mywiki.wooledge.org/BashFAQ/001, as suggested by @chepner) without success.

Would you have an idea that could help me, please?


Solution

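The primary problem here is that you didn't double-quote your variable references, especially in echo $line (should be echo "$line"). An unquoted expansion undergoes word splitting: the shell splits the value on whitespace, newlines included, and echo then prints the resulting words joined by single spaces, so everything ends up on one line. See "I just assigned a variable, but echo $variable shows something else" and "When should I double-quote a parameter expansion?" (short answer: almost always).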
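
As a minimal sketch of that fix (not the asker's exact script), the loop below quotes every expansion and lets the shell glob stand in for ls. The helper name report_matches and the scratch directory are assumptions added only so the example runs on its own; the sample data is foo1.txt from the question.

```shell
# Scratch copy of foo1.txt from the question, so the sketch is self-contained.
workdir=$(mktemp -d)
cat > "$workdir/foo1.txt" <<'EOF'
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ
EOF

# report_matches is a hypothetical helper name, not part of the original post.
report_matches() {
    for file in "$@"; do
        # Command substitution drops only trailing newlines; inner ones are kept.
        line=$(grep -n ClNonZ "$file" | awk '$2 == "REAL"')
        if [ -n "$line" ]; then
            echo "$file"
            echo "$line"    # quoted, so the newlines stored in $line survive
            echo
        fi
    done
}

# Run inside a subshell so the caller's working directory is left alone.
( cd "$workdir" && report_matches foo*.txt )
```

The only changes that matter for the line breaks are the double quotes around "$line"; quoting "$file" and globbing directly instead of parsing ls additionally protect against filenames containing spaces.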

Shellcheck.net is good at pointing out common mistakes like this, and will also have some other good recommendations for your code. I recommend using it!

However, in this case, I'd be tempted to replace the entire bash+grep+awk thing, since awk can do it all itself:

awk 'FNR==1 {needheader=1}; ($0 ~ /ClNonZ/ && $2 == "REAL") {if (needheader) {print ""; print FILENAME; needheader=0}; print}' foo*.txt

Explanation:

  1. FNR==1 {needheader=1} -- this triggers at the beginning of each file (FNR is the line number within the current file, so if it's 1 this is the beginning of a file) and sets a variable saying that if there's a match, the filename needs to be printed.
  2. ($0 ~ /ClNonZ/ && $2 == "REAL") -- if "ClNonZ" appears in the line, and the second field is "REAL", then do the following stuff in { }. Note: do you actually want to search the entire line for "ClNonZ", or just the last field? If it's just the last field, use $NF == "ClNonZ" instead.
  3. if (needheader) {print ""; print FILENAME; needheader=0} -- if this is the first match within this file, print a blank line and the filename, then clear the variable that says this stuff needs to be printed.
  4. print -- ...and print the line. Note that $0 is implicit here, and since this is still in the { } from step 2, it only happens if the line matched.
  5. foo*.txt -- just pass all the matching filenames to awk as arguments, and let it scan over all of them in a big batch.


Answered By - Gordon Davisson
Answer Checked By - Marilyn (WPSolving Volunteer)