Thursday, April 7, 2022

[SOLVED] awk: processing log and search pattern

Issue

I am working with log files in the following format:

Finding intramodel H-bonds
Constraints relaxed by 0.5 angstroms and 20 degrees
Models used:
    1.1 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.2 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.3 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.4 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.5 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.6 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.7 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.8 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.9 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.10 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.11 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.12 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.13 SarsCov2_structure49R_nsp5holo_rep1.pdb
    1.14 SarsCov2_structure49R_nsp5holo_rep1.pdb

14 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/? ASN 142 ND2   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/A UNL 888 O   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/? ASN 142 1HD2   3.102  2.145
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/? GLU 166 N     SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/A UNL 888 O   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/? GLU 166 H      3.011  2.024
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/? GLU 166 N     SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/A UNL 888 O   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/? GLU 166 H      3.037  2.132
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/? HIS 163 NE2   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/A UNL 888 O   no hydrogen                                                   3.388  N/A
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/? GLU 166 N     SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/A UNL 888 O   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.5/? GLU 166 H      2.806  1.792
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? THR 26 N      SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/A UNL 888 O   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? THR 26 H       3.093  2.142
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? GLY 143 N     SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/A UNL 888 O   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.7/? GLY 143 H      3.030  2.193
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.9/? GLN 189 NE2   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.9/A UNL 888 O   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.9/? GLN 189 2HE2   3.052  2.301
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.10/? GLU 166 N    SarsCov2_structure49R_nsp5holo_rep1.pdb #1.10/A UNL 888 O  SarsCov2_structure49R_nsp5holo_rep1.pdb #1.10/? GLU 166 H     2.854  1.868
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.12/? GLY 143 N    SarsCov2_structure49R_nsp5holo_rep1.pdb #1.12/A UNL 888 O  SarsCov2_structure49R_nsp5holo_rep1.pdb #1.12/? GLY 143 H     3.103  2.070
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? GLY 143 N    SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/A UNL 888 O  SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? GLY 143 H     3.161  2.224
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? CYS 145 SG   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/A UNL 888 O  SarsCov2_structure49R_nsp5holo_rep1.pdb #1.13/? CYS 145 HG    3.421  2.842
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? ASN 142 ND2  SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/A UNL 888 O  SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? ASN 142 2HD2  3.055  2.465
SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? CYS 145 N    SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/A UNL 888 O  SarsCov2_structure49R_nsp5holo_rep1.pdb #1.14/? CYS 145 H     2.924  2.143

I need to find the first occurrence of the "GLU 166 N" pattern and print the number that appears earlier on the same line in the #1.number/? field associated with that pattern. In the example, the detected number should be 3, since the first matching line carries #1.3/?.

I would start with basic pattern detection:

awk '/GLU 166 N/' file

But how do I correctly extract the number that precedes the pattern and print it? Finally, if the pattern cannot be found, I would like the script to print 1.
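A useful first step is to print only the second whitespace-separated field of each matching line, which is where the #1.N/? identifier sits. This is a minimal sketch, assuming a POSIX shell and awk; the helper name show_ids is hypothetical, and the sample lines are shortened copies of the log above (the hydrogen and distance columns are dropped).

```shell
#!/bin/sh
# Hypothetical helper: print field 2 (the "#1.N/?" model identifier)
# of every line containing "GLU 166 N". Reads the named files, or stdin.
show_ids() {
    awk '/GLU 166 N/ {print $2}' "$@"
}

# Illustrative input: three shortened lines in the log format shown above.
printf '%s\n' \
    'SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/? ASN 142 ND2   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/A UNL 888 O' \
    'SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/? GLU 166 N     SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/A UNL 888 O' \
    'SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/? GLU 166 N     SarsCov2_structure49R_nsp5holo_rep1.pdb #1.4/A UNL 888 O' |
    show_ids
```

For this input it prints #1.3/? and #1.4/?. On the full log above it would print #1.3/?, #1.4/?, #1.5/? and #1.10/?, of which only the first is wanted, with the leading #1. and trailing /? removed.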


Solution

$ awk -v n=1 '/GLU 166 N/ {gsub(/.*\.|\/\?/,"",$2); n=$2; exit} END {print n}' file
3
$ awk -v n=1 '/GLU 166 N/ {gsub(/.*\.|\/\?/,"",$2); n=$2; exit} END {print n}' /dev/null
1

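What you are looking for is in the second field ($2). gsub(/.*\.|\/\?/,"",$2) replaces two things in $2 with the empty string: the leading characters up to and including the period (#1.) and the trailing /?. As a result, "#1.3/?" becomes "3". Setting n=1 with -v provides the value printed when nothing matches. On the first match, exit stops reading input, and the END block, which awk still runs after exit, prints n.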
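The regex /GLU 166 N/ is a substring test, so reusing it for another residue can misfire. For example, /ASN 142 N/ would also match the ASN 142 ND2 lines in the log above. The following is a stricter sketch, not the answer's own code. It assumes the donor residue name, residue number and atom name are always fields 3, 4 and 5, as in the sample, and compares them exactly. It also uses split instead of gsub to pull out the model number. The function name first_model is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: first_model RESIDUE NUMBER ATOM [FILE...]
# Prints N from the "#1.N/?" field (field 2) of the first line whose donor
# residue name, residue number and atom name (assumed to be fields 3, 4, 5)
# equal the arguments exactly; prints 1 if no line matches.
# Reads the named files, or standard input if none are given.
first_model() {
    res=$1 num=$2 atom=$3
    shift 3
    awk -v res="$res" -v num="$num" -v atom="$atom" '
        BEGIN { n = 1 }                        # fallback when nothing matches
        $3 == res && $4 == num && $5 == atom {
            split($2, parts, "[./]")           # "#1.3/?" -> "#1", "3", "?"
            n = parts[2]
            exit                               # stop reading; END still runs
        }
        END { print n }
    ' "$@"
}

# Demo on two shortened lines in the log format above:
# prints 3, taken from the first GLU 166 N line (#1.3/?).
printf '%s\n' \
    'SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/? ASN 142 ND2   SarsCov2_structure49R_nsp5holo_rep1.pdb #1.1/A UNL 888 O' \
    'SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/? GLU 166 N     SarsCov2_structure49R_nsp5holo_rep1.pdb #1.3/A UNL 888 O' |
    first_model GLU 166 N
```

On the same input, first_model ASN 142 N prints 1, because the atom field ND2 is not equal to N. The substring pattern /ASN 142 N/ would have matched that line.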



Answered By - Renaud Pacalet
Answer Checked By - Katrina (WPSolving Volunteer)