Issue
I am post-processing a log file arranged in the following format:
Finding intramodel H-bonds
Constraints relaxed by 0.55 angstroms and 20 degrees
Models used:
1.1 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.6 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.10 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.8 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.2 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.3 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.4 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.7 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.5 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.9 SarsCov2_structure31R_nsp5holo_rep1.pdb
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/? ASN 142 ND2 SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/A UNL 1 N SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/? ASN 142 2HD2 3.419 2.541
SarsCov2_structure31R_nsp5holo_rep1.pdb #1.5/? GLN 189 NE2 SarsCov2_structure31R_nsp5holo_rep1.pdb #1.5/A UNL 1 O SarsCov2_structure31R_nsp5holo_rep1.pdb #1.5/? GLN 189 1HE2 2.883 2.159
SarsCov2_structure31R_nsp5holo_rep1.pdb #1.6/? HIS 163 NE2 SarsCov2_structure31R_nsp5holo_rep1.pdb #1.6/A UNL 1 O no hydrogen
From this log I need to keep everything from the 3rd line onward and then delete every occurrence of the duplicated pattern "SarsCov2_structure31R_nsp5holo_rep1.pdb". Can I use a regex with sed to detect any such file name in the log (anything ending in .pdb), so that it is removed automatically for each processed log? The expected output should be:
Models used:
1.1
1.6
1.10
1.8
1.2
1.3
1.4
1.7
1.5
1.9
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.3/? ASN 142 ND2 #1.3/A UNL 1 N #1.3/? ASN 142 2HD2 3.419 2.541
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 1HE2 2.883 2.159
#1.6/? HIS 163 NE2 #1.6/A UNL 1 O no hydrogen 3.299 N/A
#1.7/? GLN 189 NE2 #1.7/A UNL 1 O #1.7/? GLN 189 1HE2 3.109 2.147
#1.9/? ASN 142 ND2 #1.9/A UNL 1 O #1.9/? ASN 142 1HD2 3.032 2.319
#1.10/? GLN 189 NE2 #1.10/A UNL 1 O #1.10/? GLN 189 1HE2 3.054 2.125
Here is my attempt without a regex, which does not work yet :-)
cat test.log | tail -n +2 | sed -e "/SarsCov2_structure31R_nsp5holo_rep1.pdb/d" >> ./test2.log
Solution
Using sed
$ sed 's/[[:alnum:]_]*\.pdb//g;1,2d' input_file
Models used:
1.1
1.6
1.10
1.8
1.2
1.3
1.4
1.7
1.5
1.9
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.3/? ASN 142 ND2 #1.3/A UNL 1 N #1.3/? ASN 142 2HD2 3.419 2.541
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 1HE2 2.883 2.159
#1.6/? HIS 163 NE2 #1.6/A UNL 1 O no hydrogen
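In the command above, `1,2d` drops the first two lines, and the `s///g` substitution removes only the matching file name. The `/.../d` form in the question's attempt deletes every line that contains the pattern, which is why that attempt loses the data. The substitution does leave behind the space that sat next to each removed name. If that matters, here is a minimal sketch of a variant that also trims that whitespace. The function name `strip_pdb_names` is made up for illustration, and the sketch assumes file names consist only of letters, digits and underscores, as in the sample log.

```shell
# Hypothetical helper (name chosen for illustration only).
# Assumes .pdb file names contain only letters, digits and underscores.
strip_pdb_names() {
    # 1,2d                      : drop the first two lines of the log
    # s/[[:alnum:]_]*\.pdb *//g : remove each "<name>.pdb" and any spaces after it
    # s/ *$//                   : trim spaces left at the end of a line
    sed -e '1,2d' \
        -e 's/[[:alnum:]_]*\.pdb *//g' \
        -e 's/ *$//' "$1"
}
```

It could be called as `strip_pdb_names test.log > test2.log`. Using `>` instead of `>>` overwrites the output file on each run rather than appending to it.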
Answered By - HatLess
Answer Checked By - David Goodson (WPSolving Volunteer)