Issue
I am dealing with a log consisting of many lines in the following format:
06I: 31 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.1/? THR 26 N #1.1/A UNL 1 O #1.1/? THR 26 H 3.515 2.716
#1.1/? ASN 142 ND2 #1.1/A UNL 1 O #1.1/? ASN 142 2HD2 3.227 2.305
#1.1/A UNL 1 N #1.1/? THR 26 O #1.1/A UNL 1 H 3.463 2.652
#1.2/A UNL 1 N #1.2/? PHE 140 O #1.2/A UNL 1 H 2.987 2.200
#1.4/? THR 26 N #1.4/A UNL 1 S #1.4/? THR 26 H 4.354 3.371
#1.4/? HIS 163 NE2 #1.4/A UNL 1 N no hydrogen 3.137 N/A
#1.4/A UNL 1 N #1.4/? ARG 188 O #1.4/A UNL 1 H 3.000 2.081
#1.5/? HIS 163 NE2 #1.5/A UNL 1 N no hydrogen 3.330 N/A
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 2HE2 3.029 2.132
#1.6/A UNL 1 N #1.6/? ARG 188 O #1.6/A UNL 1 H 2.984 2.064
#1.8/? ASN 142 ND2 #1.8/A UNL 1 N #1.8/? ASN 142 2HD2 3.164 2.395
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 2HD2 3.031 2.180
#1.8/? GLN 189 NE2 #1.8/A UNL 1 O #1.8/? GLN 189 1HE2 3.276 2.553
#1.8/A UNL 1 N #1.8/? THR 190 O #1.8/A UNL 1 H 3.257 2.407
#1.9/A UNL 1 N #1.9/? THR 190 O #1.9/A UNL 1 H 2.913 2.037
#1.10/? SER 144 OG #1.10/A UNL 1 S #1.10/? SER 144 HG 4.246 3.845
#1.10/? HIS 163 NE2 #1.10/A UNL 1 S no hydrogen 3.700 N/A
#1.10/A UNL 1 N #1.10/? THR 190 O #1.10/A UNL 1 H 3.008 2.091
#1.12/? GLN 189 NE2 #1.12/A UNL 1 O #1.12/? GLN 189 1HE2 2.929 2.152
#1.12/A UNL 1 N #1.12/? PHE 140 O #1.12/A UNL 1 H 2.912 2.012
#1.13/? ASN 142 ND2 #1.13/A UNL 1 O #1.13/? ASN 142 2HD2 3.063 2.291
#1.14/? HIS 41 NE2 #1.14/A UNL 1 S no hydrogen 3.919 N/A
#1.14/? ASN 142 ND2 #1.14/A UNL 1 O #1.14/? ASN 142 2HD2 2.802 1.872
#1.14/A UNL 1 N #1.14/? THR 190 O #1.14/A UNL 1 H 2.927 1.987
#1.16/? GLN 189 NE2 #1.16/A UNL 1 N #1.16/? GLN 189 1HE2 3.456 2.669
#1.16/? GLN 189 NE2 #1.16/A UNL 1 O #1.16/? GLN 189 1HE2 3.079 2.177
#1.16/A UNL 1 N #1.16/? THR 190 O #1.16/A UNL 1 H 2.967 1.987
#1.17/? ASN 142 ND2 #1.17/A UNL 1 N #1.17/? ASN 142 2HD2 3.218 2.294
#1.17/A UNL 1 N #1.17/? THR 190 O #1.17/A UNL 1 H 3.364 2.469
#1.18/? ASN 142 ND2 #1.18/A UNL 1 O #1.18/? ASN 142 2HD2 3.117 2.142
#1.20/? ASN 142 ND2 #1.20/A UNL 1 N #1.20/? ASN 142 2HD2 3.245 2.560
-----------------------------------------------------------------------------
structure30R: 21 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.4/? GLN 189 NE2 #1.4/A UNL 1 O #1.4/? GLN 189 1HE2 3.139 2.374
#1.5/? GLN 189 NE2 #1.5/A UNL 1 N #1.5/? GLN 189 2HE2 3.296 2.365
#1.7/? CYS 145 SG #1.7/A UNL 1 O #1.7/? CYS 145 HG 3.466 2.762
#1.7/A UNL 1 O #1.7/? LEU 141 O #1.7/A UNL 1 H 2.951 2.048
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 2HD2 3.660 3.073
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 1HD2 2.965 2.162
#1.8/? CYS 145 SG #1.8/A UNL 1 O #1.8/? CYS 145 HG 3.480 2.556
#1.9/? HIS 163 NE2 #1.9/A UNL 1 O no hydrogen 3.272 N/A
#1.9/A UNL 1 O #1.9/? GLN 189 OE1 #1.9/A UNL 1 H 2.915 2.341
#1.10/? ASN 142 ND2 #1.10/A UNL 1 O #1.10/? ASN 142 2HD2 3.100 2.185
#1.10/? GLN 189 NE2 #1.10/A UNL 1 O #1.10/? GLN 189 1HE2 3.180 2.408
#1.10/A UNL 1 O #1.10/? GLU 166 O #1.10/A UNL 1 H 3.246 2.639
#1.11/? ASN 142 ND2 #1.11/A UNL 1 O #1.11/? ASN 142 2HD2 3.122 2.204
#1.11/? HIS 163 NE2 #1.11/A UNL 1 O no hydrogen 3.313 N/A
As you can see, some lines (those containing the pattern "no hydrogen" followed by some number of spaces) break the format: the last two columns are shifted significantly, e.g. no hydrogen 3.137 N/A
Since the number of spaces between these elements may vary, I could not find a simple sed expression that removes all of those useless spaces, e.g.
sed -e "s/no hydrogen //g"
will match only one particular line. Could you suggest a regular expression that can be used with sed to match all the lines containing "no hydrogen" and remove the unused spaces?
Here is the expected output:
06I: 31 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.1/? THR 26 N #1.1/A UNL 1 O #1.1/? THR 26 H 3.515 2.716
#1.1/? ASN 142 ND2 #1.1/A UNL 1 O #1.1/? ASN 142 2HD2 3.227 2.305
#1.1/A UNL 1 N #1.1/? THR 26 O #1.1/A UNL 1 H 3.463 2.652
#1.2/A UNL 1 N #1.2/? PHE 140 O #1.2/A UNL 1 H 2.987 2.200
#1.4/? THR 26 N #1.4/A UNL 1 S #1.4/? THR 26 H 4.354 3.371
#1.4/? HIS 163 NE2 #1.4/A UNL 1 N no hydrogen 3.137 N/A
#1.4/A UNL 1 N #1.4/? ARG 188 O #1.4/A UNL 1 H 3.000 2.081
#1.5/? HIS 163 NE2 #1.5/A UNL 1 N no hydrogen 3.330 N/A
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 2HE2 3.029 2.132
#1.6/A UNL 1 N #1.6/? ARG 188 O #1.6/A UNL 1 H 2.984 2.064
#1.8/? ASN 142 ND2 #1.8/A UNL 1 N #1.8/? ASN 142 2HD2 3.164 2.395
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 2HD2 3.031 2.180
#1.8/? GLN 189 NE2 #1.8/A UNL 1 O #1.8/? GLN 189 1HE2 3.276 2.553
#1.8/A UNL 1 N #1.8/? THR 190 O #1.8/A UNL 1 H 3.257 2.407
#1.9/A UNL 1 N #1.9/? THR 190 O #1.9/A UNL 1 H 2.913 2.037
#1.10/? SER 144 OG #1.10/A UNL 1 S #1.10/? SER 144 HG 4.246 3.845
#1.10/? HIS 163 NE2 #1.10/A UNL 1 S no hydrogen 3.700 N/A
#1.10/A UNL 1 N #1.10/? THR 190 O #1.10/A UNL 1 H 3.008 2.091
#1.12/? GLN 189 NE2 #1.12/A UNL 1 O #1.12/? GLN 189 1HE2 2.929 2.152
#1.12/A UNL 1 N #1.12/? PHE 140 O #1.12/A UNL 1 H 2.912 2.012
#1.13/? ASN 142 ND2 #1.13/A UNL 1 O #1.13/? ASN 142 2HD2 3.063 2.291
#1.14/? HIS 41 NE2 #1.14/A UNL 1 S no hydrogen 3.919 N/A
#1.14/? ASN 142 ND2 #1.14/A UNL 1 O #1.14/? ASN 142 2HD2 2.802 1.872
#1.14/A UNL 1 N #1.14/? THR 190 O #1.14/A UNL 1 H 2.927 1.987
#1.16/? GLN 189 NE2 #1.16/A UNL 1 N #1.16/? GLN 189 1HE2 3.456 2.669
#1.16/? GLN 189 NE2 #1.16/A UNL 1 O #1.16/? GLN 189 1HE2 3.079 2.177
#1.16/A UNL 1 N #1.16/? THR 190 O #1.16/A UNL 1 H 2.967 1.987
#1.17/? ASN 142 ND2 #1.17/A UNL 1 N #1.17/? ASN 142 2HD2 3.218 2.294
#1.17/A UNL 1 N #1.17/? THR 190 O #1.17/A UNL 1 H 3.364 2.469
#1.18/? ASN 142 ND2 #1.18/A UNL 1 O #1.18/? ASN 142 2HD2 3.117 2.142
#1.20/? ASN 142 ND2 #1.20/A UNL 1 N #1.20/? ASN 142 2HD2 3.245 2.560
-----------------------------------------------------------------------------
structure30R: 21 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.4/? GLN 189 NE2 #1.4/A UNL 1 O #1.4/? GLN 189 1HE2 3.139 2.374
#1.5/? GLN 189 NE2 #1.5/A UNL 1 N #1.5/? GLN 189 2HE2 3.296 2.365
#1.7/? CYS 145 SG #1.7/A UNL 1 O #1.7/? CYS 145 HG 3.466 2.762
#1.7/A UNL 1 O #1.7/? LEU 141 O #1.7/A UNL 1 H 2.951 2.048
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 2HD2 3.660 3.073
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 1HD2 2.965 2.162
#1.8/? CYS 145 SG #1.8/A UNL 1 O #1.8/? CYS 145 HG 3.480 2.556
#1.9/? HIS 163 NE2 #1.9/A UNL 1 O no hydrogen 3.272 N/A
#1.9/A UNL 1 O #1.9/? GLN 189 OE1 #1.9/A UNL 1 H 2.915 2.341
#1.10/? ASN 142 ND2 #1.10/A UNL 1 O #1.10/? ASN 142 2HD2 3.100 2.185
#1.10/? GLN 189 NE2 #1.10/A UNL 1 O #1.10/? GLN 189 1HE2 3.180 2.408
#1.10/A UNL 1 O #1.10/? GLU 166 O #1.10/A UNL 1 H 3.246 2.639
#1.11/? ASN 142 ND2 #1.11/A UNL 1 O #1.11/? ASN 142 2HD2 3.122 2.204
#1.11/? HIS 163 NE2 #1.11/A UNL 1 O no hydrogen
Solution
Using sed
$ sed 's/\(no hydrogen \{12\}\)[[:space:]]\+/\1/' input_file
06I: 31 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.1/? THR 26 N #1.1/A UNL 1 O #1.1/? THR 26 H 3.515 2.716
#1.1/? ASN 142 ND2 #1.1/A UNL 1 O #1.1/? ASN 142 2HD2 3.227 2.305
#1.1/A UNL 1 N #1.1/? THR 26 O #1.1/A UNL 1 H 3.463 2.652
#1.2/A UNL 1 N #1.2/? PHE 140 O #1.2/A UNL 1 H 2.987 2.200
#1.4/? THR 26 N #1.4/A UNL 1 S #1.4/? THR 26 H 4.354 3.371
#1.4/? HIS 163 NE2 #1.4/A UNL 1 N no hydrogen 3.137 N/A
#1.4/A UNL 1 N #1.4/? ARG 188 O #1.4/A UNL 1 H 3.000 2.081
#1.5/? HIS 163 NE2 #1.5/A UNL 1 N no hydrogen 3.330 N/A
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 2HE2 3.029 2.132
#1.6/A UNL 1 N #1.6/? ARG 188 O #1.6/A UNL 1 H 2.984 2.064
#1.8/? ASN 142 ND2 #1.8/A UNL 1 N #1.8/? ASN 142 2HD2 3.164 2.395
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 2HD2 3.031 2.180
#1.8/? GLN 189 NE2 #1.8/A UNL 1 O #1.8/? GLN 189 1HE2 3.276 2.553
#1.8/A UNL 1 N #1.8/? THR 190 O #1.8/A UNL 1 H 3.257 2.407
#1.9/A UNL 1 N #1.9/? THR 190 O #1.9/A UNL 1 H 2.913 2.037
#1.10/? SER 144 OG #1.10/A UNL 1 S #1.10/? SER 144 HG 4.246 3.845
#1.10/? HIS 163 NE2 #1.10/A UNL 1 S no hydrogen 3.700 N/A
#1.10/A UNL 1 N #1.10/? THR 190 O #1.10/A UNL 1 H 3.008 2.091
#1.12/? GLN 189 NE2 #1.12/A UNL 1 O #1.12/? GLN 189 1HE2 2.929 2.152
#1.12/A UNL 1 N #1.12/? PHE 140 O #1.12/A UNL 1 H 2.912 2.012
#1.13/? ASN 142 ND2 #1.13/A UNL 1 O #1.13/? ASN 142 2HD2 3.063 2.291
#1.14/? HIS 41 NE2 #1.14/A UNL 1 S no hydrogen 3.919 N/A
#1.14/? ASN 142 ND2 #1.14/A UNL 1 O #1.14/? ASN 142 2HD2 2.802 1.872
#1.14/A UNL 1 N #1.14/? THR 190 O #1.14/A UNL 1 H 2.927 1.987
#1.16/? GLN 189 NE2 #1.16/A UNL 1 N #1.16/? GLN 189 1HE2 3.456 2.669
#1.16/? GLN 189 NE2 #1.16/A UNL 1 O #1.16/? GLN 189 1HE2 3.079 2.177
#1.16/A UNL 1 N #1.16/? THR 190 O #1.16/A UNL 1 H 2.967 1.987
#1.17/? ASN 142 ND2 #1.17/A UNL 1 N #1.17/? ASN 142 2HD2 3.218 2.294
#1.17/A UNL 1 N #1.17/? THR 190 O #1.17/A UNL 1 H 3.364 2.469
#1.18/? ASN 142 ND2 #1.18/A UNL 1 O #1.18/? ASN 142 2HD2 3.117 2.142
#1.20/? ASN 142 ND2 #1.20/A UNL 1 N #1.20/? ASN 142 2HD2 3.245 2.560
-----------------------------------------------------------------------------
structure30R: 21 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.4/? GLN 189 NE2 #1.4/A UNL 1 O #1.4/? GLN 189 1HE2 3.139 2.374
#1.5/? GLN 189 NE2 #1.5/A UNL 1 N #1.5/? GLN 189 2HE2 3.296 2.365
#1.7/? CYS 145 SG #1.7/A UNL 1 O #1.7/? CYS 145 HG 3.466 2.762
#1.7/A UNL 1 O #1.7/? LEU 141 O #1.7/A UNL 1 H 2.951 2.048
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 2HD2 3.660 3.073
#1.8/? ASN 142 ND2 #1.8/A UNL 1 O #1.8/? ASN 142 1HD2 2.965 2.162
#1.8/? CYS 145 SG #1.8/A UNL 1 O #1.8/? CYS 145 HG 3.480 2.556
#1.9/? HIS 163 NE2 #1.9/A UNL 1 O no hydrogen 3.272 N/A
#1.9/A UNL 1 O #1.9/? GLN 189 OE1 #1.9/A UNL 1 H 2.915 2.341
#1.10/? ASN 142 ND2 #1.10/A UNL 1 O #1.10/? ASN 142 2HD2 3.100 2.185
#1.10/? GLN 189 NE2 #1.10/A UNL 1 O #1.10/? GLN 189 1HE2 3.180 2.408
#1.10/A UNL 1 O #1.10/? GLU 166 O #1.10/A UNL 1 H 3.246 2.639
#1.11/? ASN 142 ND2 #1.11/A UNL 1 O #1.11/? ASN 142 2HD2 3.122 2.204
#1.11/? HIS 163 NE2 #1.11/A UNL 1 O no hydrogen
\(no hydrogen \{12\}\)
- Creates a group match within parentheses \(..\), using sed's back-referencing functionality, so the matched text can later be returned with \1. The command could also have been written \(no hydrogen[[:space:]]\{12\}\) to emphasize the presence of a space. The group includes the 12 spaces after the words no hydrogen, so they are returned as part of the back reference.
[[:space:]]\+
- As this is not part of the group match, it will be excluded. This matches all the remaining whitespace after the matched words and the 12 spaces we want retained within the group match.
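To see the substitution in isolation, here is a minimal sketch that runs the same expression on two made-up fragments with different amounts of padding. The function name fix_no_hydrogen and the sample fragments are hypothetical, and \{1,\} is used as the POSIX spelling of \+ (which is a GNU sed extension) so the sketch should also work with non-GNU sed.

```shell
# Keep "no hydrogen" plus exactly 12 spaces (group \1) and drop any
# whitespace that follows them. \{1,\} means "one or more", like GNU \+.
fix_no_hydrogen() {
  sed 's/\(no hydrogen \{12\}\)[[:space:]]\{1,\}/\1/'
}

# Hypothetical samples: the same fragment padded with 14 and with 20 spaces.
pad14=$(printf '%14s' '')
pad20=$(printf '%20s' '')

# Both lines come out with the same padding after "no hydrogen".
printf '%s\n' "no hydrogen${pad14}3.137  N/A" "no hydrogen${pad20}3.137  N/A" |
  fix_no_hydrogen
```

Lines in which "no hydrogen" is followed by 12 or fewer spaces are left unchanged, because the pattern needs at least one whitespace character beyond the 12 captured in the group.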
Answered By - HatLess Answer Checked By - Marilyn (WPSolving Volunteer)