Sunday, January 28, 2024

[SOLVED] Modifying a column containing both strings and numbers

Issue

I have a tab-delimited file like this:

[ moleculetype ]
; Name nrexcl
AL7 3

[ atoms ]
; nr type resnr resid atom cgnr charge mass
   1 CB   1  AL6 C4      1 -0.1435  12.0110
   2 CB   1  AL6 C5      2 -0.1500  12.0110
   3 CB   1  AL6 C6      3 -0.1500  12.0110
   4 CB   1  AL6 C7      4  0.0825  12.0110
   5 CB   1  AL6 O8      5 -0.1500  12.0110

[ bonds ]
; ai aj fu b0 kb, b0 kb
 16   7 1 0.10930  287014.9  0.10930  287014.9
 15   7 1 0.10930  287014.9  0.10930  287014.9
  7   8 1 0.14180  303937.5  0.14180  303937.5
  7  17 1 0.10930  287014.9  0.10930  287014.9
  8   9 1 0.13550  349343.9  0.13550  349343.9
 20  12 1 0.10190  390836.6  0.10190  390836.6

I want the output to be:

[ moleculetype ]
; Name nrexcl
AL7 3

[ atoms ]
; nr type resnr resid atom cgnr charge mass
   1 CB   1  AL6 C      1 -0.1435  12.0110
   2 CB   1  AL6 C      2 -0.1500  12.0110
   3 CB   1  AL6 C      3 -0.1500  12.0110
   4 CB   1  AL6 C      4  0.0825  12.0110
   5 CB   1  AL6 O      5 -0.1500  12.0110

[ bonds ]
; ai aj fu b0 kb, b0 kb
 16   7 1 0.10930  287014.9  0.10930  287014.9
 15   7 1 0.10930  287014.9  0.10930  287014.9
  7   8 1 0.14180  303937.5  0.14180  303937.5
  7  17 1 0.10930  287014.9  0.10930  287014.9
  8   9 1 0.13550  349343.9  0.13550  349343.9
 20  12 1 0.10190  390836.6  0.10190  390836.6

Only the section under [ atoms ] is modified: its fifth column (atom) should keep only the letters, with the numbers removed. Please suggest a way to do this.

The problem is that a plain awk substitution on the fifth column cannot be applied to the whole file: the fifth column contains not only atom names such as C6/C7/O8 but also, in other sections such as [ bonds ], values that must stay unchanged. I have tried grep and awk like this:

grep -A307 -P 'atoms' filename | awk -F, 'sub("[0-9]+\s""",$9)' OFS=,

But it takes in the whole file, not just the [ atoms ] section, which is not what I want.
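Awk can still do this if it keeps track of which section it is in. A minimal sketch, assuming whitespace-separated columns and a hypothetical function name strip_atom_digits: every [ ... ] header line sets or clears a flag, and only data rows under [ atoms ] have the trailing digits removed from $5. Because assigning to a field makes awk rebuild the line with OFS, the edited rows come out tab-separated (matching the tab-delimited layout described in the question) and lose their original column alignment; lines outside the [ atoms ] data rows are printed untouched.

```shell
# Sketch: edit column 5 only while inside the [ atoms ] section.
# strip_atom_digits is a hypothetical helper; it reads the files given, or stdin.
strip_atom_digits() {
  awk -v OFS='\t' '
    # Every "[ ... ]" header line switches the current section.
    /^\[/ { in_atoms = ($0 ~ /^\[ *atoms *\]/) }
    # Data rows have at least five fields and are not ";" comments.
    in_atoms && NF >= 5 && $1 !~ /^;/ {
      # Drop trailing digits (C4 -> C, O8 -> O); assigning to $5 rebuilds the line with OFS.
      sub(/[0-9]+$/, "", $5)
    }
    # Print every line, edited or not.
    { print }
  ' "$@"
}
```

Usage: strip_atom_digits filename > output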


Solution

This might work for you (GNU sed):

sed -E '/^\[/h;G;/\n\[ atoms \]/{/^;|^$/!{s/(\S)\S*/\1/5}};P;d' file
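A variant of the same idea, sketched here without the hold space, uses a sed address range that starts at the [ atoms ] header and ends at the next line beginning with [. Inside the range, only lines that start with an atom number are edited, and the substitution deletes just the digits that follow the letters of column 5, so the spacing between columns is preserved and a multi-letter name (for example a hypothetical Cl12) would keep all its letters. The function name strip_atoms_range is hypothetical, and the regex uses POSIX character classes instead of GNU's \s and \S.

```shell
# Sketch: limit the edit with a sed address range instead of the hold space.
# strip_atoms_range is a hypothetical helper; it reads the files given, or stdin.
strip_atoms_range() {
  sed -E '/^\[ atoms \]/,/^\[/{
    # Inside the range, touch only data lines (they start with an atom number).
    # Skip four fields, keep the letters of field 5, delete the digits after them.
    /^[[:space:]]*[0-9]/s/^([[:space:]]*([^[:space:]]+[[:space:]]+){4}[A-Za-z]+)[0-9]+/\1/
  }' "$@"
}
```

Usage: strip_atoms_range filename > output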

Copy any line beginning with [ into the hold space.

Append the hold space to every line.

If the appended (second) line is [ atoms ], process the line; otherwise, print the first line and delete the remainder.

If the current line begins with ; or is empty, print the first line and delete the remainder.

Otherwise, replace the fifth field with its first character.

Print the first line and delete the remainder.
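The steps above can be written out as a commented sed script. This is a sketch wrapped in a hypothetical shell function, strip_atoms_hold, and it assumes GNU sed, which supports \S. One deliberate change: the empty-line test is written as ^\n instead of ^$, because after G an empty line begins with the appended newline; in the original one-liner such lines are left unchanged anyway, since the pattern space then holds fewer than five runs of non-blank characters.

```shell
# Commented sketch of the one-liner above; strip_atoms_hold is a hypothetical name.
strip_atoms_hold() {
  sed -E '
    # Copy every section header line, e.g. "[ atoms ]", into the hold space.
    /^\[/h
    # Append a newline and the remembered header to the current line.
    G
    # Edit only lines whose remembered header is [ atoms ].
    /\n\[ atoms \]/{
      # Skip comment lines, and empty lines (which now start with the newline).
      /^;|^\n/!{
        # Replace the 5th run of non-blanks with its first character (C4 -> C).
        s/(\S)\S*/\1/5
      }
    }
    # Print only the original (first) line, then delete the pattern space.
    P
    d
  ' "$@"
}
```

Usage: strip_atoms_hold filename > output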



Answered By - potong
Answer Checked By - Timothy Miller (WPSolving Admin)