Issue
I am working with text files that contain, somewhere in the file, a line like this:
ligand_types A C Cl NA OA N SA HD # ligand atom types
I need to check the atom types listed on that line (A C Cl NA OA N SA HD) and, if "SA" and/or "HD" is missing, append the missing type(s) to the end of the list.
Sample input:
ligand_types A C Cl NA OA N HD # ligand atom types
ligand_types A C NA OA N SA # ligand atom types
ligand_types A C Cl NA OA N # ligand atom types
Expected output:
ligand_types A C Cl NA OA N HD SA # ligand atom types
ligand_types A C NA OA N SA HD # ligand atom types
ligand_types A C Cl NA OA N HD SA # ligand atom types
Could you suggest a sed or awk solution for this task?
For example, with sed it might look like:
sed -i 's/old-string/new-string-with-additional-patterns/g' my_file
Solution
Update: adding support for end-of-line comments
Here's an awk idea:
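awk -v types="SA HD" '
BEGIN {
    typesCount = split(types, typesArr)    # types that must be present
}
/^ligand_types / {
    # split off a trailing "# ..." comment so it is not parsed as fields
    if (i = index($0, "#")) {
        comment = substr($0, i)
        $0 = substr($0, 1, i-1)
    } else
        comment = ""

    for (i = 1; i <= typesCount; i++)      # start with every required type
        typesHash[typesArr[i]]
    for (i = 2; i <= NF; i++)              # drop the ones already on the line
        delete typesHash[$i]
    for (t in typesHash)                   # append what is left (order of "in" is unspecified)
        $(NF+1) = t

    if (comment)
        $(NF+1) = comment                  # put the comment back at the end
}
1
' my_file > my_file.new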
With your examples (each input line is followed by the resulting output line):
ligand_types A C Cl NA OA N SA HD # ligand atom types
ligand_types A C Cl NA OA N SA HD # ligand atom types
ligand_types A C Cl NA OA N HD # ligand atom types
ligand_types A C Cl NA OA N HD SA # ligand atom types
ligand_types A C NA OA N SA # ligand atom types
ligand_types A C NA OA N SA HD # ligand atom types
ligand_types A C Cl NA OA N # ligand atom types
ligand_types A C Cl NA OA N HD SA # ligand atom types
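Because awk does not define the order in which `for (t in typesHash)` visits elements, the order of the appended types can differ between awk implementations when both are missing. The following is a minimal sketch of a variant that appends the missing types in the order given in `types`. The function name `add_missing_types` and the array names `want` and `seen` are hypothetical, chosen only for this illustration, and `types` is set to "HD SA" on the assumption that the order shown in the question's expected output is the one wanted.

```shell
# Hypothetical wrapper (name chosen for illustration); reads files or stdin.
add_missing_types() {
    awk -v types="HD SA" '
    BEGIN { n = split(types, want, " ") }       # required types, in append order
    /^ligand_types / {
        comment = ""
        if ((i = index($0, "#")) > 0) {         # set aside an end-of-line comment
            comment = substr($0, i)
            $0 = substr($0, 1, i - 1)
        }
        split("", seen)                         # portable way to empty the array
        for (i = 2; i <= NF; i++)               # record the types already present
            seen[$i]
        for (i = 1; i <= n; i++)                # append missing ones, in "types" order
            if (!(want[i] in seen))
                $(NF + 1) = want[i]
        if (comment != "")
            $(NF + 1) = comment                 # restore the comment at the end
    }
    1
    ' "$@"
}

# Demo on the three sample input lines from the question.
printf '%s\n' \
    'ligand_types A C Cl NA OA N HD # ligand atom types' \
    'ligand_types A C NA OA N SA # ligand atom types' \
    'ligand_types A C Cl NA OA N # ligand atom types' |
    add_missing_types
```

With `types="HD SA"` the demo should reproduce the expected output from the question line for line. To replace the original file, as `sed -i` would, one option is `add_missing_types my_file > my_file.new && mv my_file.new my_file`.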
Answered By - Fravadona Answer Checked By - Dawn Plyler (WPSolving Volunteer)