Issue
I would like to replace a portion of each header in a FASTA file (the part surrounded by _) using a text file that maps the old IDs to new ones. Hope someone can help me! Thanks
#fasta file:
>mir-2_scf7180000350313_41896
CCATCAGAGTGGTTGTGATGTGGTGCTATTGATTCATATCACAGCCAGCTTTGATGAG
>mir-92a-2_scf7180000349939_17298
AGGTGGGGATGGGGGCAATATTTGTGAATGATTAAATTCAAATTGCACTTGTCCCGGCCTGC
>mir-279a_scf7180000350374_48557
AATGAGTGGCGGTCTAGTGCACGGTCGATAAAGTTGTGACTAGATCCACACTCATTAAG
#key_file.txt
scf7180000350313 NW_011929472.1
scf7180000349939 NW_011929473.1
scf7180000350374 NW_011929474.1
#expected result
>mir-2_NW_011929472.1_41896
CCATCAGAGTGGTTGTGATGTGGTGCTATTGATTCATATCACAGCCAGCTTTGATGAG
>mir-92a-2_NW_011929473.1_17298
AGGTGGGGATGGGGGCAATATTTGTGAATGATTAAATTCAAATTGCACTTGTCCCGGCCTGC
>mir-279a_NW_011929474.1_48557
AATGAGTGGCGGTCTAGTGCACGGTCGATAAAGTTGTGACTAGATCCACACTCATTAAG
Solution
You can try this awk script:
$ awk '
NR == FNR {r[$1] = $2; next}  # first file: store each key and its replacement in an associative array
/^>/ {                        # for every header line (lines beginning with >)
  for (i in r) {              # cycle through the keys of the associative array
    n = sub(i, r[i], $0)      # replace key i with r[i] in the current line; sub() returns 1 if it matched
    if (n == 1) {break}       # performance-relevant: leave the loop once a key has matched
  }
}1' key_file.txt fasta-file
# the trailing 1 prints every line, changed or not; comments inside the quoted
# program must not contain an apostrophe, or the shell ends the program early
>mir-2_NW_011929472.1_41896
CCATCAGAGTGGTTGTGATGTGGTGCTATTGATTCATATCACAGCCAGCTTTGATGAG
>mir-92a-2_NW_011929473.1_17298
AGGTGGGGATGGGGGCAATATTTGTGAATGATTAAATTCAAATTGCACTTGTCCCGGCCTGC
>mir-279a_NW_011929474.1_48557
AATGAGTGGCGGTCTAGTGCACGGTCGATAAAGTTGTGACTAGATCCACACTCATTAAG
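One caveat with the loop above: sub() treats each key as a regular expression and matches it anywhere in the header, so a key that happens to be a prefix of a longer ID could rewrite the wrong header. A minimal sketch of an alternative, assuming the ID to replace is always the second _-separated field and the name in front of it contains no underscore, is to split the header and look the ID up directly in the array. The temporary directory and the output name renamed.fasta are only illustrative choices, and the sample data is taken from the question.

```shell
# Work in a scratch directory so nothing existing is overwritten (illustrative choice).
workdir=$(mktemp -d)

# Key file: old ID, whitespace, new ID (first two records from the question).
cat > "$workdir/key_file.txt" <<'EOF'
scf7180000350313 NW_011929472.1
scf7180000349939 NW_011929473.1
EOF

# FASTA file with headers of the form >name_ID_number.
cat > "$workdir/fasta-file" <<'EOF'
>mir-2_scf7180000350313_41896
CCATCAGAGTGGTTGTGATGTGGTGCTATTGATTCATATCACAGCCAGCTTTGATGAG
>mir-92a-2_scf7180000349939_17298
AGGTGGGGATGGGGGCAATATTTGTGAATGATTAAATTCAAATTGCACTTGTCCCGGCCTGC
EOF

awk '
NR == FNR { r[$1] = $2; next }        # first file: old ID -> new ID
/^>/ {
    n = split($0, part, "_")          # header pieces between underscores
    if (n >= 3 && (part[2] in r)) {   # assumption: the ID is the second piece
        part[2] = r[part[2]]          # exact lookup, no regex and no loop
        line = part[1]
        for (i = 2; i <= n; i++) line = line "_" part[i]
        $0 = line                     # rebuilt header replaces the current line
    }
}
1' "$workdir/key_file.txt" "$workdir/fasta-file" > "$workdir/renamed.fasta"

cat "$workdir/renamed.fasta"
```

Headers whose second field is not in the key file pass through unchanged, as do all sequence lines. If the names in your real data can contain underscores, this assumption breaks and the original loop is the safer choice.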
Answered By - Andre Wildberg Answer Checked By - Katrina (WPSolving Volunteer)