Issue
I am trying to keep only the first field of the header line for each sequence in a .fasta file that looks like this:
>hetGla3 ENST00000215754.179
ATGCCGATGTTCGTCTTGAACACCAACGTGCCCCGCGCCTCTGTGCCGGACGGGTTCCTCTCCGAGCTCACCCAGCAGCTGGCGCAGGCCACTGGCAAGCCGGCCCAGTATATCGCAGTGCACGTGGTCCCGGACCAGCTCATGACCTTCGCGGGCTCATCCGAGCCCTGCGCGCTCTGCAGCCTGCACAGCATCGGCAAGATAGGCGGCGTTCAGAATCGCTCGTACAGCAAGCTGCTGTGTGGCCTGCTGGCGGAGCGCCTGCGTATCAGTCCGGACAGGATCTACATCAACTACTACGACATGAATGCGGCCAATGTGGGCTGGAACGGCTCCACCTTCGCTNNN
>musMus10 ENST00000215754.270
ATGCCTATGTTCATCGTGAACACCAATGTTCCCCGCGCCTCCGTGCCAGAGGGGTTTCTGTCGGAGCTCACCCAGCAGCTGGCGCAGGCCACCGGCAAGCCCGCACAGTACATCGCAGTGCACGTGGTCCCGGACCAGCTCATGACTTTTAGCGGCACGAACGATCCCTGCGCCCTCTGCAGCCTGCACAGCATCGGCAAGATCGGTGGTGCCCAGAACCGCAACTACAGTAAGCTGCTGTGTGGCCTGCTGTCCGATCGCCTGCACATCAGCCCGGACCGGGTCTACATCAACTATTACGACATGAACGCTGCCAACGTGGGCTGGAACGGTTCCACCTTCGCTNNN
I want to remove the tab and the "ENST..." identifier after it, so that the output is:
>hetGla3
ATGCCGATGTTCGTCTTGAACACCAACGTGCCCCGCGCCTCTGTGCCGGACGGGTTCCTCTCCGAGCTCACCCAGCAGCTGGCGCAGGCCACTGGCAAGCCGGCCCAGTATATCGCAGTGCACGTGGTCCCGGACCAGCTCATGACCTTCGCGGGCTCATCCGAGCCCTGCGCGCTCTGCAGCCTGCACAGCATCGGCAAGATAGGCGGCGTTCAGAATCGCTCGTACAGCAAGCTGCTGTGTGGCCTGCTGGCGGAGCGCCTGCGTATCAGTCCGGACAGGATCTACATCAACTACTACGACATGAATGCGGCCAATGTGGGCTGGAACGGCTCCACCTTCGCTNNN
>musMus10
ATGCCTATGTTCATCGTGAACACCAATGTTCCCCGCGCCTCCGTGCCAGAGGGGTTTCTGTCGGAGCTCACCCAGCAGCTGGCGCAGGCCACCGGCAAGCCCGCACAGTACATCGCAGTGCACGTGGTCCCGGACCAGCTCATGACTTTTAGCGGCACGAACGATCCCTGCGCCCTCTGCAGCCTGCACAGCATCGGCAAGATCGGTGGTGCCCAGAACCGCAACTACAGTAAGCTGCTGTGTGGCCTGCTGTCCGATCGCCTGCACATCAGCCCGGACCGGGTCTACATCAACTATTACGACATGAACGCTGCCAACGTGGGCTGGAACGGTTCCACCTTCGCTNNN
I have already tried using sed to strip everything after the whitespace in the headers, but it doesn't appear to work (the headers come back in the original format):
sed 's/\.[^\.]*//'
Any help would be greatly appreciated! Thank you.
Solution
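Using GNU sed, delete everything from the first space or tab onward, but only on header lines (those starting with ">"):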
$ sed -E '/^>/s/( +|\t).*//' input_file
>hetGla3
ATGCCGATGTTCGTCTTGAACACCAACGTGCCCCGCGCCTCTGTGCCGGACGGGTTCCTCTCCGAGCTCACCCAGCAGCTGGCGCAGGCCACTGGCAAGCCGGCCCAGTATATCGCAGTGCACGTGGTCCCGGACCAGCTCATGACCTTCGCGGGCTCATCCGAGCCCTGCGCGCTCTGCAGCCTGCACAGCATCGGCAAGATAGGCGGCGTTCAGAATCGCTCGTACAGCAAGCTGCTGTGTGGCCTGCTGGCGGAGCGCCTGCGTATCAGTCCGGACAGGATCTACATCAACTACTACGACATGAATGCGGCCAATGTGGGCTGGAACGGCTCCACCTTCGCTNNN
>musMus10
ATGCCTATGTTCATCGTGAACACCAATGTTCCCCGCGCCTCCGTGCCAGAGGGGTTTCTGTCGGAGCTCACCCAGCAGCTGGCGCAGGCCACCGGCAAGCCCGCACAGTACATCGCAGTGCACGTGGTCCCGGACCAGCTCATGACTTTTAGCGGCACGAACGATCCCTGCGCCCTCTGCAGCCTGCACAGCATCGGCAAGATCGGTGGTGCCCAGAACCGCAACTACAGTAAGCTGCTGTGTGGCCTGCTGTCCGATCGCCTGCACATCAGCCCGGACCGGGTCTACATCAACTATTACGACATGAACGCTGCCAACGTGGGCTGGAACGGTTCCACCTTCGCTNNN
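For reference, the attempt in the question, s/\.[^\.]*//, matches a literal dot and the characters that follow it, so it only trims the ".179" version suffix and never touches the whitespace. Note also that the \t escape in the command above is a GNU sed extension. If portability matters, a minimal sketch using the POSIX character class [[:space:]] could look like the following. The function name strip_header_fields and the shortened sample sequences are made up for illustration, and this has not been checked against every sed implementation.

```shell
# Hypothetical helper: on header lines only (those starting with ">"),
# delete everything from the first whitespace character to the end of the line.
# With no arguments, sed reads standard input; otherwise it reads the named files.
strip_header_fields() {
  sed '/^>/s/[[:space:]].*//' "$@"
}

# Shortened sample input: one tab-separated header and one space-separated header.
result=$(printf '>hetGla3\tENST00000215754.179\nATGCCG\n>musMus10 ENST00000215754.270\nATGCCT\n' | strip_header_fields)

printf '%s\n' "$result"
```

On a real file, the same function can be called as strip_header_fields input_file > output_file. Sequence lines are left untouched because the substitution is restricted to lines matching /^>/.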
Answered By - HatLess
Answer Checked By - Willingham (WPSolving Volunteer)