Issue
I have the following lines in a text file. I would like to remove the last '_' on each line and put a newline character in its place.
>15_48499991_ENSG00000074803_C_G_G_CCAATCGCTTTCAAGTTAGTGTG
>15_48499991_ENSG00000074803_C_G_G_CAATCGCTTTCAAGTTAGTGTGA
>15_48499991_ENSG00000074803_C_G_G_AATCGCTTTCAAGTTAGTGTGAT
Desired output:
>15_48499991_ENSG00000074803_C_G_G
CCAATCGCTTTCAAGTTAGTGTG
>15_48499991_ENSG00000074803_C_G_G
CAATCGCTTTCAAGTTAGTGTGA
>15_48499991_ENSG00000074803_C_G_G
AATCGCTTTCAAGTTAGTGTGAT
I used the sed command below to perform this operation, but I can't figure out what is wrong with it.
sed 's/\_/'\n'/g'
Solution
You can have .* eat as much of the line as it can (because * is greedy) before matching _:
sed 's/\(.*\)_/\1\n/' file
or the debatably nicer
sed -E 's/(.*)_/\1\n/' file
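To check the greedy substitution against the sample lines from the question, here is a minimal sketch. It assumes GNU sed, where \n in the replacement inserts a newline (other implementations, such as BSD/macOS sed, may not treat \n that way), and it uses a shell variable named input (a name chosen only for illustration) in place of the file.

```shell
#!/bin/sh
# Sample input from the question; the variable stands in for "file".
input='>15_48499991_ENSG00000074803_C_G_G_CCAATCGCTTTCAAGTTAGTGTG
>15_48499991_ENSG00000074803_C_G_G_CAATCGCTTTCAAGTTAGTGTGA
>15_48499991_ENSG00000074803_C_G_G_AATCGCTTTCAAGTTAGTGTGAT'

# The greedy (.*) grabs everything up to the LAST "_" on each line,
# so only that underscore is replaced by a newline (GNU sed assumed).
result=$(printf '%s\n' "$input" | sed -E 's/(.*)_/\1\n/')

printf '%s\n' "$result"
```

Each header line keeps its inner underscores, and the sequence moves to the following line, matching the desired output.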
Concerning your attempt, it has 3 errors:
- _ does not need to be escaped;
- single quotes (') cannot be nested (this is because of the shell, not of sed); FWIW, I don't understand why you put them there: what were you trying to do?
- if you fix the two above, ending up with
  sed 's/_/\n/g'
  you would be substituting every _, rather than only the last one.
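The quoting and /g problems can be seen directly in the shell. The sketch below assumes a POSIX shell and GNU sed; the variable names script, line and all are only for illustration. The first part prints the single argument the shell actually hands to sed for the original command, and the second part shows that sed 's/_/\n/g' splits the line at every underscore instead of only the last one.

```shell
#!/bin/sh
# Quotes do not nest: the shell joins three pieces into ONE argument:
#   's/\_/'  +  \n (unquoted, so the backslash is dropped: just "n")  +  '/g'
script=$(printf '%s' 's/\_/'\n'/g')
printf '%s\n' "$script"    # prints: s/\_/n/g

# With the quoting fixed, the /g flag replaces EVERY "_" (GNU sed assumed for \n).
line='>15_48499991_ENSG00000074803_C_G_G_CCAATCGCTTTCAAGTTAGTGTG'
all=$(printf '%s\n' "$line" | sed 's/_/\n/g')
printf '%s\n' "$all"       # seven lines, one per field
```

So the original command never asked sed for a newline at all, and even the corrected /g version is too eager; the greedy (.*)_ pattern is what restricts the change to the last underscore.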
Answered By - Enlico
Answer Checked By - Marie Seifert (WPSolving Admin)