Issue
I have data organized as lines (no columns). Lines alternate between ">name" and "data", like this:
>name1
textA
>name2
textB
>name3
textC
I want to remove lines with a given name and the associated data - e.g., remove the data for >name3, meaning that both the line >name3 and the textC line should be removed.
I am using:
awk 'BEGIN {RS = ">"; ORS = ""} !/name3/ {print">"; print $0}' FILE
However, the output is as follows:
>>name1
textA
>name2
textB
I have tried several alternatives but I did not manage to get the first line right (e.g., either the ">" is doubled or completely missing).
Solution
The record separator is >, and the first character of the file is also >.
Thus, the first record of the file is the empty string before the first record separator.
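To see that empty leading record for yourself, here is a minimal sketch. It assumes a temporary file (the `sample` variable is hypothetical) with the same contents as the file in the question, and prints each record number next to its first field:

```shell
# Hypothetical sample input with the same shape as the file in the question
sample=$(mktemp)
printf '>name1\ntextA\n>name2\ntextB\n>name3\ntextC\n' > "$sample"

# With RS set to ">", record 1 is whatever precedes the first ">": nothing.
# $1 is the first whitespace-delimited field of each record, i.e. the name.
first_fields=$(awk 'BEGIN {RS = ">"} {print NR ": [" $1 "]"}' "$sample")
printf '%s\n' "$first_fields"

rm -f "$sample"
```

Record 1 comes out as `[]`. The original command prints ">" in front of every record that does not match name3, including this empty one, which is where the doubled ">" on the first line comes from.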
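You want to skip that empty record and compare the name exactly:
awk '
BEGIN {RS = ">"; ORS = ""}
# length skips the empty first record; $1 is the name on the first line of each record
length && $1 != "name3" {print RS $0}
' file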
>name1
textA
>name2
textB
An alternate way to solve the problem:
paste - - < file | grep -v '>name3[[:blank:]]' | tr '\t' '\n'
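The pipeline is easier to follow with the intermediate step visible. This is a sketch on a hypothetical temporary copy of the question's data (the `sample`, `joined`, and `filtered` variable names are only for illustration):

```shell
# Hypothetical sample input with the same shape as the file in the question
sample=$(mktemp)
printf '>name1\ntextA\n>name2\ntextB\n>name3\ntextC\n' > "$sample"

# Step 1: paste - - joins each pair of lines into one tab-separated line
joined=$(paste - - < "$sample")
printf '%s\n' "$joined"

# Step 2: drop the joined line for name3, then turn tabs back into newlines
filtered=$(printf '%s\n' "$joined" | grep -v '>name3[[:blank:]]' | tr '\t' '\n')
printf '%s\n' "$filtered"

rm -f "$sample"
```

The `[[:blank:]]` after the name keeps the pattern from also matching longer names such as name30. Note that this approach relies on the data lines containing no tab characters, because `tr` converts every tab back to a newline.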
Answered By - glenn jackman Answer Checked By - Senaida (WPSolving Volunteer)