Issue
I have a dataset that looks like the below:
col1,col2,col3,col4
read1,chr1,Unassigned_NoFeatures,
read2,chr2,Assigned,
read3,chr3,,Assigned
What I want to do is to run egrep on the "Assigned" and "Unassigned_NoFeatures" strings and then append them to the read names in the first column. The problem I face is that these strings are not always in the same column, but I need to grep them and append them to the first column (per entry).
I tried the approach below, capturing the egrep result for each entry in my file and passing it as a variable to awk, but it did not work.
cat my_file.csv | variable=$(egrep -o "Assigned|Unassigned_NoFeatures") | awk -F, '{print $1"_"variable,$2}'
The desired output is:
read1_Unassigned_NoFeatures,chr1
read2_Assigned,chr2
read3_Assigned,chr3
How can I make the script work? Thanks
Solution
You never need grep when you're using awk. Given your input, all you need to do is print the string from the 3rd or 4th field, whichever is not empty, e.g. using any awk:
$ awk 'BEGIN{FS=OFS=","} NR>1{print $1 "_" ($3?$3:$4), $2}' my_file.csv
read1_Unassigned_NoFeatures,chr1
read2_Assigned,chr2
read3_Assigned,chr3
or even just, as @jhnc suggested in a comment:
awk 'BEGIN{FS=OFS=","} NR>1{print $1 "_" $3$4, $2}' my_file.csv
but if you actually did need to find strings that match a regexp across a whole line it'd be:
awk 'BEGIN{FS=OFS=","} match($0,/Assigned|Unassigned_NoFeatures/){print $1 "_" substr($0,RSTART,RLENGTH), $2}' my_file.csv
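As for why the original pipeline printed nothing useful: `variable=$(egrep ...)` in the middle of a pipeline sets a shell variable in a subshell and writes nothing to the next command, and the word `variable` inside the single-quoted awk program is an unrelated, empty awk variable. If a value really does have to come from the shell, awk's `-v` option is the usual way to pass it in. The following is a minimal sketch of that, assuming the sample file from the question; the names `tmp`, `pat` and `out` are illustrative only and not part of the original question or answer.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
col1,col2,col3,col4
read1,chr1,Unassigned_NoFeatures,
read2,chr2,Assigned,
read3,chr3,,Assigned
EOF

# pat is a hypothetical shell variable holding the regexp to search for.
pat='Assigned|Unassigned_NoFeatures'

# -v copies the shell value into an awk variable of the same name;
# match() then treats that string as a dynamic regexp and sets RSTART/RLENGTH.
# The header line does not match the regexp, so it is skipped without NR>1.
out=$(awk -v pat="$pat" 'BEGIN{FS=OFS=","} match($0, pat){print $1 "_" substr($0,RSTART,RLENGTH), $2}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

With the sample data this should print the same three lines as the desired output. Note that `-v` interprets backslash escapes in the assigned value, so a pattern containing backslashes would need them doubled.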
Answered By - Ed Morton
Answer Checked By - Senaida (WPSolving Volunteer)