Issue
I have a data file with 1000 to 2000 columns and more than 3000 rows.
Example input data:
GO:0009987 Os760 Os840 Os550 Os380 Os590 Os340
GO:0043170 Os610 Os043 Os035
Expected Output:
GO:0009987 Os760
GO:0009987 Os840
GO:0009987 Os550
GO:0009987 Os380
GO:0009987 Os590
GO:0009987 Os340
GO:0043170 Os610
GO:0043170 Os043
GO:0043170 Os035
I tried this:
sed 's/ /\n/2; P; D' filename | awk 'NF==2 {a =$1;b=$2; print; next} {print a,$0}'
But this gives me a result like the one below, with one extra GO value in column 1 after each group. I want to remove these extra GO lines from the output.
GO:0009987 Os760
GO:0009987 Os840
GO:0009987 Os550
GO:0009987 Os380
GO:0009987 Os590
GO:0009987 Os340
GO:0009987
GO:0043170 Os610
GO:0043170 Os043
GO:0043170 Os035
GO:0043170
Solution
Could you please try the following (delimiter selection changed as per Sundeep's comments).
awk '{for(i=2;i<=NF;i++){print $1,$i}}' Input_file
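To see what this loop does, here is a minimal sketch that runs the same awk program on the two sample lines from the question. The function name `split_pairs` is a hypothetical wrapper added only for illustration; it is not part of the original answer.

```shell
# Hypothetical helper: wraps the awk command shown above.
split_pairs() {
    # For each line, pair field 1 with every field from 2 to NF.
    # A line with only one field prints nothing, so no bare GO lines appear.
    awk '{ for (i = 2; i <= NF; i++) print $1, $i }' "$@"
}

# Feed the two sample lines from the question on standard input.
printf '%s\n' \
    'GO:0009987 Os760 Os840 Os550 Os380 Os590 Os340' \
    'GO:0043170 Os610 Os043 Os035' | split_pairs
```

On this sample it should print the nine lines listed under Expected Output. Because the loop starts at field 2, the GO term is only ever printed together with an ID.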
OR try:
awk 'BEGIN{FS=":| +"} {for(i=3;i<=NF;i++){print $1":"$2,$i}}' Input_file
OR:
awk -F':| +' '{for(i=3;i<=NF;i++){print $1":"$2,$i}}' Input_file
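The last two variants start the loop at field 3 because the separator `:| +` also splits the GO term at its colon. A small sketch, assuming a shortened sample line and a hypothetical helper named `show_fields`, lists the fields awk sees:

```shell
# Hypothetical helper: list each field awk sees when FS is ":| +".
show_fields() {
    awk -F':| +' '{ for (i = 1; i <= NF; i++) printf "field %d = %s\n", i, $i }' "$@"
}

# Shortened sample line based on the question's input.
printf '%s\n' 'GO:0043170 Os610 Os043' | show_fields
```

Here field 1 is `GO` and field 2 is `0043170`, so the IDs begin at field 3 and the commands rebuild the term with `$1":"$2`.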
Answered By - RavinderSingh13