Thursday, February 3, 2022

[SOLVED] How to use grep and sed simultaneously using pipe

Issue

I have 2 files

File 1

TRINITY_DN10039_c1_g1_i1        216     Brassica rapa   
TRINITY_DN10270_c0_g1_i1        233     Pan paniscus  
TRINITY_DN10323_c0_g1_i2        209     Corynebacterium aurimucosum ATCC 700975  
.  
.   
TRINITY_DN10462_c0_g1_i1        257     Helwingia himalaica    
TRINITY_DN10596_c0_g1_i1        205     Homo sapiens   
TRINITY_DN10673_c0_g2_i2        323     Anaerococcus prevotii DSM 20548

File 2

TRINITY_DN9856_c0_g1_i1 len=467 path=[0:0-466]
GATGCGGGCCAATATGAATGTGAGATTACTAATGAATTGGGGACTAAAAA
TRINITY_DN9842_c0_g1_i1 len=208 path=[0:0-207]
AAGTAATTTTATATCACTTGTTACATCGCAATTCGTGAGTTAAACTTAAT
.
.
TRINITY_DN9897_c0_g1_i1 len=407 path=[0:0-406]
AACTTTATTAACTTGTTGTACATATTTATTAATGCAAATACATATAGAG  
TRINITY_DN9803_c0_g1_i1 len=795 path=[0:0-794]
AACTAAGACAAACTTCGCGGAGCAGTTAGAAAATATTACAAGAGATTTG

I want to delete 2 lines (the matching line and the line after it) from file2 wherever a line matches one of the first-column values of file1.

awk '{print $1}' file1 | sed '/here_i_want_to_insert_output_of_pipe/{N;d;}' file2


Solution

If the first field contains no regex-special characters (such as . or / or [ or ( or \), your idea is actually not that bad. Generate the sed script from file1 and hand it to sed through command substitution:

sed "$(cut -d' ' -f1 file1 | sed 's@.*@/&/{N;d}@')" file2
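  • cut -d' ' -f1 file1 - extracts the first field from each line of file1
  • | sed 's@.*@/&/{N;d}@' - rewrites each extracted field into a sed command (@ is used as the s delimiter so that / can appear in the replacement)
    • .* - matches the whole line, i.e. the first field from file1
    • /&/{N;d} - the & stands for the whole matched text, i.e. the first field, so the line becomes /<first field>/{N;d}
  • the outer sed "$(...)" file2 then runs the generated commands, one per line of file1, against file2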
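
To make the two steps above concrete, here is a minimal sketch on a cut-down sample. The temporary directory, the shortened sequences, and the variable names script and result are assumptions for illustration only. The generated commands use {N;d;} (with a ; before the closing brace, as in the question), which GNU sed treats the same as {N;d} and which other sed implementations may require.

```shell
# Hypothetical sample data, shortened from the question.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
TRINITY_DN9842_c0_g1_i1 208 Pan paniscus
EOF
cat > "$tmp/file2" <<'EOF'
TRINITY_DN9856_c0_g1_i1 len=467 path=[0:0-466]
GATGCGGGCC
TRINITY_DN9842_c0_g1_i1 len=208 path=[0:0-207]
AAGTAATTTT
TRINITY_DN9897_c0_g1_i1 len=407 path=[0:0-406]
AACTTTATTA
EOF

# Step 1: turn every first field of file1 into one sed command.
script=$(cut -d' ' -f1 "$tmp/file1" | sed 's@.*@/&/{N;d;}@')
printf '%s\n' "$script"    # /TRINITY_DN9842_c0_g1_i1/{N;d;}

# Step 2: run the generated script against file2.
result=$(sed "$script" "$tmp/file2")
printf '%s\n' "$result"    # the DN9842 header and its sequence are gone

rm -rf "$tmp"
```

Note that /<first field>/ is an unanchored regex, so an ID ending in _i1 would also match a header ending in _i10. If that can happen in your data, the join-based approach further down matches the field exactly.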

A lesser-known feature: you can use a different delimiter for an address regex with the syntax \<char>regex<char>, for example \!regex!. This helps when the field itself contains /. Below I use ~:

sed "$(cut -d' ' -f1 file1 | sed 's@.*@\\~&~{N;d}@')" file2
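
As a small illustration of why the alternative delimiter matters, the sketch below uses made-up IDs that contain a slash (lib/seq1, lib/seq2); these IDs and the variable names are assumptions, not data from the question. With /&/ the generated command would be /lib/seq2/{N;d;}, which sed cannot parse, while the \~...~ form is fine.

```shell
# Hypothetical data whose IDs contain "/".
tmp=$(mktemp -d)
printf '%s\n' 'lib/seq2 208 Pan paniscus' > "$tmp/file1"
cat > "$tmp/file2" <<'EOF'
lib/seq1 len=5
GATGC
lib/seq2 len=5
AAGTA
EOF

# "\\" in the replacement emits one literal backslash, so each
# generated line has the form \~<first field>~{N;d;}
script=$(cut -d' ' -f1 "$tmp/file1" | sed 's@.*@\\~&~{N;d;}@')
printf '%s\n' "$script"    # \~lib/seq2~{N;d;}

result=$(sed "$script" "$tmp/file2")
printf '%s\n' "$result"    # only the lib/seq1 record remains

rm -rf "$tmp"
```

The field is still interpreted as a regex, so this only sidesteps the / problem. Characters such as . or [ keep their special meaning, and a field containing ~ would break this variant in the same way.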

If, however, the first field may contain special characters and you don't care about the order of the output, you can avoid regexes entirely. Merge each pair of lines in file2 into a single line using some magic separator that does not occur in the data (I chose ! below), sort the result, sort the first fields of file1, and join the two. The -v2 option makes join print only the unpairable lines from the second input, i.e. the records that did not match. Finally, restore the newlines by replacing the magic separator ! with a newline:

join -v2 <(cut -d' ' -f1 file1 | sort) <(sed 'N;s/\n/!/' file2 | sort -k1) |
tr '!' '\n'
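
The following sketch walks through the same pipeline on made-up IDs (seq.1, seq.3, seqX1) chosen so that a regex seq.1 would also hit seqX1, while join compares the field literally. Everything here is an assumption for illustration: the sample data, the variable names, and two deliberate deviations from the command above. It uses a temporary file plus - (standard input) instead of <(...) so that it also runs in a plain POSIX sh, and it sets LC_ALL=C so that sort and join agree on one collation order.

```shell
export LC_ALL=C    # one collation order for both sort and join
tmp=$(mktemp -d)
printf '%s\n' 'seq.1 216 Brassica rapa' > "$tmp/file1"
cat > "$tmp/file2" <<'EOF'
seqX1 len=5
GATGC
seq.1 len=5
AAGTA
seq.3 len=5
AACTT
EOF

# Merge each header/sequence pair into one line, then sort it.
sed 'N;s/\n/!/' "$tmp/file2" | sort -k1 > "$tmp/file2.merged"

# -v2: print only the lines of the second input that have no partner.
result=$(cut -d' ' -f1 "$tmp/file1" | sort |
         join -v2 - "$tmp/file2.merged" |
         tr '!' '\n')
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only the seq.1 record is dropped, and seqX1 survives even though it would match the regex seq.1. The surviving records come out in sorted order (seq.3 before seqX1), not in the order they had in file2.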

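If the output must keep the original order of file2, you can number the merged lines of file2, join on the ID (now the second field of the numbered input), and re-sort the output on the line numbers. That last sort has to be numeric on the number field only, otherwise line 10 would sort before line 2:

join -1 1 -2 2 -v2 <(cut -d' ' -f1 file1 | sort) <(sed 'N;s/\n/!/' file2 | nl -w1 | sort -k2) |
sort -k2,2n | cut -d' ' -f1,3- | tr '!' '\n'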
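
A sketch of the order-preserving variant on the same kind of made-up data follows. The sample IDs and variable names are assumptions, the temporary file and - replace <(...) for portability, LC_ALL=C is set so sort and join agree, and the final sort is written as sort -k2,2n so that the line numbers are compared numerically.

```shell
export LC_ALL=C    # one collation order for both sort and join
tmp=$(mktemp -d)
printf '%s\n' 'seq.1 216 Brassica rapa' > "$tmp/file1"
cat > "$tmp/file2" <<'EOF'
seqX1 len=5
GATGC
seq.1 len=5
AAGTA
seq.3 len=5
AACTT
EOF

# Merge pairs, number them (number<TAB>record), sort on the ID field.
sed 'N;s/\n/!/' "$tmp/file2" | nl -w1 | sort -k2 > "$tmp/file2.numbered"

# join prints "<ID> <number> <rest>" for unmatched records; sort those
# on the number, drop the number again, then restore the newlines.
result=$(cut -d' ' -f1 "$tmp/file1" | sort |
         join -1 1 -2 2 -v2 - "$tmp/file2.numbered" |
         sort -k2,2n | cut -d' ' -f1,3- | tr '!' '\n')
printf '%s\n' "$result"

rm -rf "$tmp"
```

This time the seqX1 record comes first again, as it did in file2, followed by seq.3.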

Tested on repl



Answered By - KamilCuk
Answer Checked By - David Goodson (WPSolving Volunteer)