Issue
I have 2 files
File 1
TRINITY_DN10039_c1_g1_i1 216 Brassica rapa
TRINITY_DN10270_c0_g1_i1 233 Pan paniscus
TRINITY_DN10323_c0_g1_i2 209 Corynebacterium aurimucosum ATCC 700975
.
.
TRINITY_DN10462_c0_g1_i1 257 Helwingia himalaica
TRINITY_DN10596_c0_g1_i1 205 Homo sapiens
TRINITY_DN10673_c0_g2_i2 323 Anaerococcus prevotii DSM 20548
File 2
TRINITY_DN9856_c0_g1_i1 len=467 path=[0:0-466]
GATGCGGGCCAATATGAATGTGAGATTACTAATGAATTGGGGACTAAAAA
TRINITY_DN9842_c0_g1_i1 len=208 path=[0:0-207]
AAGTAATTTTATATCACTTGTTACATCGCAATTCGTGAGTTAAACTTAAT
.
.
TRINITY_DN9897_c0_g1_i1 len=407 path=[0:0-406]
AACTTTATTAACTTGTTGTACATATTTATTAATGCAAATACATATAGAG
TRINITY_DN9803_c0_g1_i1 len=795 path=[0:0-794]
AACTAAGACAAACTTCGCGGAGCAGTTAGAAAATATTACAAGAGATTTG
I want to delete two lines (the matching line and the line after it) in file2 wherever the line matches one of the words in the first column of file1.
awk '{print $1}' file1 | sed '/here_i_want_to_insert_output_of_pipe/{N;d;}' file2
Solution
If the first field contains no special characters, such as . or / or [ or ( or \ or any other regex-special character, your idea is actually not that bad:
sed "$(cut -d' ' -f1 file1 | sed 's@.*@/&/{N;d}@')" file2
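- cut -d' ' -f1 file1 extracts the first field from file1.
- | sed 's@.*@/&/{N;d}@' replaces the whole line (.*), i.e. the first field from file1, with /&/{N;d}. The & stands for everything the regex matched, here the first field, so each line becomes /<first field>/{N;d}. On a line matching that address, N appends the following line (the sequence) and d deletes both.
- The generated script is then substituted into sed "<here>" file2.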
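To see what the generated script looks like, here is a minimal sketch that runs the same two steps on small sample files. The file2 records are taken from the question; file1 is hypothetical (its IDs were picked to match two of those records, and its other columns are placeholders). Writing {N;d} without a ; before the closing brace assumes GNU sed.

```shell
#!/bin/sh
# Sketch of the sed approach on sample data. file1 is hypothetical;
# only its first column matters. Assumes GNU sed (for "{N;d}").
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical file1: IDs chosen to match two records of file2.
printf '%s\n' \
  'TRINITY_DN9842_c0_g1_i1 208 species_a' \
  'TRINITY_DN9803_c0_g1_i1 795 species_b' > "$work/file1"

# file2: a header line followed by one sequence line per record.
printf '%s\n' \
  'TRINITY_DN9856_c0_g1_i1 len=467 path=[0:0-466]' \
  'GATGCGGGCCAATATGAATGTGAGATTACTAATGAATTGGGGACTAAAAA' \
  'TRINITY_DN9842_c0_g1_i1 len=208 path=[0:0-207]' \
  'AAGTAATTTTATATCACTTGTTACATCGCAATTCGTGAGTTAAACTTAAT' \
  'TRINITY_DN9897_c0_g1_i1 len=407 path=[0:0-406]' \
  'AACTTTATTAACTTGTTGTACATATTTATTAATGCAAATACATATAGAG' \
  'TRINITY_DN9803_c0_g1_i1 len=795 path=[0:0-794]' \
  'AACTAAGACAAACTTCGCGGAGCAGTTAGAAAATATTACAAGAGATTTG' > "$work/file2"

# Step 1: turn every ID into a sed command of the form /ID/{N;d}.
script=$(cut -d' ' -f1 "$work/file1" | sed 's@.*@/&/{N;d}@')
printf '%s\n' "$script"
# prints:
# /TRINITY_DN9842_c0_g1_i1/{N;d}
# /TRINITY_DN9803_c0_g1_i1/{N;d}

# Step 2: apply it; N pulls in the sequence line, d drops header and sequence.
result=$(sed "$script" "$work/file2")
printf '%s\n' "$result"
```

Note that /ID/ is an unanchored match, so an ID ending in _i1 would also match a header whose ID ends in _i12. Generating /^& /{N;d} instead (s@.*@/^& /{N;d}@) would restrict the match to a whole first field followed by a space.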
A not-so-well-known feature: you can use a different delimiter for /regex/ with the syntax \<char>regex<char>, for example \!regex!. Below I use ~, so a / in the first field no longer ends the address early:
sed "$(cut -d' ' -f1 file1 | sed 's@.*@\\~&~{N;d}@')" file2
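A small sketch of why the alternative delimiter helps, using hypothetical IDs that contain a slash. With the plain /.../ form, the generated address /sample/1/ would end at the inner slash and sed would reject the script; with \~...~ the slash is just an ordinary character.

```shell
#!/bin/sh
# Sketch: a custom address delimiter (\~regex~) lets an ID contain '/'.
# The IDs and sequences below are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf '%s\n' 'sample/1 100 species_a' > "$work/file1"
printf '%s\n' \
  'sample/1 len=100' 'ACGT' \
  'sample/2 len=200' 'TTGCA' > "$work/file2"

# Each ID becomes \~ID~{N;d}; the doubled backslash in the replacement
# produces a single literal backslash.
script=$(cut -d' ' -f1 "$work/file1" | sed 's@.*@\\~&~{N;d}@')
printf '%s\n' "$script"    # prints: \~sample/1~{N;d}

# Only the sample/1 record (header + sequence) is removed.
result=$(sed "$script" "$work/file2")
printf '%s\n' "$result"
```

This only moves the problem: an ID containing ~ (or any regex-special character such as . or [) would still need escaping.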
If the first field does contain special characters and you don't care about the output order, you can merge each pair of lines in file2 into a single line using some magic separator (I chose ! below; any character that never occurs in file2 will do), sort the result, sort the IDs from file1, and then just join them. The -v2 option makes join output the unpairable lines from the second file, i.e. the records that were not matched. Finally, restore the newlines by replacing the magic separator ! with a newline:
join -v2 <(cut -d' ' -f1 file1 | sort) <(sed 'N;s/\n/!/' file2 | sort -k1,1) |
tr '!' '\n'
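Here is a sketch of the same pipeline in POSIX sh, with temporary files standing in for bash's <(...). It assumes ! never occurs in file2, sets LC_ALL=C so that sort and join agree on the collation order, and uses a hypothetical ID, contig.1[a], whose . and [ would be regex-special for the sed approach but are harmless for join.

```shell
#!/bin/sh
# Sketch of the join-based approach using temporary files instead of <(...).
# Assumptions: '!' never occurs in file2; contig.1[a] is a hypothetical ID.
set -eu
LC_ALL=C; export LC_ALL   # sort and join must use the same collation
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf '%s\n' \
  'TRINITY_DN9842_c0_g1_i1 208 species_a' \
  'contig.1[a] 12 species_b' > "$work/file1"
printf '%s\n' \
  'TRINITY_DN9856_c0_g1_i1 len=467 path=[0:0-466]' \
  'GATGCGGGCCAATATGAATGTGAGATTACTAATGAATTGGGGACTAAAAA' \
  'TRINITY_DN9842_c0_g1_i1 len=208 path=[0:0-207]' \
  'AAGTAATTTTATATCACTTGTTACATCGCAATTCGTGAGTTAAACTTAAT' \
  'contig.1[a] len=12' \
  'ACGTACGTACGT' \
  'TRINITY_DN9897_c0_g1_i1 len=407 path=[0:0-406]' \
  'AACTTTATTAACTTGTTGTACATATTTATTAATGCAAATACATATAGAG' > "$work/file2"

# IDs to drop, sorted on the join field.
cut -d' ' -f1 "$work/file1" | sort > "$work/ids"
# One record per line ("header!sequence"), sorted on the first field only.
sed 'N;s/\n/!/' "$work/file2" | sort -k1,1 > "$work/records"
# -v2 keeps only records without a matching ID; tr splits them back up.
result=$(join -v2 "$work/ids" "$work/records" | tr '!' '\n')
printf '%s\n' "$result"
```

Because join compares the first field as a plain string, no escaping is needed; the price is that the surviving records come out in sorted order rather than in file2 order.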
If the output must keep the order of file2, you can number the records in file2 and re-sort the output by those numbers:
join -11 -22 -v2 <(cut -d' ' -f1 file1 | sort) <(sed 'N;s/\n/!/' file2 | nl -w1 | sort -k2,2) |
sort -k2,2n | cut -d' ' -f1,3- | tr '!' '\n'
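The final re-sort has to be numeric: with a plain sort -k2, record 10 would sort before record 2. The sketch below generates twelve hypothetical records (id1 to id12) so the numbers reach two digits, drops id3 and id11, and checks that the rest keep file2's order. As above, it uses temporary files instead of <(...) and sets LC_ALL=C for sort and join.

```shell
#!/bin/sh
# Sketch of the order-preserving join variant on hypothetical data.
set -eu
LC_ALL=C; export LC_ALL   # sort and join must use the same collation
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical file2: headers id1..id12, each followed by one sequence line.
i=1
while [ "$i" -le 12 ]; do
  printf 'id%s len=4\nACGT\n' "$i"
  i=$((i + 1))
done > "$work/file2"
# Hypothetical file1: drop records id3 and id11.
printf '%s\n' 'id3 4 species_a' 'id11 4 species_b' > "$work/file1"

cut -d' ' -f1 "$work/file1" | sort > "$work/ids"
# Merge each record into one line, prefix its position (nl), sort on the ID.
sed 'N;s/\n/!/' "$work/file2" | nl -w1 | sort -k2,2 > "$work/records"
# Join file1 field 1 against records field 2 and keep unmatched records.
# join prints "ID NUMBER REST": sort numerically on NUMBER, drop it, split.
result=$(join -1 1 -2 2 -v2 "$work/ids" "$work/records" |
  sort -k2,2n | cut -d' ' -f1,3- | tr '!' '\n')
printf '%s\n' "$result"
```

The cut -d' ' step relies on join separating its output fields with single spaces, even though nl separated the number from the record with a tab.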
Answered By - KamilCuk Answer Checked By - David Goodson (WPSolving Volunteer)