Issue
I have multiple large files with two columns (tab-delimited). The content of these files are like this:
working_file-1
K00001 0.188
K00005 15.97
K00008 188.09
mapping file
K00001
K00002
K00003
K00004
K00005
K00006
K00007
K00008
mapping file range is K00001 - K25804
I want to map my working_file-1 to mapping file so that the output looks like this:
K00001 0.188
K00002
K00003
k00004
K00005 15.97
K00006
K00007
k00008 188.09
The blanks (K0's not present in working_file) can be filled with zero (if possible) or can be left as blank.
So far, I tried these codes by following other similar posts (but did not success):
awk 'NR==FNR {a[$1]++; next} $1 in a' mapping file working_file-1 > output.file
grep -Fw -f mapping file working_file-1 > output.file
edit: output of od -c work1; od -c map
0000000 K 0 0 0 0 1 \r \n K 0 0 0 0 2 \r \n
0000020 K 0 0 0 0 3 \r \n K 0 0 0 0 4 \r \n
0000040 K 0 0 0 0 5 \r \n K 0 0 0 0 6 \r \n
0000060 K 0 0 0 0 7 \r \n K 0 0 0 0 8 \r \n
Solution
Given a "map" file like:
a
b
c
d
e
f
g
h
and a "work1" file like:
a A
c C
g G
and desired "merged1" output like:
a A
b
c C
d
e
f
g G
h
then join
can do the combining:
join -1 1 -2 1 -a 1 -o 0,2.2 map work1 > merged1
-1 1 -2 1
joins on first (space-delimited) field of each file-a 1
prints lines from first file even if unpaired-o 0,2.2
formats output to be join field (first column), separator (space), then second field of second file
To produce instead "merged2" like:
a A
b 0
c C
d 0
e 0
f 0
g G
h 0
add the -e
option:
join -1 1 -2 1 -a 1 -e 0 -o 0,2.2 map work1 > merged2
If "work2" has fields separated by a character other than single space, use the -t
option.
For example, with "work2" using single tab delimiter like:
a A
c C
g G
(note: stackoverflow displays this with spaces instead of single tab) then with POSIX shell, use -t ' '
(ie. QUOTE TAB QUOTE - may need to be typed as: 'CTRL-VTAB' if shell performs history completion):
join -t ' ' -1 1 -2 1 -a 1 -e 0 -o 0,2.2 map work2 >merged3
or with bash, using -t $'\t'
is possible instead:
join -t $'\t' -1 1 -2 1 -a 1 -e 0 -o 0,2.2 map work2 >merged3
to produce "merged3" like:
a A
b 0
c C
d 0
e 0
f 0
g G
h 0
(note: again stackoverflow displays the tab as spaces)
Answered By - jhnc