Issue
Let's say I have two arrays (A and B).
In directory A:
#!/bin/bash
A=( *txt )
Directory A contains:
fox_abdce.txt
rabbit_abdce.txt
lemom_asnrndna.txt
In directory B:
#!/bin/bash
B=( *txt )
Directory B contains:
fox_zzzzzz.txt
rabbit_zzzedd.txt
lemom_kokoijijim.txt
Or, with the arrays declared explicitly via declare -a (this could be generalized to anything similar):
#!/bin/bash
declare -a A=([0]="fox_abcde.txt" [1]="lemom_asnrndna.txt" [2]="rabbit_abcde.txt")
declare -a B=([0]="fox_zzzzzz.txt" [1]="lemom_kokoijijim.txt" [2]="rabbit_zzzedd.txt")
I want to compare them to find out whether all of them are similar by their first 3 letters.
With a CSV file, I would use awk like this to find out whether two columns have the same first three letters:
#!/bin/bash
export NUMBER_OF_DIGITS=3
Matching:
awk -F, '{if(substr($1, 1, $NUMBER_OF_DIGITS) == substr($2, 1, $NUMBER_OF_DIGITS)) print}' file.csv
Not matching:
awk -F, '{if(substr($1, 1, $NUMBER_OF_DIGITS) != substr($2, 1, $NUMBER_OF_DIGITS)) print}' file.csv
How could I apply the same check to the arrays directly?
In this case the output should be everything that matches, either as a flat list:
fox_abdce.txt
rabbit_abdce.txt
lemom_asnrndna.txt
fox_zzzzzz.txt
rabbit_zzzedd.txt
lemom_kokoijijim.txt
or as pairs:
fox_abdce.txt fox_zzzzzz.txt
rabbit_abdce.txt rabbit_zzzedd.txt
lemom_asnrndna.txt lemom_kokoijijim.txt
Solution
Assumptions:
- file names do not include embedded linefeeds
- both arrays have the same number of entries
- we're to compare array entries that have the same array index
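The equal-length assumption can be checked up front before running either comparison. A minimal sketch, assuming the arrays are already populated; check_same_length is a hypothetical helper name, not part of the original answer:

```shell
#!/bin/bash
# Sketch: verify the "same number of entries" assumption before comparing.
# check_same_length is a hypothetical helper, not from the original answer.
A=("fox_abdce.txt" "rabbit_abdce.txt" "lemom_asnrndna.txt")
B=("fox_zzzzzz.txt" "rabbit_zzzedd.txt" "lemom_kokoijijim.txt")

check_same_length() {
    # $1 and $2 are the element counts of the two arrays
    if (( $1 != $2 )); then
        echo "array sizes differ: $1 vs $2" >&2
        return 1
    fi
}

check_same_length "${#A[@]}" "${#B[@]}" && echo "sizes match"
```

Note that equal lengths alone do not guarantee that entry i of A belongs with entry i of B; that still depends on both directories listing in the same order.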
Adding a 'not matching' data point to the sample arrays:
A=("fox_abdce.txt" "rabbit_abdce.txt" "ignore_me" "lemom_asnrndna.txt")
B=("fox_zzzzzz.txt" "rabbit_zzzedd.txt" "not_me" "lemom_kokoijijim.txt")
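Fixing the NUMBER_OF_DIGITS issue (inside the single-quoted awk script, $NUMBER_OF_DIGITS is not the shell variable; awk treats it as a field reference):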
#### replace this:
NUMBER_OF_DIGITS=(3)
#### with this:
NUMBER_OF_DIGITS=3
#### then pass it to awk with the -v option, e.g.:
awk -v awk_var_name="OS_var_value"
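Since the question's script already exports NUMBER_OF_DIGITS, awk could alternatively read it from its ENVIRON array instead of receiving it through -v. A small sketch under that assumption; matching_rows is a hypothetical helper and the CSV lines are made-up sample input:

```shell
#!/bin/bash
# Sketch: read the exported NUMBER_OF_DIGITS via awk's ENVIRON array.
# matching_rows is a hypothetical helper; the input lines are sample data.
export NUMBER_OF_DIGITS=3

matching_rows() {
    # Print CSV rows whose first two fields share the first n characters
    awk -F, 'BEGIN { n = ENVIRON["NUMBER_OF_DIGITS"] }
             substr($1, 1, n) == substr($2, 1, n)'
}

printf '%s\n' "fox_abdce.txt,fox_zzzzzz.txt" "ignore_me,not_me" | matching_rows
```

ENVIRON only sees exported variables, so -v remains the more explicit option when the variable is a plain shell variable.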
One awk idea using process substitution:
echo "########## matching"
awk -v len="${NUMBER_OF_DIGITS}" '
FNR==NR { a[FNR]=$0; next }
substr(a[FNR],1,len) == substr($0,1,len) { print a[FNR],$0 }
' <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}")
echo "########## not matching"
awk -v len="${NUMBER_OF_DIGITS}" '
FNR==NR { a[FNR]=$0; next }
substr(a[FNR],1,len) != substr($0,1,len) { print a[FNR],$0 }
' <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}")
This generates:
########## matching
fox_abdce.txt fox_zzzzzz.txt
rabbit_abdce.txt rabbit_zzzedd.txt
lemom_asnrndna.txt lemom_kokoijijim.txt
########## not matching
ignore_me not_me
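For comparison, the same index-by-index check can be done in pure bash with substring expansion, without awk. This is a sketch that relies on the same assumptions (equal array lengths, entries paired by index); compare_prefixes is a hypothetical function name:

```shell
#!/bin/bash
# Sketch: compare A[i] and B[i] by their first NUMBER_OF_DIGITS characters in pure bash.
# Assumes both arrays have the same length; compare_prefixes is a hypothetical name.
NUMBER_OF_DIGITS=3
A=("fox_abdce.txt" "rabbit_abdce.txt" "ignore_me" "lemom_asnrndna.txt")
B=("fox_zzzzzz.txt" "rabbit_zzzedd.txt" "not_me" "lemom_kokoijijim.txt")

compare_prefixes() {
    local i len=$NUMBER_OF_DIGITS
    echo "########## matching"
    for i in "${!A[@]}"; do
        # ${A[i]:0:len} is substring expansion: the first len characters
        if [[ "${A[i]:0:len}" == "${B[i]:0:len}" ]]; then
            echo "${A[i]} ${B[i]}"
        fi
    done
    echo "########## not matching"
    for i in "${!A[@]}"; do
        if [[ "${A[i]:0:len}" != "${B[i]:0:len}" ]]; then
            echo "${A[i]} ${B[i]}"
        fi
    done
}

compare_prefixes
```

For the sample arrays this prints the same two sections as the awk version. The right-hand side of == is quoted so it is compared literally rather than as a glob pattern.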
Assumptions:
- file names do not include embedded commas (otherwise we will need to choose a different delimiter for the paste command)
A different approach uses paste to join the output of the two process substitutions:
$ paste -d, <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}")
fox_abdce.txt,fox_zzzzzz.txt
rabbit_abdce.txt,rabbit_zzzedd.txt
ignore_me,not_me
lemom_asnrndna.txt,lemom_kokoijijim.txt
Feeding the paste output to awk:
echo "########## matching"
awk -F, -v len="${NUMBER_OF_DIGITS}" '
substr($1,1,len) == substr($2,1,len)
' <(paste -d, <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}"))
echo "########## not matching"
awk -F, -v len="${NUMBER_OF_DIGITS}" '
substr($1,1,len) != substr($2,1,len)
' <(paste -d, <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}"))
This generates:
########## matching
fox_abdce.txt,fox_zzzzzz.txt
rabbit_abdce.txt,rabbit_zzzedd.txt
lemom_asnrndna.txt,lemom_kokoijijim.txt
########## not matching
ignore_me,not_me
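The question also asked for a flat list (all matching A entries, then all matching B entries). One way to get that from the same paste pipeline is to buffer the matching pairs in awk and print them in the END block. A sketch; flat_matches is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: flat-list output (matching A entries first, then matching B entries).
# flat_matches is a hypothetical helper; assumes no commas in file names.
NUMBER_OF_DIGITS=3
A=("fox_abdce.txt" "rabbit_abdce.txt" "ignore_me" "lemom_asnrndna.txt")
B=("fox_zzzzzz.txt" "rabbit_zzzedd.txt" "not_me" "lemom_kokoijijim.txt")

flat_matches() {
    awk -F, -v len="${NUMBER_OF_DIGITS}" '
    # remember each matching pair, in input order
    substr($1,1,len) == substr($2,1,len) { left[++n] = $1; right[n] = $2 }
    # print all left-hand matches first, then all right-hand matches
    END { for (i = 1; i <= n; i++) print left[i]
          for (i = 1; i <= n; i++) print right[i] }
    ' <(paste -d, <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}"))
}

flat_matches
```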
Answered By - markp-fuso Answer Checked By - Terry (WPSolving Volunteer)