Issue
I have several strings (or filenames in a directory) and I need to group them by the second most common pattern; I will then iterate over each group and process its members. In the example below I need 2 groups from ACCEPT and 2 from BASIC_REGIS: basically everything from the beginning of the string to one character after the first hyphen (-), where that character can be anything, not just a digit. The first most common patterns are ACCEPT and BASIC_REGIS. I am looking for the second most common pattern using grep -Po (Perl regex and only-matching). An AWK solution is already working.
INPUT
ACCEPT-zABC-0123
ACCEPT-zBAC-0231
ACCEPT-1ABC-0120
ACCEPT-1CBA-0321
BASIC_REGIS-2ABC-9043
BASIC_REGIS-2CBA-8132
BASIC_REGIS-PCCA-6532
BASIC_REGIS-PBBC-3023
OUTPUT
ACCEPT-z
ACCEPT-1
BASIC_REGIS-2
BASIC_REGIS-P
echo "ACCEPT-0ABC-0123"|grep -Po "\K^A.*-"
Result: ACCEPT-0ABC-
But I need: ACCEPT-0
However, the awk solution works:
echo "ACCEPT-1ABC-0120"|awk '$0 ~ /^A/{print substr($0,1,index($0,"-")+1)}'
ACCEPT-1
Solution
You don't need -P (PCRE) for that, just a plain old BRE:
$ grep -o '^[^-]*-.' file | sort -u
ACCEPT-1
ACCEPT-z
BASIC_REGIS-2
BASIC_REGIS-P
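To see why the original attempt overshoots, it may help to compare the two patterns side by side. The sketch below is illustrative only: the function names greedy and bounded are made up here, and it assumes a grep that supports -o (GNU grep does).

```shell
# Hypothetical helpers, named only for this illustration.

# Greedy: ".*" runs as far as it can, so the match ends at the LAST hyphen.
greedy() { printf '%s\n' "$1" | grep -o '^A.*-'; }

# Negated class: "[^-]*" cannot cross a hyphen, so the match stops at the
# FIRST hyphen, and the trailing "." takes exactly one more character.
bounded() { printf '%s\n' "$1" | grep -o '^[^-]*-.'; }

# If -P is still wanted, a lazy quantifier should give the same result
# with GNU grep:  grep -Po '^.*?-.'
```

Calling greedy ACCEPT-0ABC-0123 reproduces the unwanted ACCEPT-0ABC-, while bounded ACCEPT-0ABC-0123 gives ACCEPT-0.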
Or using GNU awk alone:
$ awk 'match($0,/^[^-]*-./,a) && !seen[a[0]]++{print a[0]}' file
ACCEPT-z
ACCEPT-1
BASIC_REGIS-2
BASIC_REGIS-P
Or with any awk:
$ awk '!match($0,/^[^-]*-./){next} {$0=substr($0,1,RLENGTH)} !seen[$0]++' file
ACCEPT-z
ACCEPT-1
BASIC_REGIS-2
BASIC_REGIS-P
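The question also mentions iterating over each group afterwards. One possible way to do that, reusing the grep command above, is sketched here. The function name group_by_prefix and the "== prefix" header line are assumptions made for illustration, not part of the answer, and the sketch assumes the prefixes contain no backslashes (awk -v would interpret them as escapes).

```shell
# Hypothetical helper: print each group key, then the lines belonging to it.
# $1 is the file holding one string (or filename) per line.
group_by_prefix() {
    file=$1
    # Collect the unique keys; LC_ALL=C keeps the sort order predictable.
    grep -o '^[^-]*-.' "$file" | LC_ALL=C sort -u |
    while IFS= read -r prefix; do
        printf '== %s\n' "$prefix"
        # index() compares literally, so regex metacharacters in the
        # prefix are harmless; == 1 means "line starts with the prefix".
        awk -v p="$prefix" 'index($0, p) == 1' "$file"
    done
}
```

Running group_by_prefix file prints every key followed by its members; the printf and awk lines inside the loop are the place to substitute the real per-group processing.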
Answered By - Ed Morton
Answer Checked By - Katrina (WPSolving Volunteer)