Issue
I am trying to use awk
to filter out data from a file. As of now, for a single value I am using the command like the below
hadoop fs -text file:///a/b/filename.snappy awk -F'|' '$11 == 655' > filter_20180705.txt
I want to pass a list of values in comparison section, instead of passing $11 == 655
, I would like to pass a list like $11
IN (list). Any thoughts on this will be helpful
Sample-data:
karthick,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,655,dev
kumar,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,611,dev
Raj,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,800,dev
John,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,823,dev
As of now I am getting the result as
karthick,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,655,dev
I will need pass a list in the comparison filter. For example, I want to add 611
and 823
. So the expected result would be
karthick,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,655,dev
kumar,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,611,dev
John,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,823,dev
Solution
Define a variable in the context of Awk
(using -v
) to include alternates for your regex match (separated by |
) and do your match as below. The ~
is your regex match operator in Awk
that allows you match $11
to the string defined (see How to Use Regular Expressions).
awk -F, -v list="655|611|823" '$11 ~ list' file
With the above approach your command translates to match $11
to any of the entries defined in the variable list
. To negate your regex match (i.e. opposite of what I want to match), use the negation !
operator as
awk -F, -v list="655|611|823" '$11 !~ list' file
To avoid false matches like 1182345
to be part of the match, make the regex more strict by including the start and end patterns
awk -F, -v list="655|611|823" 'BEGIN{list="^(" list ")$"} $11 ~ list' file
Answered By - Inian