Issue
I need to validate and clean a field in CSV. There is column for IP address and I need to remove only invalid data inside that column.
I tried the following command :
awk 'BEGIN{ FS=OFS="," }{ gsub(/^([0-9]{1,3}[\.]){3}[0-9]{1,3}$/,"", $3) }1' input.csv
Input file
anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,colarado,usa
tommy,new york,10.2.8.3 male,usa
Current output
anna,new york,,usa
james,denver,,usa
peter,denver,colarado,usa
tommy,new york,10.2.8.3 male,usa
Expected output
anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,,usa
tommy,new york,10.2.8.3,usa
This command remove the matching data, but I need the opposite. How do I remove only the non-matching data in the IP column ?
Solution
How do I remove only the non-matching data in the IP column ?
You might combine following string functions: match
substr
for this task following way
anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,colarado,usa
tommy,new york,10.2.8.3 male,usa
then
awk 'BEGIN{FS=OFS=","}{$3=match($3,/([0-9]{1,3}[\.]){3}[0-9]{1,3}/)?substr($3,RSTART,RLENGTH):"";print}' file.txt
gives output
anna,new york,192.168.1.5,usa
james,denver,240.210.1.8,usa
peter,denver,,usa
tommy,new york,10.2.8.3,usa
Explanation: I inform GNU AWK
that ,
is both field separator (FS
) and output field separator (OFS
), then for each line I use so called ternary operator condition?
valueiftrue:
valueiffalse, condition is if $3
does match regular expression, observe that I altered it slightly, so it does hold if IP is somewhere inside, rather than span whole column. If match found I use substr
to get substring which does correspond to match using RSTART
, RLENGTH
which were set by match, otherwise I use empty string. After that I print
whole line.
(tested in gawk 4.2.1)
Answered By - Daweo Answer Checked By - Dawn Plyler (WPSolving Volunteer)