Saturday, January 27, 2024

[SOLVED] sed regex blindness

Issue

I'm seeing some odd behaviour with a sed regex and can't figure out what's going on. (This is "sed (GNU sed) 4.9" on OSX/MacPorts.) I have a METAR data log file and want to extract the cloud cover field:

2022/12/11 04:50 EGSH 110450Z AUTO 00000KT 0100 FZFG OVC001/// M04/M05 Q1009 
2022/12/11 05:20 EGSH 110520Z AUTO 00000KT 0100 FZFG OVC001/// M04/M05 Q1008 
2022/12/11 05:50 EGSH 110550Z 00000KT 0050 R27/0375N FZFG VV/// M05/M05 Q1008 NOSIG 
2022/12/11 06:20 EGSH 110620Z 00000KT 0050 R27/0375N FZFG VV/// M05/M05 Q1008 NOSIG

So I've got:

cat metar.log | sed -nE "s#(.*) EGSH .*(SKC|NCD|CLR|NSC|FEW|SCT|BKN|OVC|TCU|CB|VV)([^ ]*) .*#\1 \2\3#p"

which works as expected and gives what I want:

2022/12/11 04:50 OVC001///
2022/12/11 05:20 OVC001///
2022/12/11 05:50 VV///
2022/12/11 06:20 VV///

But if I combine the last two capture groups into one (note the removed parentheses near the end):

cat metar.log | sed -nE "s#(.*) EGSH .*(SKC|NCD|CLR|NSC|FEW|SCT|BKN|OVC|TCU|CB|VV[^ ]*) .*#\1 \2#p"

Now the result is just:

2022/12/11 05:50 VV///
2022/12/11 06:20 VV///

Why doesn't this work the same?


Solution

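By making it (...CB|VV[^ ]*) you changed your original regexp (...CB|VV)([^ ]*) so that it only allows extra non-blanks after VV, not after any of the other |-separated strings. Alternation has the lowest precedence of the regexp operators, so [^ ]* belongs to the VV alternative alone, and every other alternative (e.g. OVC) now has to be followed directly by the blank that comes next in the regexp, which OVC001/// isn't. That's why only the VV lines still match. Try wrapping the alternation in its own group instead, so that the outer group \2 captures the whole token: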

$ sed -nE 's#(.*) EGSH .*((SKC|NCD|CLR|NSC|FEW|SCT|BKN|OVC|TCU|CB|VV)[^ ]*) .*#\1 \2#p' metar.log
2022/12/11 04:50 OVC001///
2022/12/11 05:20 OVC001///
2022/12/11 05:50 VV///
2022/12/11 06:20 VV///
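To see the precedence issue in isolation, here is a minimal shell sketch that runs both groupings on a single cloud token. The alternation is trimmed to OVC|VV for brevity, and the token and the variable names broken, fixed and groups are just illustrative:

```shell
# A single cloud group followed by a blank, as it appears in metar.log.
token='OVC001/// M04/M05'

# Broken grouping: [^ ]* belongs only to the VV branch, so "OVC" must be
# followed directly by a blank. Nothing matches, so nothing is printed.
broken=$(printf '%s\n' "$token" | sed -nE 's#(OVC|VV[^ ]*) .*#[\1]#p')

# Fixed grouping: choose one prefix first, then let [^ ]* follow any of them.
fixed=$(printf '%s\n' "$token" | sed -nE 's#((OVC|VV)[^ ]*) .*#[\1]#p')

# Groups are numbered by their opening parenthesis: \1 is the outer group
# (the whole token), \2 is the inner alternation (just the prefix).
groups=$(printf '%s\n' "$token" | sed -nE 's#((OVC|VV)[^ ]*) .*#\1 \2#p')

echo "broken: '$broken'"   # broken: ''
echo "fixed:  '$fixed'"    # fixed:  '[OVC001///]'
echo "groups: '$groups'"   # groups: 'OVC001/// OVC'
```

The same numbering rule is why the full command uses \1 \2: there, (.*) is group 1, the new outer group is group 2, and the bare alternation becomes group 3.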

FWIW, from the example you posted, the cloud group is always the 9th blank-separated field, so it looks like all you really need is:

$ cut -d' ' -f1,2,9 metar.log
2022/12/11 04:50 OVC001///
2022/12/11 05:20 OVC001///
2022/12/11 05:50 VV///
2022/12/11 06:20 VV///

or:

$ awk '{print $1, $2, $9}' metar.log
2022/12/11 04:50 OVC001///
2022/12/11 05:20 OVC001///
2022/12/11 05:50 VV///
2022/12/11 06:20 VV///

or, if you really want to stick with sed for some reason:

$ sed -E 's/(.{16}) ([^ ]+ ){6}([^ ]+).*/\1 \3/' metar.log
2022/12/11 04:50 OVC001///
2022/12/11 05:20 OVC001///
2022/12/11 05:50 VV///
2022/12/11 06:20 VV///
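All three field-position commands rely on the cloud group being field 9. That holds for the sample lines, but only because the first two have AUTO and no runway visual range (R27/...) group while the last two have an RVR group and no AUTO; a report with neither, or with both, would shift it. If that matters, a sketch like the following scans each line for the first field that starts with one of the cloud codes from the question. cloud_cover is a hypothetical helper name, and the code list is taken as-is from the question rather than from a complete METAR specification:

```shell
# Hypothetical helper: print date, time and the first field (from field 3 on)
# that starts with one of the cloud codes used in the question.
cloud_cover() {
    awk '{
        for (i = 3; i <= NF; i++) {
            # ^ anchors the code list to the start of the field.
            if ($i ~ /^(SKC|NCD|CLR|NSC|FEW|SCT|BKN|OVC|TCU|CB|VV)/) {
                print $1, $2, $i
                break   # stop at the first cloud group on the line
            }
        }
    }' "$@"
}
```

Usage: cloud_cover metar.log (or pipe lines into it). It stops at the first matching field, so for a report with several cloud layers it prints only the first one listed, whereas the greedy .* in the sed commands picks the last one.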


Answered By - Ed Morton
Answer Checked By - Candace Johnson (WPSolving Volunteer)