Issue
I have text like the following, where I want to hide only the values of certain fields: l1 and x2. Here is an example:
{
"info":
{
"l1": 77,
"x2": 77,
},
"user": "2323",
"id": "xxxx",
"time": 1679955931845,
"msgType": "oyui"
}
I have come up with a regex that works fine in a regex tester: (?<=(l1|x2)":)(.*?)(?=,)
But now I want to use it on Linux with sed, which seems to be far too complex. In the end I made it work with two sed statements, but it still bothers me that I don't know how to do it with a single regex in sed.
UPDATE
There are good answers here for anybody who stumbles upon this issue. However, in my case I specifically need a sed statement, because that is the input required for configuration in other services (in my case Splunk and its Field Filtering option with sed: https://docs.splunk.com/Documentation/Splunk/9.0.4/Security/setfieldfiltering).
Solution
sed does not support the dialect you are trying to use. But Perl does.
perl -ne 'if (m/(?<=(l1|x2)":)(.*?)(?=,)/) { print "$1: $2\n" }'
Splunk basically borrows its regex engine from Perl (or PCRE?) so it should be convenient and natural to go back and forth between Perl and Splunk (though I should think you would never want to go back if you manage to leave ...)
Perl has some superficial similarities with sed, so you can say things like
perl -pe 's%(?<=foo)bar(?=baz)%quux%g'
which should be reasonably transparent if you are familiar with sed. There's even a tool, s2p, which automatically translates sed scripts to Perl scripts.
Parenthetically, many Splunk patterns seem to use named groups; you can use the built-in hash %+ in Perl to access these. [1]
perl -ne 'if (m/(?<=(?P<thing>l1|x2)":)(?P<value>.*?)(?=,)/) { print "$+{thing}: $+{value}\n" }'
Perhaps see also "Why are there so many different regular expression dialects?"
If you genuinely need to use sed specifically, you need to refactor your regular expression to a BRE or at least an ERE - the latter is feasible if your sed has a (non-standard, but common) -r or -E option:
sed -nE 's/.*(l1|x2)":([^,]*),.*/"\1": "\2"/p'
This isn't exactly equivalent, obviously; the lookarounds have no real equivalent in traditional regex, so I just converted them to regular matches. Also, [^,]* isn't at all the same as .*? but in this case I'm guessing it's what you actually mean. Without seeing your actual data, it's hard to tell, but I can't imagine a scenario here where the non-greedy regex would do something different. (More generally, [^,]* cannot match a comma, whereas .*? before a comma could still match across a comma if that is what allows the overall regex to reach a match.)
Without more information about what exactly you are hoping the parenthesized groups should do, this can obviously only be just a hint for how to actually solve your problem.
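If the goal is to mask the values in place rather than print them, one possible single-expression ERE sketch is below. It assumes a sed with -E, assumes the value never contains a comma, and uses a made-up "XXXX" placeholder and function name; I have not checked it against Splunk's own sed-style field filtering syntax.

```shell
# Hypothetical helper: keep the key ("l1": or "x2":) via group 1
# and replace everything up to the next comma with a placeholder.
mask_with_sed() {
    sed -E 's/("(l1|x2)":)[^,]*/\1 "XXXX"/g'
}

# Only the l1 and x2 lines are changed; other fields pass through.
printf '%s\n' '"l1": 77,' '"x2": 77,' '"user": "2323",' | mask_with_sed
```

Because there is no lookbehind, the key has to be captured and written back with \1 instead of being left out of the match.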
The corresponding BRE regex would have backslashes before each (, |, or ). (Strictly speaking, \| for alternation is a GNU extension; POSIX BRE itself has no alternation operator.)
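Spelled out, the BRE form of the ERE command above might look like the following sketch. It assumes GNU sed, since it relies on \| for alternation; the function name is hypothetical.

```shell
# Hypothetical helper: same extraction as the -E version, written as a BRE.
# Assumes GNU sed, which accepts \| as alternation in basic regexes.
extract_bre() {
    sed -n 's/.*\(l1\|x2\)":\([^,]*\),.*/"\1": "\2"/p'
}

# The captured value keeps the space that follows the colon in the input.
printf '%s\n' '"l1": 77,' | extract_bre   # prints: "l1": " 77"
```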
[1] The hash is named %+ but an individual hash value is accessed like $+{"key"}. The mnemonic is that % is the sigil for the entire hash and $ is the sigil for a scalar such as an individual value out of the hash.
Many people are critical of Perl's "arcane" syntax but they clearly haven't seen Splunk's.
Answered By - tripleee Answer Checked By - Clifford M. (WPSolving Volunteer)