Issue
I am trying to use TextWrangler to take a bunch of text files, match everything within some angle-bracket tags (so far so good), and for every match, substitute all occurrences of a specific character with another.
For instance, I'd like to take something like
xx+xx <f>bar+bar+fo+bar+fe</f> yy+y <f>fee+bar</f> zz
match everything within <f> and </f>, and then substitute all +'s with, say, *'s (but ONLY inside the "f" tag) to get
xx+xx <f>bar*bar*fo*bar*fe</f> yy+y <f>fee*bar</f> zz
I think I can easily match "f" tags containing +'s with an expression like
<f>[^<]*\+[^<]*</f>
but I have no idea how to substitute only one particular character within each match. I don't know a priori how many +'s there are in each tag. I think I need to run a second regular expression on every match of the first one, but I am not really sure how to do that.
(In other words, I would like to match all +'s, but only inside specific angle-bracket tags.)
Does anyone have a hint?
Thanks a lot, Daniele
Solution
In case you're OK with an awk solution:
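$ awk '{
    # repeatedly find an <f>...</f> that still contains a +
    while ( match($0,/<f>[^<]*\+[^<]*<\/f>/) ) {
        tgt = substr($0,RSTART,RLENGTH)                            # the matched tag
        gsub(/\+/,"*",tgt)                                          # change its +s to *s
        $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)    # splice it back in
    }
    print
}' file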
xx+xx <f>bar*bar*fo*bar*fe</f> yy+y <f>fee*bar</f> zz
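The above will work using any awk in any shell on any UNIX box. It relies on there being no < within each <f>...</f>, as indicated by your sample code. If there can be, the script can be tweaked to handle it by temporarily replacing each </f> with RS (a newline by default, which cannot occur inside a record) so that each match stops at that character instead of at the next <: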
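$ awk '{
    gsub("</f>",RS)    # turn each </f> into a newline, which cannot otherwise occur in a record
    # repeatedly find an <f> whose text up to that newline still contains a +
    while ( match($0,/<f>[^\n]*\+[^\n]*\n/) ) {
        tgt = substr($0,RSTART,RLENGTH)
        gsub(/\+/,"*",tgt)
        $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
    }
    gsub(RS,"</f>")    # turn the newlines back into </f>
    print
}' file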
xx+xx <f>bar*bar*fo*bar*fe</f> yy+y <f>fee*bar</f> zz
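For the more general case in the question (some tag, some character replaced by another), the newline trick from the second script can be wrapped in a reusable shell function. This is only a sketch: the name swap_in_tag and its TAG FROM TO [FILE...] arguments are hypothetical, not part of the answer above. It assumes TAG contains no regex metacharacters and FROM is non-empty, and since FROM and TO are passed with -v, awk interprets any backslash escapes in them. FROM is replaced literally with index() rather than gsub(), so characters like + or * need no escaping, and only the text between <TAG> and </TAG> is changed, never the tags themselves.

```shell
# swap_in_tag TAG FROM TO [FILE...]   (hypothetical helper, not from the answer)
# Replaces every FROM string with TO, but only between <TAG> and </TAG>.
# Assumes TAG has no regex metacharacters and FROM is non-empty.
swap_in_tag() {
    tag=$1 from=$2 to=$3
    shift 3
    [ -n "$from" ] || return 1    # refuse an empty FROM string
    awk -v tag="$tag" -v from="$from" -v to="$to" '{
        open = "<" tag ">"
        close_tag = "</" tag ">"
        gsub(close_tag, RS)               # as in the answer: each </TAG> becomes a newline
        re = "<" tag ">[^\n]*\n"          # one tag, from <TAG> up to that newline
        out = ""
        rest = $0
        while (match(rest, re)) {
            # copy the text before the tag and the opening tag unchanged
            out = out substr(rest, 1, RSTART - 1) open
            # the tag body, including the trailing newline that stands for </TAG>
            body = substr(rest, RSTART + length(open), RLENGTH - length(open))
            # literal replacement with index(), so FROM needs no regex escaping
            while ((i = index(body, from)) > 0) {
                out = out substr(body, 1, i - 1) to
                body = substr(body, i + length(from))
            }
            out = out body
            rest = substr(rest, RSTART + RLENGTH)
        }
        out = out rest
        gsub(RS, close_tag, out)          # turn the newlines back into </TAG>
        print out
    }' "$@"
}
```

For example, swap_in_tag f + '*' file gives the same output as the scripts above for the sample line, and it also turns <f>a<b+c</f> into <f>a<b*c</f>. It writes to standard output, so for a bunch of files a loop such as for f in *.txt; do swap_in_tag f + '*' "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done writes each result back to its own file.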
Answered By - Ed Morton