Issue
I have an XML file on which I want to run a sed command to remove some strings.
Here is a portion of the file:
<?xml version="1.0" ?>
<DataPDU
    xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
    <DbtrAcct>
        <Id>
            <Othr>
                <Id>1234567890</Id>
            </Othr>
        </Id>
    </DbtrAcct>
    <CdtrAcct>
        <Id>
            <Othr>
                <Id>1000002233250</Id>
            </Othr>
        </Id>
    </CdtrAcct>
    <Dt>
        <Dt>2022-10-05</Dt>
    </Dt>
</DataPDU>
From this file I need to remove the <Id> and <Dt> tags, but only when they have the same tag nested inside them. When that happens, I need to remove one of the two (the outer one, as shown below), to get a file that looks like this:
<?xml version="1.0" ?>
<DataPDU
    xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
    <DbtrAcct>
        <Othr>
            <Id>1234567890</Id>
        </Othr>
    </DbtrAcct>
    <CdtrAcct>
        <Othr>
            <Id>1000002233250</Id>
        </Othr>
    </CdtrAcct>
    <Dt>2022-10-05</Dt>
</DataPDU>
To do this I was trying a command like the following (I'll focus just on <Id> for now):
sed -i "s/<DbtrAcct>[^<>]*<Id>/<Id>/g" file.xml
With this I was trying to match the string formed by <DbtrAcct> followed by <Id> and replace it with just <Id>, but I'm having trouble matching them because they're not on the same line (as far as I know, sed only reads one line at a time).
How can I achieve this? I don't know much about this kind of manipulation, but I think this approach might work for what I need.
(My second question is how to escape the "/" in the closing tags when I replace the closing tags in the file.)
I'm also open to other options such as awk, or even echo, if it's worth it.
I've also tried collapsing the whole file into a single line, removing the tags, and then re-formatting it as XML, but with no luck.
Solution
This might work for you (GNU sed):
sed -E '/^\s*<(Id|Dt)>/{:a;N;/^(\s*<)(\S+>).*\n\1\/\2/!ba;s/^\s*(<\S+>)[^\n]*\n(.*\1.*)\n.*/\2/}' file
If a line starts with <Id> or <Dt> (after optional leading whitespace), gather up the following lines until the matching end tag at the same indentation is reached.
If the collection contains another start tag of the same type, remove the first and last lines of the collection.
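To see the two steps above in action, here is a minimal sketch that runs the same command, split across lines and commented, on a small indented sample. The sample content and the temporary file are assumptions made for illustration, and GNU sed is required for \s, \S and \n in the regular expressions.

```shell
#!/usr/bin/env bash
# Sketch only: the sample file below is a shortened, hypothetical input.
# Requires GNU sed (-E together with \s, \S and \n).

sample=$(mktemp)   # temporary input file, name chosen by mktemp

cat > "$sample" <<'EOF'
<DataPDU>
  <DbtrAcct>
    <Id>
      <Othr>
        <Id>1234567890</Id>
      </Othr>
    </Id>
  </DbtrAcct>
  <Dt>
    <Dt>2022-10-05</Dt>
  </Dt>
</DataPDU>
EOF

result=$(sed -E '/^\s*<(Id|Dt)>/{
  # Step 1: append lines until the end tag with the same indentation shows up.
  :a
  N
  /^(\s*<)(\S+>).*\n\1\/\2/!ba
  # Step 2: if the gathered lines contain the same start tag again,
  # keep only the lines between the first and the last one.
  s/^\s*(<\S+>)[^\n]*\n(.*\1.*)\n.*/\2/
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The surviving inner lines keep their original indentation, so the output may need re-indenting afterwards. On the second part of the question: the command escapes the slash of the closing tag as \/ because / is the delimiter; in an s command another delimiter can be used instead, for example s|</Id>||, so that the slash needs no escaping.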
Answered By - potong Answer Checked By - David Marino (WPSolving Volunteer)