Issue
I have the following file (a JUnit report) from which I need to remove the system-out and system-err nodes and their content, while preserving the rest of the node structure (elements and values).
My file has the following structure and content (please note that a system-* element can have multiline content and HTML-like tags):
<testsuite name="someTest" tests="1" skipped="0" failures="0" errors="0">
<properties/>
<testcase name="someMethod" classname="classA" time="0.096">
<system-out><![CDATA[foo <li></li> bar]]></system-out>
<system-err><![CDATA[[one] INFO two
three four
five]]></system-err>
</testcase>
<system-out><![CDATA[]]></system-out>
<system-err><![CDATA[]]></system-err>
</testsuite>
The desired result is:
<testsuite name="someTest" tests="1" skipped="0" failures="0" errors="0">
<properties/>
<testcase name="someMethod" classname="classA" time="0.096">
</testcase>
</testsuite>
I have tried multiple variants of sed patterns; the following is not nice, but it partially works. The current approach is to use tr to replace newlines with some exotic character, apply sed to the resulting single line of text, then use tr again to restore the original newlines (I combined several SO suggestions to get this, and I don't really know how to use sed's multiline N command):
tr "\n" "\f" < "$f" |
sed 's/\(<system-err>\)\(.*\)\(<\/system-err>\)/\1\3/' |
sed 's/\(<system-out>\)\(.*\)\(<\/system-out>\)/\1\3/' |
tr "\f" "\n" > "$(basename "$f")-out.xml"
The problem with this is that the sed pattern is greedy: for instance, it matches from the first <system-err> to the last </system-err>, so everything in between is removed, leaving unclosed elements.
I have also tried other things, such as the pattern sed -E 's/<system-out><![(.*)]><\/system-out>//g', to match anything in between the system-* tags, but it does not really work.
I am not a sed or regexp expert, so please be merciful :). My constraint is the need to use sed (inside a bash script).
Could someone please advise how to achieve the removal of the system-* elements?
Thank you in advance!
Solution
With xmlstarlet:
xmlstarlet edit --omit-decl --delete '//system-out' --delete '//system-err' file.xml
Output:
<testsuite name="someTest" tests="1" skipped="0" failures="0" errors="0">
<properties/>
<testcase name="someMethod" classname="classA" time="0.096"/>
</testsuite>
See: xmlstarlet edit --help
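If xmlstarlet is not available and sed really is a hard constraint, a line-based sketch along the following lines may be enough for reports shaped like the sample in the question. It is not an XML parser and rests on assumptions that are not guaranteed by the JUnit format: every system-out / system-err element starts on its own line, nothing else shares a line with its opening or closing tag, and the tags carry no attributes. The temporary file and the variable names f and cleaned are only illustrative. Single-line elements are deleted first, because a sed address range only looks for its end pattern on the lines after the one that opened it.

```shell
# Hypothetical sample file mirroring the report from the question.
f=$(mktemp)
cat > "$f" <<'EOF'
<testsuite name="someTest" tests="1" skipped="0" failures="0" errors="0">
<properties/>
<testcase name="someMethod" classname="classA" time="0.096">
<system-out><![CDATA[foo <li></li> bar]]></system-out>
<system-err><![CDATA[[one] INFO two
three four
five]]></system-err>
</testcase>
<system-out><![CDATA[]]></system-out>
<system-err><![CDATA[]]></system-err>
</testsuite>
EOF

# Step 1: delete elements that open and close on the same line.
# Step 2: delete multi-line elements with an address range
#         (from the opening tag line through the closing tag line).
cleaned=$(sed -e '/<system-out>.*<\/system-out>/d' \
              -e '/<system-err>.*<\/system-err>/d' \
              -e '/<system-out>/,/<\/system-out>/d' \
              -e '/<system-err>/,/<\/system-err>/d' "$f")

printf '%s\n' "$cleaned"
rm -f "$f"
```

Because the deletion works on whole lines, no greedy .* ever spans more than one element, which avoids the problem described in the question. If the reports can contain several elements on one line, a real XML tool such as xmlstarlet remains the safer choice.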
Answered By - Cyrus Answer Checked By - Senaida (WPSolving Volunteer)