Issue
I have an XML file whose entries are grouped by one category. I'd like to generate output that prints this category on the same line as (some of) the information from each line belonging to that category.
My typical input, generated via grep -E "Kundschaft|continent" filename, looks like:
<landmarks continent="bagne" type="user">
<landmark type="3" x="595.21" y="-10981.47" title="~Ba-Nung Liangi (Kundschafter)"/>
<landmark type="3" x="943.54" y="-10365.21" title="Ba-Nung Liangi (Kundschafterin, Sapsammler)"/>
<landmarks continent="corrupted_moor" type="user"/>
<landmarks continent="fyros" type="user">
<landmark type="3" x="17106.73" y="-25706.46" title="Xymus Tindix (Kundschafter)"/>
<landmark type="3" x="17586.79" y="-25679.67" title="Apolus Abygrian (Kundschafter)"/>
<landmark type="3" x="17018.25" y="-25306.73" title="Ba'Reiliam Breggi (Kundschafter)"/>
From that I'd like to generate output that prints the continent followed by the title for each line that contains a title attribute, e.g.:
bagne: ~Ba-Nung Liangi (Kundschafter)
bagne: Ba-Nung Liangi (Kundschafterin, Sapsammler)
fyros: Xymus Tindix (Kundschafter)
fyros: Apolus Abygrian (Kundschafter)
fyros: Ba'Reiliam Breggi (Kundschafter)
I toyed around with awk a bit, but I can't manage to teach it to remember the continent and print it for each line that contains 'Kundschaft'.
I get something like a list with the 'continent' values as headings by using grep -E "Kundschaft|continent" filename | awk -F "=" '{print $2 $5}', yet that is not nicely readable. Questions and answers like here or here suggest to me that it should be possible to teach awk to treat this as multi-line input and repeat the 'continent' heading, but no dice for me so far. The man pages did not exactly help me along either.
Any pointers will be appreciated. If there is a non-awk solution (e.g. with sed) that shall be equally welcome.
Solution
Solution using Python and its standard library. Let the content of file.txt be
<landmarks continent="bagne" type="user">
<landmark type="3" x="595.21" y="-10981.47" title="~Ba-Nung Liangi (Kundschafter)"/>
<landmark type="3" x="943.54" y="-10365.21" title="Ba-Nung Liangi (Kundschafterin, Sapsammler)"/>
<landmarks continent="corrupted_moor" type="user"/>
<landmarks continent="fyros" type="user">
<landmark type="3" x="17106.73" y="-25706.46" title="Xymus Tindix (Kundschafter)"/>
<landmark type="3" x="17586.79" y="-25679.67" title="Apolus Abygrian (Kundschafter)"/>
<landmark type="3" x="17018.25" y="-25306.73" title="Ba'Reiliam Breggi (Kundschafter)"/>
then create the file parse.py as follows
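import xml.parsers.expat

# Remembers the continent of the most recently opened <landmarks> element.
common = {"continent": ""}

def start_element(name, attrs):
    # Called by expat for every opening tag, with its attributes as a dict.
    if name == "landmarks":
        common["continent"] = attrs.get("continent", "")
    if name == "landmark":
        print(common["continent"] + ": " + attrs.get("title", ""))

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
with open("file.txt", "rb") as f:
    try:
        p.ParseFile(f)
    except xml.parsers.expat.ExpatError:
        # The grep output is not well-formed XML (unclosed <landmarks> tags),
        # so the parser raises at the end; the handlers have already run.
        pass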
then running python parse.py gives the output
bagne: ~Ba-Nung Liangi (Kundschafter)
bagne: Ba-Nung Liangi (Kundschafterin, Sapsammler)
fyros: Xymus Tindix (Kundschafter)
fyros: Apolus Abygrian (Kundschafter)
fyros: Ba'Reiliam Breggi (Kundschafter)
(tested in Python 3.8.10)
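If you would rather skip the grep step and feed the XML in directly, the same idea can be wrapped in a function that filters the titles itself. This is only a sketch under a few assumptions: the function name continent_titles and its keyword parameter are made up for illustration, the second landmark in the sample ("Some other place") is invented to show the filtering, and the input is assumed to fit in memory as one string.

```python
import xml.parsers.expat


def continent_titles(xml_text, keyword="Kundschaft"):
    """Return 'continent: title' lines for landmark titles containing keyword."""
    results = []
    state = {"continent": ""}  # continent of the most recent <landmarks> tag

    def start_element(name, attrs):
        if name == "landmarks":
            state["continent"] = attrs.get("continent", "")
        elif name == "landmark":
            title = attrs.get("title", "")
            if keyword in title:
                results.append(state["continent"] + ": " + title)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    try:
        parser.Parse(xml_text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        # Assumption: as with file.txt above, the input may be a fragment
        # (unclosed tags); keep whatever was collected before the error.
        pass
    return results


# Sample input; the "Some other place" entry is invented for illustration.
sample = """<landmarks continent="bagne" type="user">
<landmark type="3" x="595.21" y="-10981.47" title="~Ba-Nung Liangi (Kundschafter)"/>
<landmark type="3" x="1.0" y="2.0" title="Some other place"/>
<landmarks continent="corrupted_moor" type="user"/>
<landmarks continent="fyros" type="user">
<landmark type="3" x="17106.73" y="-25706.46" title="Xymus Tindix (Kundschafter)"/>
"""

for line in continent_titles(sample):
    print(line)
```

Returning a list instead of printing inside the handler makes it easier to sort or post-process the lines afterwards. Note that silently ignoring ExpatError also hides genuinely broken input, so for a complete, well-formed file you may prefer to let the exception propagate.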
Answered By - Daweo Answer Checked By - Willingham (WPSolving Volunteer)