Issue
The following is a Python implementation that extracts specific fields from the input file shown below.
For each line starting with MEMRANGE, I would like to extract the values corresponding to the keys BASEVALUE, INSTANCE and SLAVEBUSINTERFACE and print the result in the following format: value[INSTANCE].value[SLAVEBUSINTERFACE]\tab=value[BASEVALUE]
Similarly, for each line starting with PORT, I would like to extract the values corresponding to the keys CLKFREQUENCY and SIGNAME and print the result as value[SIGNAME]\tab=value[CLKFREQUENCY]
input file
<MEMRANGE ADDRESSBLOCK="HP0_DDR_LOW" BASENAME="C_BASEADDR" BASEVALUE="0x00000000" HIGHNAME="C_HIGHADDR" HIGHVALUE="0x3FFFFFFF" INSTANCE="trimap_0" IS_DATA="TRUE" IS_INSTRUCTION="TRUE" MASTERBUSINTERFACE="M_AXI" MEMTYPE="MEMORY" SLAVEBUSINTERFACE="S_AXI_HP0_hmo"/>
<PORT CLKFREQUENCY="90005550" DIR="I" NAME="ACLK" SIGIS="clk" SIGNAME="SYS_clk_wiz_0_clk_out1">
<MEMRANGE ADDRESSBLOCK="HP0_DDR_LOW" BASENAME="C_BASEADDR" BASEVALUE="0x00000000" HIGHNAME="C_HIGHADDR" HIGHVALUE="0x3FFFFFFF" INSTANCE="trimap_0" IS_DATA="TRUE" IS_INSTRUCTION="TRUE" MASTERBUSINTERFACE="M_AXI_DRAM0" MEMTYPE="MEMORY" SLAVEBUSINTERFACE="S_AXI_HP0_hmo"/>
<PORT CLKFREQUENCY="90005550" DIR="I" NAME="ACLK" SIGIS="clk" SIGNAME="SYS_clk_wiz_0_clk_out1">
I have tested the individual functions and they work fine. When I try to write the output of the functions to the output file, only the first function is executed and the second function doesn't execute.
import re

def populate_ip_address(input_file, output_file):
    print(" > writing ip address")
    for line in input_file:
        match_address = re.match(r'.*MEMRANGE .*BASEVALUE="(\w+)\".* INSTANCE="(\w+)\".* SLAVEBUSINTERFACE="(\w+)\"',line)
        if match_address:
            newline1= "\n%s.%s\t\t\t\t\t= %s" % (match_address.group(2), match_address.group(3), match_address.group(1))
            output_file.write(newline1)

def populate_clock_frequency(input_file, output_file):
    print(" > writing clock_frequency")
    for line in input_file:
        match_clock = re.match(r'.*PORT .*CLKFREQUENCY="(\w+)\".* SIGNAME="(\w+)\"',line)
        if match_clock:
            print(" >> writing clock_frequency")
            newline2= "\n%s \t= %s" % (match_clock.group(2), match_clock.group(1))
            output_file.write(newline2)

input_file = open("test.txt", "r")
with open('new.txt','w') as output_file:
    populate_clock_frequency(input_file, output_file)
    populate_ip_address(input_file, output_file)
expected
SYS_clk_wiz_0_clk_out1 = 90005550
SYS_clk_wiz_0_clk_out1 = 90005550
trimap_0.S_AXI_HP0_hmo = 0x00000000
trimap_0.S_AXI_HP0_hmo = 0x00000000
current output
SYS_clk_wiz_0_clk_out1 = 90005550
SYS_clk_wiz_0_clk_out1 = 90005550
Am I doing something wrong in the file access? I am also open to answers doing the above parsing in sed, awk or grep.
Solution
Assumptions:
- lines of interest (1st field is <MEMRANGE or <PORT) reside on a single line (ie, they do not span multiple lines)
General approach:
- split (space delimited) fields on equal sign (=)
- if 1st sub-field is one of our 5 desired attributes then ...
- split the 2nd sub-field on double quotes (") and ...
- store the attribute/value pair in an array
- print the contents of the array
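For readers who would rather stay in Python, the same split-based approach can be sketched roughly as follows. This is only an illustration of the steps listed above, not the code from the question; the names WANTED and format_line are made up for this example, and it relies on the same single-line assumption.

```python
# Attributes we care about (same five as in the approach above).
WANTED = {"INSTANCE", "SLAVEBUSINTERFACE", "BASEVALUE", "SIGNAME", "CLKFREQUENCY"}

def format_line(line):
    """Return the formatted text for a MEMRANGE/PORT line, or None for any other line."""
    fields = line.split()                       # space-delimited fields
    if not fields or fields[0] not in ("<MEMRANGE", "<PORT"):
        return None
    val = {}
    for field in fields[1:]:
        name, sep, rest = field.partition("=")  # split attrib="value" on the equal sign
        if sep and name in WANTED:
            parts = rest.split('"')             # split "value" on double quotes
            if len(parts) >= 3:
                val[name] = parts[1]
    if fields[0] == "<MEMRANGE":
        return "%s.%s\t= %s" % (val.get("INSTANCE", ""),
                                val.get("SLAVEBUSINTERFACE", ""),
                                val.get("BASEVALUE", ""))
    return "%s\t= %s" % (val.get("SIGNAME", ""), val.get("CLKFREQUENCY", ""))
```

Calling format_line on each line of the input file and printing every non-None result should give the same lines as the awk script below, with a single tab before the equal sign.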
One awk idea:
awk '
BEGIN { n=split("INSTANCE|SLAVEBUSINTERFACE|BASEVALUE|SIGNAME|CLKFREQUENCY",a,"|")
        for (i=1;i<=n;i++)                      # build associative array where
            attrib_list[a[i]]                   # indices are the desired attributes
      }

/<MEMRANGE |<PORT / { delete val
                      for (i=2;i<=NF;i++) {     # loop through space-delimited fields
                          split($i,a,"=")       # split <attrib>="<value>" on equal sign
                          if (a[1] in attrib_list) {   # if this is one of the desired attributes then ...
                             split(a[2],b,"\"") # split "<value>" on double quote and ...
                             val[a[1]]=b[2]     # save: val[<attrib>]=<value>
                          }
                      }
                      if ($1=="<MEMRANGE")
                         printf "%s.%s\t= %s\n", val["INSTANCE"], val["SLAVEBUSINTERFACE"], val["BASEVALUE"]
                      else
                         printf "%s\t= %s\n", val["SIGNAME"], val["CLKFREQUENCY"]
                    }
' input_file
This generates:
trimap_0.S_AXI_HP0_hmo = 0x00000000
SYS_clk_wiz_0_clk_out1 = 90005550
trimap_0.S_AXI_HP0_hmo = 0x00000000
SYS_clk_wiz_0_clk_out1 = 90005550
If order is important OP can pipe the results to sort, eg:
$ awk '... awk script ...' input_file | sort
SYS_clk_wiz_0_clk_out1 = 90005550
SYS_clk_wiz_0_clk_out1 = 90005550
trimap_0.S_AXI_HP0_hmo = 0x00000000
trimap_0.S_AXI_HP0_hmo = 0x00000000
NOTES:
- output is based on the single \t delimiter as mentioned in the textual description
- OP's expected output appears to show 2 extra tabs (\t) for the trimap* lines while OP's current code appears to show 5x tabs (\t)
- OP can modify the printf format to generate the desired number of tabs (\t)
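On the Python side of the question: a file object is an iterator, so once the first function has looped over input_file to the end, the second function's loop has nothing left to read and writes nothing. That matches the current output shown in the question. A minimal sketch of one possible fix is to rewind the file with seek(0) between the two calls. The function names are reused from the question; the convert wrapper, the precompiled patterns and the single-tab output format are assumptions made for this example.

```python
import re

# Same patterns as in the question, compiled once.
MEMRANGE_RE = re.compile(
    r'.*MEMRANGE .*BASEVALUE="(\w+)".* INSTANCE="(\w+)".* SLAVEBUSINTERFACE="(\w+)"')
PORT_RE = re.compile(r'.*PORT .*CLKFREQUENCY="(\w+)".* SIGNAME="(\w+)"')

def populate_clock_frequency(input_file, output_file):
    for line in input_file:                 # consumes the file up to EOF
        m = PORT_RE.match(line)
        if m:
            output_file.write("%s\t= %s\n" % (m.group(2), m.group(1)))

def populate_ip_address(input_file, output_file):
    for line in input_file:
        m = MEMRANGE_RE.match(line)
        if m:
            output_file.write("%s.%s\t= %s\n" % (m.group(2), m.group(3), m.group(1)))

def convert(input_file, output_file):
    """Hypothetical wrapper: run both passes over the same open file."""
    populate_clock_frequency(input_file, output_file)
    input_file.seek(0)                      # rewind, otherwise the next loop sees EOF immediately
    populate_ip_address(input_file, output_file)
```

With the question's file names this could be called as: with open("test.txt") as input_file, open("new.txt", "w") as output_file: convert(input_file, output_file). Reading the lines into a list once and passing that list to both functions would work as well and avoids the second read of the file.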
Answered By - markp-fuso Answer Checked By - Mary Flores (WPSolving Volunteer)