Issue
I have a data feed that I download on a regular bases into a csv. It looks like this
TABLE # 196712 / 9000_
>= 10 : 0.002
>= 5 : 0.001
>= 2 : 0.0005
>= 1 : 0.0002
>= 0.5 : 0.0001
>= 0.2 : 0.0001
>= 0.1 : 0.0001
>= 0.0001 : 0.0001
TABLE # 196714 / Dark
>= 0.0001 : 5e-05
TABLE # 196715 / GBD
>= 25 : 0.01
>= 10 : 0.005
>= 5 : 0.0025
>= 0.1 : 0.001
>= 0.0005 : 0.005
I would like to parse the file and categorize the data into a dictionary, where the number after the hash is a unique id (the new dict key) and the following rows (starting with >=) are volumes plus associated penalty values.
s.th like this would work:
{196712: [(10,0.002),(5,0.001),(2,0.0005),(1,0.0002),(0.5,0.0001),(0.2,0.0001),(0.1,0.0001),(0.0001, 0.0001)],
196714: [(0.0001,5e-05)],
196715: [(25,0.01),(10,0.005),(5,0.0025),(0.1,0.001),(0.0005,0.005)]}
What I would do to filter it outside python would be a grep and get the following lines, however the varying number of lines between IDs makes it more complex. Any other suggested more convenient data structure could be used as well.
Solution
Try:
s = """\
TABLE # 196712 / 9000_
>= 10 : 0.002
>= 5 : 0.001
>= 2 : 0.0005
>= 1 : 0.0002
>= 0.5 : 0.0001
>= 0.2 : 0.0001
>= 0.1 : 0.0001
>= 0.0001 : 0.0001
TABLE # 196714 / Dark
>= 0.0001 : 5e-05
TABLE # 196715 / GBD
>= 25 : 0.01
>= 10 : 0.005
>= 5 : 0.0025
>= 0.1 : 0.001
>= 0.0005 : 0.005"""
import re
out = {}
for table, data in re.findall(
r"^TABLE # (\d+).*?\n(.*?)(?=^TABLE|\Z)", s, flags=re.M | re.S
):
table = int(table)
for a, b in re.findall(r"([\de.+-]+)\s*:\s*([\de.+-]+)", data):
out.setdefault(table, []).append((float(a), float(b)))
print(out)
Prints:
{
196712: [
(10.0, 0.002),
(5.0, 0.001),
(2.0, 0.0005),
(1.0, 0.0002),
(0.5, 0.0001),
(0.2, 0.0001),
(0.1, 0.0001),
(0.0001, 0.0001),
],
196714: [(0.0001, 5e-05)],
196715: [
(25.0, 0.01),
(10.0, 0.005),
(5.0, 0.0025),
(0.1, 0.001),
(0.0005, 0.005),
],
}
Answered By - Andrej Kesely Answer Checked By - Candace Johnson (WPSolving Volunteer)