Issue

I have a data feed that I download on a regular bases into a csv. It looks like this

TABLE # 196712 / 9000_
>= 10   : 0.002
>= 5    : 0.001
>= 2    : 0.0005
>= 1    : 0.0002
>= 0.5  : 0.0001
>= 0.2  : 0.0001
>= 0.1  : 0.0001
>= 0.0001   : 0.0001
TABLE # 196714 / Dark
>= 0.0001   : 5e-05
TABLE # 196715 / GBD
>= 25   : 0.01
>= 10   : 0.005
>= 5    : 0.0025
>= 0.1  : 0.001
>= 0.0005   : 0.005

I would like to parse the file and categorize the data into a dictionary, where the number after the hash is a unique id (the new dict key) and the following rows (starting with >=) are volumes plus associated penalty values.

s.th like this would work:

{196712: [(10,0.002),(5,0.001),(2,0.0005),(1,0.0002),(0.5,0.0001),(0.2,0.0001),(0.1,0.0001),(0.0001, 0.0001)], 
 196714: [(0.0001,5e-05)], 
 196715: [(25,0.01),(10,0.005),(5,0.0025),(0.1,0.001),(0.0005,0.005)]}

What I would do to filter it outside python would be a grep and get the following lines, however the varying number of lines between IDs makes it more complex. Any other suggested more convenient data structure could be used as well.

Solution

Try:

s = """\
TABLE # 196712 / 9000_
>= 10   : 0.002
>= 5    : 0.001
>= 2    : 0.0005
>= 1    : 0.0002
>= 0.5  : 0.0001
>= 0.2  : 0.0001
>= 0.1  : 0.0001
>= 0.0001   : 0.0001
TABLE # 196714 / Dark
>= 0.0001   : 5e-05
TABLE # 196715 / GBD
>= 25   : 0.01
>= 10   : 0.005
>= 5    : 0.0025
>= 0.1  : 0.001
>= 0.0005   : 0.005"""

import re

out = {}
for table, data in re.findall(
    r"^TABLE # (\d+).*?\n(.*?)(?=^TABLE|\Z)", s, flags=re.M | re.S
):
    table = int(table)
    for a, b in re.findall(r"([\de.+-]+)\s*:\s*([\de.+-]+)", data):
        out.setdefault(table, []).append((float(a), float(b)))

print(out)

Prints:

{
    196712: [
        (10.0, 0.002),
        (5.0, 0.001),
        (2.0, 0.0005),
        (1.0, 0.0002),
        (0.5, 0.0001),
        (0.2, 0.0001),
        (0.1, 0.0001),
        (0.0001, 0.0001),
    ],
    196714: [(0.0001, 5e-05)],
    196715: [
        (25.0, 0.01),
        (10.0, 0.005),
        (5.0, 0.0025),
        (0.1, 0.001),
        (0.0005, 0.005),
    ],
}

Answered By - Andrej Kesely

Answer Checked By - Candace Johnson (WPSolving Volunteer)

Saturday, October 29, 2022

[SOLVED] Match pattern and put following lines into data structure

Issue

Solution