Wednesday, December 15, 2021

[SOLVED] Find pattern in file and print every xth occurrence to another file

Tags: awk, file, file-io, sed

Issue

I have a very large file that is similar to the snippet below. The snippet shows three blocks of data. They are from three distinct time steps (i).

6 # <--This is the same for all data blocks (i.e., always 6 rows of data)
i =        0, time =        0.000, k =      9000000000000
X        -7.6415350292        6.0494971539        8.1919697993
Y        -6.6418362233        5.9231018862        8.4056822626
Y        -8.0518670684        6.3158684817        9.0061271154
X        26.8252967820       20.4661074967       17.8025744066
Y        26.4477411207       20.4071029058       16.9121571912
Y        26.4399648474       21.2950722068       18.1009273227
6
i =        1, time =        0.500, k =      2500000000000
X        -6.2423192714       -1.5704681396       -9.5648670474
Y        -5.4925100813       -1.6522059045       -8.9030589772
Y        -6.7765278574       -2.3616512405       -9.4776648590
X         4.1248924594       27.8487302083      -17.5400886312
Y         4.1238657681       26.9869907778      -17.9727402579
Y         5.0750649402       28.1292768156      -17.6848507559
6
i =        2, time =        1.000, k =      3945000000000
X        19.0090162215       -5.9338939011        6.1931167954
Y        18.4748060757       -6.4905073540        5.6656446036
Y        19.2825591449       -6.4479943255        7.0179774953
X        11.0203415273       34.6029396705        2.7220660957
Y        11.1184002007       34.8398120338        1.8089008500
Y        10.3349649622       33.9509485292        2.5605794622

I would like to print every 100th data block to a new file.

The answer from @potong at the link below looks promising (as I understand it, the other answers depend on a blank line between the data blocks, and I don't have one). I have managed to use it to print every distinct block to its own file, but I end up with too many files. If anyone knows how to adapt potong's method so that it only acts on every xth block, I would be very grateful for a hint.

Find specific pattern and print complete text block using awk or sed

If I do this, I need to make a similar modification to a second (corresponding) file that looks like this:

0       0.000       13.6600000000        0.0000000000        0.0000000000        0.0000000000       13.6600000000        0.0000000000        0.0000000000        0.0000000000       13.6600000000          2548

This is the first row. The first two columns correspond to i = 0, time = 0.000 in the first data block above. I need to print this row and every xth row to a new file so that I have two files with data from the same time steps.

I can think of ways to put every xth row in a new file, but if there's a way to make sure the first two columns match the i and time values of the corresponding data block in the first file (e.g., i = 0, time = 0.000), that would be great to know (so that I don't end up with a mismatch if a row has failed to print or a time step has been repeated in the file).

I have added an "Awk" tag because it seems like this is something Awk might be able to do, but I am not experienced in Awk.

Solution

NOTE: only addressing OP's first requirement of printing every 100th block to a separate/new file ...

Assumptions:

each block consists of 8 lines (the standalone 6, the i = ... line, and 6 data lines)
each 8-line block of interest is written to its own output file
output file name format: block.<block_count>.dat (OP can change per requirement)

Sample data:

$ cat  block.dat
6 # block #1
i =        0, time =        0.000, k =      9000000000000
X        -7.6415350292        6.0494971539        8.1919697993
Y        -6.6418362233        5.9231018862        8.4056822626
Y        -8.0518670684        6.3158684817        9.0061271154
X        26.8252967820       20.4661074967       17.8025744066
Y        26.4477411207       20.4071029058       16.9121571912
Y        26.4399648474       21.2950722068       18.1009273227
6 # block #2
i =        1, time =        0.500, k =      2500000000000
X        -6.2423192714       -1.5704681396       -9.5648670474
Y        -5.4925100813       -1.6522059045       -8.9030589772
Y        -6.7765278574       -2.3616512405       -9.4776648590
X         4.1248924594       27.8487302083      -17.5400886312
Y         4.1238657681       26.9869907778      -17.9727402579
Y         5.0750649402       28.1292768156      -17.6848507559
6 # block #3
i =        2, time =        1.000, k =      3945000000000
X        19.0090162215       -5.9338939011        6.1931167954
Y        18.4748060757       -6.4905073540        5.6656446036
Y        19.2825591449       -6.4479943255        7.0179774953
X        11.0203415273       34.6029396705        2.7220660957
Y        11.1184002007       34.8398120338        1.8089008500
Y        10.3349649622       33.9509485292        2.5605794622
6 # block #4
i =        2, time =        1.000, k =      3945000000000
X        19.0090162215       -5.9338939011        6.1931167954
Y        18.4748060757       -6.4905073540        5.6656446036
Y        19.2825591449       -6.4479943255        7.0179774953
X        11.0203415273       34.6029396705        2.7220660957
Y        11.1184002007       34.8398120338        1.8089008500
Y        10.3349649622       33.9509485292        2.5605794622

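One awk idea to print every xth block:

x=2                         # set to 100 per OP's requirement

awk -v x="${x}" '
# a line whose first field is 6 starts a new block: close the previous output file, then name the next one
$1 == "6"    { close(out); count++; out = "block." count ".dat" }
# print only the lines that belong to every xth block
!(count % x) { print > out }
' block.dat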
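If one combined output file is preferred over one file per block, the same counting idea can be pointed at a single file name. The following is only a sketch under assumptions: `sample.dat` and `every_xth.dat` are hypothetical names, and the sample data is generated on the spot so the snippet can be run without the real file.

```shell
#!/bin/sh
# Hypothetical sample input: 5 blocks in the same layout as block.dat
# (a standalone 6, an "i = ..." header line, then 6 data rows).
awk 'BEGIN {
    for (b = 0; b < 5; b++) {
        print 6
        printf "i = %8d, time = %12.3f, k = %d\n", b, b * 0.5, 1000 + b
        for (r = 1; r <= 6; r++) printf "X %d.%d 0.0 0.0\n", b, r
    }
}' > sample.dat

x=2   # keep every 2nd block here; set to 100 for the real data

# Count blocks on the standalone 6 line and send every line of
# every xth block to one output file (every_xth.dat is an assumed name).
awk -v x="$x" '
$1 == "6"              { count++ }
count && !(count % x)  { print > "every_xth.dat" }
' sample.dat
```

With x=2 this keeps blocks 2 and 4 of the sample. To start from the first block instead (i = 0, then every xth block after it), the condition can be changed to `(count - 1) % x == 0`.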

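With x=2, the awk command that writes one file per block produces block.2.dat and block.4.dat, which can be displayed with:

for f in block.*.dat
do
    echo "########### $f"
    cat "$f"
done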

########### block.2.dat
6 # block #2
i =        1, time =        0.500, k =      2500000000000
X        -6.2423192714       -1.5704681396       -9.5648670474
Y        -5.4925100813       -1.6522059045       -8.9030589772
Y        -6.7765278574       -2.3616512405       -9.4776648590
X         4.1248924594       27.8487302083      -17.5400886312
Y         4.1238657681       26.9869907778      -17.9727402579
Y         5.0750649402       28.1292768156      -17.6848507559
########### block.4.dat
6 # block #4
i =        2, time =        1.000, k =      3945000000000
X        19.0090162215       -5.9338939011        6.1931167954
Y        18.4748060757       -6.4905073540        5.6656446036
Y        19.2825591449       -6.4479943255        7.0179774953
X        11.0203415273       34.6029396705        2.7220660957
Y        11.1184002007       34.8398120338        1.8089008500
Y        10.3349649622       33.9509485292        2.5605794622
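The answer above leaves the question's second requirement open: keeping only those rows of the companion file whose first two columns match the i and time values of the selected blocks. One possible approach, shown purely as a sketch, is a two-file awk pass that first collects the (i, time) pairs from the selected blocks and then filters the companion file. The file names `selected.dat`, `companion.dat` and `companion_selected.dat`, the helper `key`, and the shortened sample rows are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real files. Blocks are trimmed to one
# data row because only the "i = ..." header lines are used here.
cat > selected.dat <<'EOF'
6
i =        1, time =        0.500, k =      2500000000000
X        -6.2423192714       -1.5704681396       -9.5648670474
6
i =        3, time =        1.500, k =      3945000000000
X        19.0090162215       -5.9338939011        6.1931167954
EOF

cat > companion.dat <<'EOF'
0       0.000       13.6600000000       2548
1       0.500       13.6600000000       2549
2       1.000       13.6600000000       2550
3       1.500       13.6600000000       2551
3       1.500       13.6600000000       2552
EOF

awk '
# Normalise a step number and a time into one comparable string.
function key(i, t) { return sprintf("%d|%.3f", i, t) }

# First file: remember the (i, time) pair of every selected block.
NR == FNR {
    if ($1 == "i") {
        i = $3; t = $6
        sub(/,$/, "", i); sub(/,$/, "", t)   # drop the trailing commas
        want[key(i, t)] = 1
    }
    next
}

# Second file: print a row only if its first two columns match a wanted
# pair, and only the first time that pair is seen (skips repeated steps).
{
    k = key($1, $2)
    if ((k in want) && !seen[k]++) print
}
' selected.dat companion.dat > companion_selected.dat
```

On this sample the output file holds the rows for steps 1 and 3, and the repeated row for step 3 is dropped. Formatting the time with `%.3f` assumes both files report time to three decimals, as in the question's snippets.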

Answered By - markp-fuso

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
