Issue
I have a very large file that is similar to the snippet below. The snippet shows three blocks of data. They are from three distinct time steps (i).
6 # <--This is the same for all data blocks (i.e., always 6 rows of data)
i = 0, time = 0.000, k = 9000000000000
X -7.6415350292 6.0494971539 8.1919697993
Y -6.6418362233 5.9231018862 8.4056822626
Y -8.0518670684 6.3158684817 9.0061271154
X 26.8252967820 20.4661074967 17.8025744066
Y 26.4477411207 20.4071029058 16.9121571912
Y 26.4399648474 21.2950722068 18.1009273227
6
i = 1, time = 0.500, k = 2500000000000
X -6.2423192714 -1.5704681396 -9.5648670474
Y -5.4925100813 -1.6522059045 -8.9030589772
Y -6.7765278574 -2.3616512405 -9.4776648590
X 4.1248924594 27.8487302083 -17.5400886312
Y 4.1238657681 26.9869907778 -17.9727402579
Y 5.0750649402 28.1292768156 -17.6848507559
6
i = 2, time = 1.000, k = 3945000000000
X 19.0090162215 -5.9338939011 6.1931167954
Y 18.4748060757 -6.4905073540 5.6656446036
Y 19.2825591449 -6.4479943255 7.0179774953
X 11.0203415273 34.6029396705 2.7220660957
Y 11.1184002007 34.8398120338 1.8089008500
Y 10.3349649622 33.9509485292 2.5605794622
(1) I would like to print every 100th data block to a new file.
The answer from @potong at the link below looks promising (if I understand correctly, the other answers depend on a blank line between the data blocks, and I don't have one). I have managed to use it to print every distinct block to its own file, but I end up with too many files. If anyone knows how to adapt potong's method so that it only acts on every xth block, I would be very grateful for a hint.
Find specific pattern and print complete text block using awk or sed
If I do this, I need to make a similar modification to a second (corresponding) file that looks like this:
0 0.000 13.6600000000 0.0000000000 0.0000000000 0.0000000000 13.6600000000 0.0000000000 0.0000000000 0.0000000000 13.6600000000 2548
This is the first row. The first two columns correspond to i = 0, time = 0.000 in the first data block above. I need to print this row and every xth row to a new file so that I have two files with data from the same time steps.
I can think of ways to put every xth row in a new file, but if there is a way to make sure the first two columns match the i and time values of the corresponding block from (1) (e.g. i = 0, time = 0.000), that would be great to know, so that I don't end up with a mismatch if a row has failed to print or a time step has been repeated in the file.
I have added an "Awk" tag because it seems like this is something Awk might be able to do, but I am not experienced in Awk.
Solution
NOTE: only addressing OP's first requirement of printing every 100th block to a separate/new file ...
Assumptions:
- each block consists of 8 lines (the standalone 6 line, the i = ... line, and 6 data lines)
- each 8-line block of interest to be dumped to a separate output file
- output file name format: block.<block_count>.dat (OP can change per requirement)
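Given the first assumption (every block is exactly 8 lines), the block number could also be derived from the line number alone, so the selection would not depend on the standalone 6 line at all. A minimal sketch, assuming a fixed block length of 8; the input file blocks_demo.dat and the lenblock.<n>.dat output names are hypothetical, and the input is generated with placeholder values rather than OP's real data:

```shell
#!/bin/sh
# Sketch: compute the block number from NR, assuming every block is
# exactly 8 lines (count line + "i = ..." line + 6 data lines).
# blocks_demo.dat and lenblock.<n>.dat are hypothetical names.

rm -f lenblock.*.dat

# Build a synthetic input with 5 blocks (i = 0..4) of placeholder values.
: > blocks_demo.dat
for n in 0 1 2 3 4; do
    {
        echo 6
        echo "i = $n, time = $n.000, k = 0"
        for r in 1 2 3 4 5 6; do echo "X 0.0 0.0 0.0"; done
    } >> blocks_demo.dat
done

x=2   # would be 100 for the real data
awk -v x="$x" -v len=8 '
{ blk = int((NR - 1) / len) + 1 }          # 1-based block number of this line
!(blk % x) {
    out = "lenblock." blk ".dat"
    print > out
    if (NR % len == 0) close(out)          # last line of the block: release the file
}
' blocks_demo.dat
```

With x=2 this should write lenblock.2.dat and lenblock.4.dat. The trade-off: if any block in the real file is not exactly 8 lines, every later block number shifts, whereas matching the 6 line does not depend on block length.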
Sample data:
$ cat block.dat
6 # block #1
i = 0, time = 0.000, k = 9000000000000
X -7.6415350292 6.0494971539 8.1919697993
Y -6.6418362233 5.9231018862 8.4056822626
Y -8.0518670684 6.3158684817 9.0061271154
X 26.8252967820 20.4661074967 17.8025744066
Y 26.4477411207 20.4071029058 16.9121571912
Y 26.4399648474 21.2950722068 18.1009273227
6 # block #2
i = 1, time = 0.500, k = 2500000000000
X -6.2423192714 -1.5704681396 -9.5648670474
Y -5.4925100813 -1.6522059045 -8.9030589772
Y -6.7765278574 -2.3616512405 -9.4776648590
X 4.1248924594 27.8487302083 -17.5400886312
Y 4.1238657681 26.9869907778 -17.9727402579
Y 5.0750649402 28.1292768156 -17.6848507559
6 # block #3
i = 2, time = 1.000, k = 3945000000000
X 19.0090162215 -5.9338939011 6.1931167954
Y 18.4748060757 -6.4905073540 5.6656446036
Y 19.2825591449 -6.4479943255 7.0179774953
X 11.0203415273 34.6029396705 2.7220660957
Y 11.1184002007 34.8398120338 1.8089008500
Y 10.3349649622 33.9509485292 2.5605794622
6 # block #4
i = 2, time = 1.000, k = 3945000000000
X 19.0090162215 -5.9338939011 6.1931167954
Y 18.4748060757 -6.4905073540 5.6656446036
Y 19.2825591449 -6.4479943255 7.0179774953
X 11.0203415273 34.6029396705 2.7220660957
Y 11.1184002007 34.8398120338 1.8089008500
Y 10.3349649622 33.9509485292 2.5605794622
One awk idea to print out every xth block:
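x=2    # set to 100 per OP's requirement
awk -v x="${x}" '
$1 == "6" {                                   # a standalone "6" line starts a new block
    count++
    if (out != "") { close(out); out = "" }   # release the previous output file
}
!(count % x) { out = "block." count ".dat"; print > out }
' block.dat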
This generates:
for f in block.*.dat
do
    echo "########### $f"
    cat "$f"
done
########### block.2.dat
6 # block #2
i = 1, time = 0.500, k = 2500000000000
X -6.2423192714 -1.5704681396 -9.5648670474
Y -5.4925100813 -1.6522059045 -8.9030589772
Y -6.7765278574 -2.3616512405 -9.4776648590
X 4.1248924594 27.8487302083 -17.5400886312
Y 4.1238657681 26.9869907778 -17.9727402579
Y 5.0750649402 28.1292768156 -17.6848507559
########### block.4.dat
6 # block #4
i = 2, time = 1.000, k = 3945000000000
X 19.0090162215 -5.9338939011 6.1931167954
Y 18.4748060757 -6.4905073540 5.6656446036
Y 19.2825591449 -6.4479943255 7.0179774953
X 11.0203415273 34.6029396705 2.7220660957
Y 11.1184002007 34.8398120338 1.8089008500
Y 10.3349649622 33.9509485292 2.5605794622
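A possible variant (a sketch, not part of the original answer): OP's second requirement starts from the first row (i = 0, time = 0.000), which belongs to block #1, while the code above keeps blocks x, 2x, 3x, ... (blocks #2 and #4 in the sample). If blocks 1, 1+x, 1+2x, ... are wanted instead, all collected in one file rather than one file per block, the test can be shifted by one and the output redirected once. demo.dat and every_x.dat are hypothetical names, and the input uses placeholder values:

```shell
#!/bin/sh
# Sketch: keep blocks 1, 1+x, 1+2x, ... and write them all to ONE file.
# demo.dat and every_x.dat are hypothetical names; data values are placeholders.

# Build a synthetic input with 5 blocks (i = 0..4), 8 lines each.
: > demo.dat
for n in 0 1 2 3 4; do
    {
        echo 6
        echo "i = $n, time = $n.000, k = 0"
        for r in 1 2 3 4 5 6; do echo "Y 0.0 0.0 0.0"; done
    } >> demo.dat
done

x=2   # would be 100 for the real data
awk -v x="$x" '
$1 == "6" { count++ }              # a standalone "6" line starts a new block
(count - 1) % x == 0               # blocks 1, 1+x, 1+2x, ...: print every line
' demo.dat > every_x.dat

# Show which time steps were kept.
grep '^i = ' every_x.dat
```

With x=2 and 5 synthetic blocks, the kept headers are those for i = 0, 2 and 4, and every_x.dat holds 24 lines (3 blocks of 8).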
Answered By - markp-fuso
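The answer above covers only the first requirement. For the second one, one option is to match rows of the second file on their first two columns against the i and time values in the selected block headers, instead of taking every xth row by position, so a missing or repeated time step cannot shift the alignment. A sketch, assuming both files format time identically (e.g. 0.000) and that the selected blocks are already collected in one file; selected.dat, second.dat and second_selected.dat are hypothetical names, the blocks are cut to one data line, the second file is cut to four columns, and all values are placeholders:

```shell
#!/bin/sh
# Sketch: keep only rows of the second file whose first two columns (i, time)
# match a block header in the selected-blocks file.
# selected.dat, second.dat and second_selected.dat are hypothetical names.

# Selected blocks (only the header lines matter here).
cat > selected.dat <<'EOF'
6
i = 0, time = 0.000, k = 9000000000000
X 0.0 0.0 0.0
6
i = 2, time = 1.000, k = 3945000000000
X 0.0 0.0 0.0
6
i = 4, time = 2.000, k = 0
X 0.0 0.0 0.0
EOF

# Second file: i = 2 is repeated, i = 4 is missing.
cat > second.dat <<'EOF'
0 0.000 13.66 2548
1 0.500 13.66 2548
2 1.000 13.66 2548
2 1.000 13.66 2550
3 1.500 13.66 2548
EOF

awk '
NR == FNR {                                   # first file: the selected blocks
    if ($1 == "i") {                          # header line: i = N, time = T, k = K
        i = $3; t = $6
        sub(/,$/, "", i); sub(/,$/, "", t)    # strip the trailing commas
        want[i " " t] = 1
    }
    next
}
($1 " " $2) in want && !seen[$1 " " $2]++     # second file: first matching row per key
END {
    for (key in want)
        if (!(key in seen)) print "no row for i/time " key | "cat 1>&2"
}
' selected.dat second.dat > second_selected.dat
```

Here second_selected.dat should contain the rows for i = 0 and i = 2 (only the first of the two i = 2 rows is kept), and a warning goes to stderr for i = 4, which has no row in second.dat.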