Issue
I have around 6,000 smallish text files. Some have only 3 or 4 lines, and a few might have 100 or more. I thought I would merge them into one large file to make reading them easier. A Windows batch file did the merge, adding a "=======" line between each merged file, but the new file is about 50MB in size, with 900,000 lines. Too big. I would like to split it into fifty files of around 1MB each. The split programs I have looked at split either by exact size or by line count, but I don't want a particular text file split between two chunks. So in the example below, I don't want one chunk to end with "Brown Fox" and the next one to start with "Jumps". In other words, treat everything between two ======= separators as an unbreakable unit.
This is a Windows/DOS file, so there is no need to change the CRLF line endings. This file does not have any special coding for printing, coloring, etc.
Merged file example:
=======
One
Two
=======
Abc
=======
The quick
Brown Fox
Jumps
Over the dog
=======
Dfdfasdf
Eeffee
Eewweew
Lk klkl Y tyyd
=======
I typed this string on a Windows command line to create the 50MB file All.asc:
For %A in (D:\@temp\*.txt) Do @(CAT53 -s %A & Echo =======) >>D:\@temp\All.asc
When I ran this command (specifying 30 bytes):
split -b30 all.asc BB
The output for the second file (BBab) looked like this:
==
Abc
=======
The qu
I didn't think that checking the size of all.asc after each concatenation and aborting once it exceeded 1MB would be very efficient. I thought a solution that merges first and then splits would be simpler and could be reused.
I have the Unix utilities on my PC, but I am not sure whether sed, awk, or split would be useful. The GSplit utility doesn't seem to do what I need.
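For comparison, the "check the size while concatenating" idea mentioned above can be sketched in a few lines of Python. This is only a sketch under assumptions that are not in the question: Python 3 is available, the function name merge_with_cap and the part01.asc, part02.asc, ... output names are hypothetical, and the size is tracked in memory rather than by re-checking the file on disk after each append.

```python
from pathlib import Path

SEPARATOR = b"=======\r\n"  # separator line from the question, with a CRLF ending


def merge_with_cap(src_dir, out_dir, max_bytes=1_000_000, pattern="*.txt"):
    """Merge files into numbered parts without ever splitting a source file.

    Each source file plus its separator line is treated as one unit. A new
    part is started before any unit that would push the current part past
    max_bytes. A single unit larger than max_bytes still gets written whole.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []          # paths of the parts written so far
    buffer = bytearray()  # contents of the part currently being built

    def flush():
        # Write the current part to disk, if it holds anything.
        if buffer:
            target = out_dir / f"part{len(written) + 1:02d}.asc"
            target.write_bytes(bytes(buffer))
            written.append(target)
            buffer.clear()

    for path in sorted(Path(src_dir).glob(pattern)):
        unit = path.read_bytes()
        if unit and not unit.endswith(b"\n"):
            unit += b"\r\n"  # keep the separator on a line of its own
        unit += SEPARATOR
        if buffer and len(buffer) + len(unit) > max_bytes:
            flush()
        buffer += unit
    flush()
    return written
```

Called as merge_with_cap(r"D:\@temp", r"D:\@temp\parts"), this would read the .txt files and write parts of at most roughly 1MB each. Only one part is held in memory at a time, so the cost of tracking the size is negligible.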
Solution
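This batch file does exactly what you want. Just set the desired output file size, in bytes, in the partSize variable (the 30 below matches the test in the question; use about 1000000 for 1MB parts). A part is closed as soon as its size exceeds partSize, so each part may run slightly over the limit, but it always ends on a ======= separator line.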
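@echo off
setlocal EnableDelayedExpansion

rem partSize is the size threshold in bytes for each output file.
rem part starts at 101 so that the substring part:~1 gives 01, 02, ...
set /A partSize=30, part=101, last=0

del part*.txt 2> NUL
echo Creating part # %part:~1%

rem Redirect all.asc into the block so each set /P reads the next line.
< all.asc (
    rem findstr /N /B reports the line number of every separator line.
    for /F "delims=:" %%n in ('findstr /N /B "=======" all.asc') do (
        set /A "lines=%%n-last, last=%%n"
        rem Copy one whole unit, separator included, to the current part.
        (for /L %%i in (1,1,!lines!) do (
            set "line="
            set /P "line="
            echo(!line!
        )) >> part!part:~1!.txt
        rem Move on to the next part once this one is over the threshold.
        for %%f in (part!part:~1!.txt) do (
            if %%~Zf gtr %partSize% (
                set /A part+=1
                echo Creating part # !part:~1!
            )
        )
    )
)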
Answered By - Aacini
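For readers who have Python rather than a cmd prompt at hand, the same split-at-separators idea can be sketched as follows. This is not part of the batch answer above; it is a minimal sketch assuming Python 3, and the names iter_units and split_on_separator and the part01.txt, part02.txt, ... output names are hypothetical. Unlike the batch file, it starts a new part before a unit would push the current part over the limit, so a part exceeds max_bytes only when a single unit is larger than the limit on its own.

```python
from pathlib import Path

SEPARATOR = b"======="  # a line starting with this closes a unit


def iter_units(merged_path):
    """Yield each run of lines up to and including a separator line."""
    unit = bytearray()
    with open(merged_path, "rb") as src:  # binary mode leaves CRLF untouched
        for line in src:
            unit += line
            if line.startswith(SEPARATOR):
                yield bytes(unit)
                unit.clear()
    if unit:  # trailing lines that are not followed by a separator
        yield bytes(unit)


def split_on_separator(merged_path, out_dir, max_bytes=1_000_000):
    """Write numbered parts of about max_bytes, cutting only after a separator."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []          # paths of the parts written so far
    buffer = bytearray()  # contents of the part currently being built

    def flush():
        # Write the current part to disk, if it holds anything.
        if buffer:
            target = out_dir / f"part{len(written) + 1:02d}.txt"
            target.write_bytes(bytes(buffer))
            written.append(target)
            buffer.clear()

    for unit in iter_units(merged_path):
        if buffer and len(buffer) + len(unit) > max_bytes:
            flush()
        buffer += unit
    flush()
    return written
```

A call such as split_on_separator("all.asc", "parts") would write the parts into a parts folder. Like findstr /B in the batch file, the sketch treats any line that begins with ======= as a separator, so a text file that itself contains such a line would be cut there.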