Issue
How can I split a large CSV file (about 1 GB) into multiple files of increasing size (say 1000 rows in the first part, 10000 in the second, 100000 in the third, and so on) while preserving the header in each part?
h1 h2
a aa
b bb
c cc
.
.
12483720 rows
into
h1 h2
a aa
b bb
.
.
.
1000 rows
And
h1 h2
x xx
y yy
.
.
.
10000 rows
Solution
Another awk solution. First, generate some test records (the first line, 1, stands in for the header):
$ seq 1 1234567 > file
Then run the awk one-liner:
$ awk 'NR==1{n=1000;h=$0}{print > n}NR==n+c{n*=10;c=NR-1;print h>n}' file
Explained:
$ awk '
NR==1 {            # first record:
    n=1000         # set the first output file size (also used as its name) and
    h=$0           # store the header
}
{
    print > n      # write the record to the current output file
}
NR==n+c {          # once the target NR has been reached (close(n) goes here if needed):
    n*=10          # grow the target size tenfold
    c=NR-1         # offset: records read so far, less one for the repeated header
    print h > n    # start the next file with the header
}' file
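The comment above notes that close(n) can go where the output file changes. A minimal sketch of that variant follows, assuming the same test records as before. It is not part of the original answer; the only change is the close(n) call, which releases each finished part before the next one is opened. This may matter when many parts are produced and the awk in use limits the number of simultaneously open files.

```shell
# Test records; line 1 stands in for the header.
seq 1 1234567 > file

awk '
NR==1 { n=1000; h=$0 }   # first part size (and file name), remember the header
{ print > n }            # write the record to the current part
NR==n+c {                # the current part is full:
    close(n)             # release it before opening the next part
    n*=10                # the next part is ten times larger
    c=NR-1               # offset: records read so far, less one for the header
    print h > n          # start the next part with the header
}' file
```

The resulting files should be identical to those produced by the one-liner.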
Count the records (each count includes the header line, and the last file holds the remainder):
$ wc -l 1000*
   1000 1000
  10000 10000
 100000 100000
1000000 1000000
 123571 10000000
1234571 total
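The counts above include the header line, so the part named 1000 carries 999 data rows. If each part should instead hold exactly n data rows plus the header, one possible variant is sketched below. It is an assumption about the intended row counts, not the original answer's method, and the counter r is a name introduced here for illustration.

```shell
# Test records; line 1 stands in for the header.
seq 1 1234567 > file

awk '
NR==1 {                  # header line:
    h=$0; n=1000         # remember it and set the first part size
    print h > n          # start the first part with the header
    next                 # the header is not counted as a data row
}
{ print > n; r++ }       # write a data row and count it
r==n {                   # the current part holds n data rows:
    close(n)             # finish it
    n*=10; r=0           # grow the size and reset the row counter
    print h > n          # start the next part with the header
}' file
```

With these test records, the part named 1000 should hold 1001 lines (the header plus 1000 data rows). As in the original, a header-only file is left behind if the input ends exactly on a part boundary.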
Answered By - James Brown
Answer Checked By - Terry (WPSolving Volunteer)