Issue
I have malformed csv file that has different number of delimiters (delimiter is comma). I need to split the source file into multiple files, based on number of delimiters.
$ cat ~/Desktop/sample_file.csv
1001,Pink,Panther,Seattle,WA,98001,favorite character
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
1004,Donald,Duck,Redmond,WA,98004,So funny
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
Need unix command to split it into multiple files based on number of delimiters. For example, above sample file needs to be broken down into 3 files:
$ cat sample_file_6_delimiter.csv
1001,Pink,Panther,Seattle,WA,98001,favorite character
1004,Donald,Duck,Redmond,WA,98004,So funny
$ cat sample_file_5_delimiter.csv
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
$ cat sample_file_4_delimiter.csv
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
Solution
One awk
idea:
awk '
BEGIN { FS = OFS = "," } # set input/output field separator as ","; setting OFS is optional in this case since we always output the entire line (print $0)
FNR==1 { basename = FILENAME # 1st record of file: make copy of input FILENAME
sub(/.csv$/,"",basename) # strip ".csv" off end of filename
}
{ print $0 > (basename "_" (NF-1) "_delimiter.csv") } # "NF" == number of fields, "NF-1" == number of delimiters
' sample_file.csv
This generates:
$ head sample*delimiter.csv
==> sample_file_4_delimiter.csv <==
1005,Tom,Jerry,Lynwood,WA
1006,Woody,Woodpacker,Lynwood,WA
==> sample_file_5_delimiter.csv <==
1002,Micky,Mouse,Bellevue,WA,98002
1003,Mini,Mouse,Bellevue,WA,98003
==> sample_file_6_delimiter.csv <==
1001,Pink,Panther,Seattle,WA,98001,favorite character
1004,Donald,Duck,Redmond,WA,98004,So funny
Answered By - markp-fuso Answer Checked By - Mary Flores (WPSolving Volunteer)