Thursday, November 18, 2021

[SOLVED] How can one dynamically create a new csv from selected columns of another csv file?

Issue

I dynamically iterate through a csv file and select columns that fit the criteria I need. My CSV is separated by commas. I save these indexes to an array that looks like

echo "${cols_needed[@]}"

1 3 4 7 8

I then need to write these columns to a new file and I've tried the following cut and awk commands, however, as the array is dynamically created, I cant seem to find the right commands that can select them all at once. I have tried cut, awk and paste commands.

awk -v fields=${cols_needed[@]} 'BEGIN{ n = split(fields,f) }
    { for (i=1; i<=n; ++i) printf "%s%s", $f[i], (i<n?OFS:ORS) }' test.csv

This throws an error as it cannot split the fields unless I hard code them (even then, it can only do 2), split on spaces.

fields="1 2’

I have tried to dynamically create -f parameters, but can only do so with one variable in a loop like so

for item in "${cols_needed[@]}";
    do
       cat test.csv | cut -f$item
    done

which outputs one column at a time.

And I have tried to dynamically create it with commas - input as 1,3,4,7...

cat test.csv | cut -f${cols_needed[@]};

which also does not work!

Any help is appreciated! I understand awk does not work like bash and we cannot pass variables around in the same way. I feel like I'm going around in circles a bit! Thanks in advance.


Solution

Your first approach is ok, just:

  • change -v fields=${cols_needed[@]} to -v fields="${cols_needed[*]}", to pass the array as a single shell word
  • add FS=OFS="," to BEGIN, after splitting (you want to split on spaces, before FS is changed to ,)
  • ie. BEGIN {n = split(fields, f); FS=OFS=","}

Also, if there are no commas embedded in quoted csv fields, you can use cut:

IFS=,; cut -d, -f "${cols_needed[*]}" test.csv

If there are embedded commas, you can use gawk's FPAT, to only split fields on unquoted commas.

Here's an example using that.

# prepend $ to each number
for i in "${cols_needed[@]}"; do
    fields[j++]="\$$i"
done

IFS=,

gawk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS=, "{print ${fields[*]}}"

Injecting shell code in to an awk command is generally not great practice, but it's ok here IMO.



Answered By - dan