Wednesday, August 31, 2022

[SOLVED] optimize multiple sed commands in shell script

Issue

I have a folder containing many text files with JSON content. With the jq library, I am able to extract the "commodities" array and write it to a file. The "commodities-output.txt" is a temp file that contains brackets "[", "]" and "null" values in addition to the string values from the array. I want to remove the square brackets and "null" values and end up with the unique string values in a text file. Is there a way to optimize the sed commands so that I don't have to create temporary files such as "commodities-output.txt", and instead get a single output file with all the string values I need, unique and (optionally) sorted?

$F=foldername
for entry in $F*.json
do
  echo "processing $entry"
  jq '.[].commodities' $entry >> commodities-output.txt
done
sed '/[][]/d' commodities-output.txt | sed '/null/d' commodities-output.txt | sort commodities-output.txt | uniq >> commodities.txt

echo "processing complete!"

Solution

You can easily do all of this in jq.

files=( "$F"*.json )
echo "$0: processing ${files[0]}" >&2
jq '.[] | select(.commodities != [] and .commodities != null) | .commodities' "${files[0]}"

I refactored to use a Bash array to get the first of the matching files.
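
If the end goal is a single file of sorted, unique commodity names, the whole job can be folded into one jq invocation. This is only a sketch, assuming each input file is a top-level JSON array of objects (as the .[].commodities in your loop suggests):

# read every matching file, collect the commodity strings, drop nulls,
# then sort and de-duplicate inside jq itself
jq -rn '[inputs[].commodities // [] | .[] | strings] | unique | .[]' "$F"*.json > commodities.txt

The // [] guards against objects where commodities is null or missing, strings drops any stray nulls inside the arrays, and unique produces the sorted, de-duplicated list, so no temp file and no separate sort/uniq step are needed.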

If for some reason you can't refactor your code to run entirely in jq, you definitely want to prefer pipes over temporary files.

for entry in "$F"*.json
do
  echo "$0: processing $entry" >&2
  jq '.[].commodities' "$entry"
  break
done |
sed -e '/[][]/d' -e '/null/d' |
sort -u > commodities.txt

Notice also how we take care to print the progress diagnostics to standard error (>&2) and include the name of the script in the diagnostic message. That way, when you have scripts running scripts running scripts, you can see which one wants your attention.
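
Because the progress messages go to standard error, they never mix with data on standard output and can be silenced or captured on their own. Assuming the script above is saved as collect.sh (a hypothetical name):

./collect.sh                  # progress prints to the terminal, data lands in commodities.txt
./collect.sh 2> progress.log  # same result, but the progress messages are kept in a log file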

See also When to wrap quotes around a shell variable for why the variable references above are quoted.
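
A quick sketch of why it matters here, using a made-up folder name that contains a space:

F="my data/"
ls $F*.json        # unquoted: splits into "my" and "data/*.json", so the wrong paths are looked up
ls "$F"*.json      # quoted variable, unquoted glob: expands to my data/*.json as intended

Quoting the variable prevents word splitting on the space, while leaving the * outside the quotes keeps the glob working.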



Answered By - tripleee
Answer Checked By - Robin (WPSolving Admin)