Saturday, October 29, 2022

[SOLVED] While-loop subshell dilemma in Bash

Issue

I want to process all *.bin files inside a given directory. Initially I was working with a for-loop:

var=0
for i in $(ls *.bin)
do
   # perform computations on $i ...
   ((var++))
done
echo $var

However, some directories contain so many files that this fails with the error: Argument list too long

Therefore, I was trying it with a piped while-loop:

var=0
ls *.bin | while read i
do
  # perform computations on $i
  ((var++))
done
echo $var

The problem now is that the pipe runs the loop in a subshell, so echo $var prints 0.
How can I deal with this problem?
The original code:

#!/bin/bash

function entropyImpl {
    if [[ -n "$1" ]]
    then
        if [[ -e "$1" ]]
        then
            echo "scale = 4; $(gzip -c ${1} | wc -c) / $(cat ${1} | wc -c)" | bc
        else
            echo "file ($1) not found"
        fi
    else
        datafile="$(mktemp entropy.XXXXX)"
        cat - > "$datafile"
        entropyImpl "$datafile"
        rm "$datafile"
    fi

    return 1
}
declare acc_entropy=0
declare count=0

ls *.bin | while read i ;
do  
    echo "Computing $i"  | tee -a entropy.txt
    curr_entropy=`entropyImpl $i`
    curr_entropy=`echo $curr_entropy | bc`  
    echo -e "\tEntropy: $curr_entropy"  | tee -a entropy.txt
    acc_entropy=`echo $acc_entropy + $curr_entropy | bc`
    let count+=1
done

echo "Out of function: $count | $acc_entropy"
acc_entropy=`echo "scale=4; $acc_entropy / $count" | bc`

echo -e "===================================================\n" | tee -a entropy.txt
echo -e "Accumulated Entropy:\t$acc_entropy ($count files processed)\n" | tee -a entropy.txt

Solution

The problem is that the while loop is part of a pipeline. In a bash pipeline, every element of the pipeline is executed in its own subshell [ref]. So after the while loop terminates, the while loop subshell's copy of var is discarded, and the original var of the parent (whose value is unchanged) is echoed.
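
The effect is easy to reproduce in isolation. The following is a minimal sketch, not taken from the question; the variable name count and the three input lines are made up for the demonstration, and it assumes Bash's default settings (the lastpipe option is off):

```bash
#!/bin/bash
count=0

# The while loop is the right-hand side of a pipe, so it runs in a subshell.
printf '%s\n' a b c | while IFS= read -r line; do
    count=$((count + 1))
    echo "inside:  count=$count"
done

# Back in the parent shell: the subshell's copy of count is gone.
echo "outside: count=$count"    # prints "outside: count=0"
```

Inside the loop the counter climbs to 3, but that happens in the subshell's copy; the parent's count is never modified.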

One way to fix this is by using Process Substitution as shown below:

var=0
while IFS= read -r i
do
  # perform computations on "$i"
  ((var++))
done < <(find . -maxdepth 1 -type f -name "*.bin")
echo "$var"
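
If file names might contain newlines or leading and trailing whitespace, a null-delimited variant of the same idea is more robust. This is only a sketch: it assumes a find that supports -maxdepth and -print0 (GNU and BSD find do), and the temporary directory and sample files are invented so the snippet can run on its own:

```bash
#!/bin/bash
# Hypothetical demo directory with three .bin files (one name contains a space).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.bin" "$demo_dir/b c.bin" "$demo_dir/d.bin" "$demo_dir/ignored.txt"

var=0
# read -d '' splits on NUL bytes, matching find's -print0 output.
while IFS= read -r -d '' file; do
    # perform computations on "$file"
    var=$((var + 1))
done < <(find "$demo_dir" -maxdepth 1 -type f -name '*.bin' -print0)

echo "$var"    # the loop ran in the current shell, so the count is kept
rm -rf "$demo_dir"
```

Because the loop reads from a redirection instead of sitting in a pipeline, var keeps its value after done.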

Take a look at BashFAQ/024 for other workarounds.
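
One commonly used alternative is the lastpipe shell option, which makes Bash run the last command of a pipeline in the current shell. A minimal sketch, assuming Bash 4.2 or newer and a non-interactive script (lastpipe only takes effect while job control is off); the input lines are placeholders:

```bash
#!/bin/bash
# Run the last element of each pipeline in the current shell.
shopt -s lastpipe

var=0
printf '%s\n' one two three | while IFS= read -r line; do
    # perform computations on "$line"
    var=$((var + 1))
done

echo "$var"    # the loop was not in a subshell, so the count is kept
```

This keeps the original pipeline shape, at the cost of depending on a Bash-specific option that behaves differently in an interactive shell.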

Notice that I have also replaced ls with find, because parsing the output of ls is unreliable: file names can contain spaces, newlines, or other characters that break line-based parsing.
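
For a single directory it may also be possible to avoid both ls and the pipeline by looping over the glob directly. "Argument list too long" is raised when an external command is started with too many arguments; a glob in a for loop is expanded inside the shell without starting a command, so it should not hit that limit. The sketch below is an assumption about what fits this use case, not part of the original answer, and the demo directory and files are hypothetical:

```bash
#!/bin/bash
# Hypothetical demo directory with two .bin files and one unrelated file.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.bin" "$demo_dir/b.bin" "$demo_dir/notes.txt"

shopt -s nullglob    # an unmatched pattern expands to nothing, not to itself
var=0
for file in "$demo_dir"/*.bin; do
    [[ -f $file ]] || continue    # skip directories that happen to match
    # perform computations on "$file"
    var=$((var + 1))
done
shopt -u nullglob

echo "$var"
rm -rf "$demo_dir"
```

No subshell is involved here, so var is updated in the current shell.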



Answered By - dogbane
Answer Checked By - Katrina (WPSolving Volunteer)