Issue
I need to verify that all images listed in a CSV file are present inside a folder. I wrote a small shell script for that:
#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'
csvfile=$1
imgpath=$2
cat $csvfile | while IFS=, read -r filename rurl
do
    if [ -f "${imgpath}/${filename}" ]
    then
        echo -n
    else
        echo -e "$filename ${red}MISSING${color_Off}"
    fi
done
My CSV looks something like this:
Image1.jpg,detail-1
Image2.jpg,detail-1
Image3.jpg,detail-1
The CSV was created by Excel.
All three images are present in imgpath, but for some reason my output says
Image1.jpg MISSING
Running the script with zsh -x, I found that my CSV file has a BOM at the very beginning, which turns the first image name into \ufeffImage1.jpg and causes the whole issue.
How can I ignore a BOM (byte order mark) in a while read loop?
Solution
zsh provides a parameter expansion (also available in POSIX shells) to remove a prefix: ${var#prefix} expands to $var with prefix removed from the front of the string.
zsh, like ksh93 and bash, also supports ANSI C-like string syntax: $'\ufeff' refers to the Unicode code point for a BOM (U+FEFF).
Combining these, ${filename#$'\ufeff'} expands to the content of $filename with the BOM removed if it is present at the front.
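As a minimal sketch of the prefix removal on its own (not part of the original answer): the sample values below are hypothetical, and the BOM is built from its three UTF-8 bytes with printf, on the assumption that the CSV is UTF-8 encoded. That spelling is used here only so the snippet also runs in shells that lack $'...' quoting.

```shell
# Build the UTF-8 BOM (bytes EF BB BF) from octal escapes, which any
# POSIX printf understands. Assumes the file is UTF-8 encoded.
bom=$(printf '\357\273\277')

# Hypothetical sample values: one field with a leading BOM, one without.
with_bom="${bom}Image1.jpg"
without_bom="Image2.jpg"

# ${var#prefix} strips the prefix only when it is actually present,
# so applying it to every line is harmless.
clean1=${with_bom#"$bom"}
clean2=${without_bom#"$bom"}

printf '%s\n' "$clean1" "$clean2"
```

Because the expansion leaves a value alone when the prefix is absent, there is no need to special-case the first line of the file.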
The script below also makes some changes for better performance, more reliable behavior with odd filenames, and compatibility with non-zsh shells.
#!/bin/zsh
red='\033[0;31m'
color_Off='\033[0m'
csvfile=$1
imgpath=$2
while IFS=, read -r filename rurl; do
    filename=${filename#$'\ufeff'}
    if ! [ -f "${imgpath}/${filename}" ]; then
        printf '%s %bMISSING%b\n' "$filename" "$red" "$color_Off"
    fi
done <"$csvfile"
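To try the loop without touching real data, something like the following self-contained reproduction may help. It is only a sketch: the scratch directory, the file names, and the missing variable are made up for illustration, and the BOM is again written as its UTF-8 bytes via printf rather than $'\ufeff'.

```shell
# Scratch directory holding a hypothetical CSV and two empty "images".
workdir=$(mktemp -d)
mkdir "$workdir/img"
: > "$workdir/img/Image1.jpg"
: > "$workdir/img/Image2.jpg"

# The first line starts with the UTF-8 BOM (EF BB BF);
# Image3.jpg is deliberately not created.
printf '\357\273\277Image1.jpg,detail-1\nImage2.jpg,detail-1\nImage3.jpg,detail-1\n' \
    > "$workdir/list.csv"

bom=$(printf '\357\273\277')
missing=""
while IFS=, read -r filename rurl; do
    filename=${filename#"$bom"}          # drop the BOM if present
    if ! [ -f "$workdir/img/$filename" ]; then
        missing="$missing$filename "     # collect names of missing files
    fi
done < "$workdir/list.csv"

printf 'missing: %s\n' "$missing"
rm -rf "$workdir"
```

Only Image3.jpg should be reported. Because the loop reads from a redirection rather than a pipe, the missing variable is still set after the loop ends.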
Notes on changes unrelated to the specific fix:
- Replacing echo -e with printf lets us pick which specific variables get escape sequences expanded: %s for filenames means backslashes and other escapes in them are left unmodified, whereas %b for $red and $color_Off ensures that we do process highlighting for them.
- Replacing cat $csvfile | with < "$csvfile" avoids the overhead of starting up a separate cat process, and ensures that your while read loop is run in the same shell as the rest of your script rather than a subshell (which may or may not be an issue for zsh, but is a problem with bash when run without the non-default lastpipe flag).
- echo -n isn't reliable as a noop: some shells print -n as output, and the POSIX echo standard, by marking behavior when -n is present as undefined, permits this. If you need a noop, : or true is a better choice; but in this case we can just invert the test and move the else path into the truth path.
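The difference between %s and %b in the first note can be checked directly. This is a small sketch with a hypothetical filename that contains a literal backslash; the as_s and as_b variables exist only for the comparison.

```shell
red='\033[0;31m'
color_Off='\033[0m'

# Hypothetical filename containing a literal backslash followed by "t".
name='odd\tname.jpg'

# %s prints its argument unchanged; %b expands backslash escapes in it,
# so the same string comes out with a real tab character.
as_s=$(printf '%s' "$name")
as_b=$(printf '%b' "$name")

# The filename goes through %s, the color codes through %b.
printf '%s %bMISSING%b\n' "$name" "$red" "$color_Off"
```

With echo -e the whole line would be subject to escape processing, so a filename like this one could be altered before it is printed.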
Answered By - Charles Duffy