Issue
I use nohup to run a bash script to read in each line of a file (and extract info I need). I've used it on multiple files with different line sizes, mostly between 50k and 100k. But no matter how many lines my file is, nohup always stops outputting info around 6k before the last line.
my script called: fetchStuff.sh
#!/bin/bash
urlFile=$1
myHost='http://example.com'
useragent='me'
count=0
total_lines=$(wc -l < $urlFile)
while read url; do
if [[ "$url" == *html ]]; then continue; fi
reqURL=${myHost}${url}
stuffInfo=$(curl -s -XGET -A "$useragent" "$reqURL" | jq -r '.stuff')
[ "$stuffInfo" != "null" ] && echo ${stuffInfo/unwanted_garbage/} >> newversion-${urlFile}
((count++))
if [ $(( $count%20 )) -eq 0 ]
then
sleep 1
fi
if [ $(( $count%100 )) -eq 0 ]; then echo "$urlFile read ${count} of $total_lines"; fi
done < $urlFile
I call it like so: nohup ./fetchStuff.sh file1.txt &
I get count info in nohup.out, e.g. "file1 read 100 of 60000", "file1 read 200 of 60000", etc.
But it always stops around 6k before end of file.
When I do tail nohup.out
each time after running the script on the file, I get these as the last line in nohup.out:
file1.txt read 90000 of 96317
file2.txt read 68000 of 73376
file3.txt read 85000 of 91722
file4.txt read 93000 of 99757
I can't figure out why it always stops around 6k before end of file. (I put the sleep timer in to avoid flooding the api w/a lot of requests).
Solution
The loops skips lines that end with html
, and they're not counted in $count
. So I'll bet there are 6317 lines in file1.txt
that end with html
, 5376 in file2.txt
, and so on.
If you want $count
to include them, put ((count++))
before the if
statement that checks the suffix.
while read url; do
((count++))
if [[ "$url" == *html ]]; then continue; fi
reqURL=${myHost}${url}
stuffInfo=$(curl -s -XGET -A "$useragent" "$reqURL" | jq -r '.stuff')
[ "$stuffInfo" != "null" ] && echo ${stuffInfo/unwanted_garbage/} >> newversion-${urlFile}
if [ $(( $count%20 )) -eq 0 ]
then
sleep 1
fi
if [ $(( $count%100 )) -eq 0 ]; then echo "$urlFile read ${count} of $total_lines"; fi
done < $urlFile
Alternatively you could leave them out of total_lines
with:
total_lines=$(grep -c -v 'html$' "$urlFile")
And you could do away with the if
statement by using
grep -v 'html$' "$urlFile" | while read url; do
...
done
Answered By - Barmar Answer Checked By - Pedro (WPSolving Volunteer)