Thursday, October 28, 2021

[SOLVED] Saving files to a separate HTML file before using grep/sed

Issue

I'm working on a project that lets me navigate some URLs. Right now I have:

#!/bin/bash
for file in $1
do
wget $1 >> output.html
cat output.html | grep -o '<a .*href=.*>' | 
sed -e 's/<a /\n<a /g' |
sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' |
grep 'http'
done

I want the user to be able to run the script as follows:

./navigator google.com

which will save the source of the URL into a new HTML file, run my grep/sed pipeline on it, and then save the result to a new file.

Right now I'm struggling with saving the page source into a new HTML file. Help!


Solution

To create a new file for each URL, incorporate the URL into the output filename and pass it to wget's -O option:

#!/bin/bash

for url; do                  # loop over all positional parameters ("$@")
   out="output-$url.html"    # per-URL output filename
   wget -q "$url" -O "$out"  # -q: quiet; -O: save the page source to $out

   # Split anchor tags onto their own lines, strip everything but the
   # href value, and keep only links containing "http":
   grep -o '<a .*href=.*>' "$out" |
     sed -e 's/<a /\n<a /g' |
     sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' |
     grep 'http'
done
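
To also cover the "save to a new file" part of the question, the pipeline inside the loop can be redirected into a per-URL file. A minimal sketch, assuming a hypothetical links-$url.txt name that is not part of the original answer:

   # Same pipeline as above, but writing the extracted links
   # to a file per URL instead of printing them:
   grep -o '<a .*href=.*>' "$out" |
     sed -e 's/<a /\n<a /g' |
     sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' |
     grep 'http' > "links-$url.txt"

The script can then be run with one or more URLs, e.g. ./navigator google.com, producing output-google.com.html and links-google.com.txt.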

PS: As per the comments above, -q was added to the wget call to make it completely quiet.
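
One caveat with output-$url.html: a URL containing slashes (e.g. example.com/page) would put path separators into the filename, and wget would fail to create it. A minimal sketch of a workaround using bash pattern substitution (the character whitelist here is an assumption, not from the original answer):

   # Replace anything that is not alphanumeric, dot, underscore, or
   # hyphen with "_" so the URL is safe to use as a filename:
   out="output-${url//[^A-Za-z0-9._-]/_}.html"

This keeps one output file per URL while avoiding invalid filenames.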



Answered By - anubhava