Issue
I have two lists of URLs.
The first, listofdomains.txt, contains the following:
http://example.com
https://www.example.com
https://abc-test.example.com
The second, urls_params.txt, contains the following:
http://example.com/?param1=123
http://example.com/?param1=123&param2=456
https://www.example.com/?param1=123
https://www.example.com/?param1=123&param2=456
https://abc-test.example.com/?param1=123
https://abc-test.example.com/?param1=123&param2=456
I need to loop over the two lists and, for every subdomain in listofdomains.txt,
grep all of its URLs out of urls_params.txt and save them in a file named after that subdomain.
For example, the desired output would be
a file named example.com
that contains
http://example.com/?param1=123
http://example.com/?param1=123&param2=456
and so on for the rest of the subdomains.
My attempt, which did not work, was to filter listofdomains.txt
down to just the host names:
example.com
www.example.com
abc-test.example.com
save that in a file named list,
and then run the following command:
while read -r url; do $(cat urls_params.txt | awk -v u="$url" '{print u}') ; done < list
But the output is an error:
example.com: command not found
www.example.com: command not found
abc-test.example.com: command not found
Thanks
Solution
Input (from the question):
$ ls
listofdomains.txt tst.awk urls_params.txt
Script:
$ cat tst.awk
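{
    # Reduce each input line to its host name, e.g. "www.example.com".
    dom = $0
    sub("https?://","",dom)
    sub("/.*","",dom)
}
# First file (urls_params.txt): collect the URLs seen for each host.
NR==FNR {
    dom2urls[dom] = dom2urls[dom] $0 ORS
    next
}
# Second file (listofdomains.txt): switch output file when the host changes.
dom != prev {
    close(out)
    out = dir "/" dom
    prev = dom
}
{ printf "%s", dom2urls[dom] > out }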
Execute it:
$ awk -v dir="$PWD" -f tst.awk urls_params.txt listofdomains.txt
Output:
$ ls
abc-test.example.com example.com listofdomains.txt tst.awk urls_params.txt www.example.com
$ head *.com
==> abc-test.example.com <==
https://abc-test.example.com/?param1=123
https://abc-test.example.com/?param1=123&param2=456
==> example.com <==
http://example.com/?param1=123
http://example.com/?param1=123&param2=456
==> www.example.com <==
https://www.example.com/?param1=123
https://www.example.com/?param1=123¶m2=456
You don't actually need listofdomains.txt
unless there are some domains you want to exclude from the output, or some domains not included in urls_params.txt
that you want to get empty output files for.
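If listofdomains.txt really is not needed, the split can be driven by urls_params.txt alone. The following is a minimal sketch under that assumption, not the script from the answer: the scratch directory, the out subdirectory and the sample data copied from the question are placeholders chosen for illustration.

```shell
# Scratch directory and sample data (copied from the question) so the
# sketch can be run as-is; point awk at your real urls_params.txt instead.
workdir=$(mktemp -d)
cat > "$workdir/urls_params.txt" <<'EOF'
http://example.com/?param1=123
http://example.com/?param1=123&param2=456
https://www.example.com/?param1=123
https://www.example.com/?param1=123&param2=456
https://abc-test.example.com/?param1=123
https://abc-test.example.com/?param1=123&param2=456
EOF
mkdir -p "$workdir/out"

# For every URL: strip the scheme and everything after the host name,
# then append the original line to a file named after that host.
awk -v dir="$workdir/out" '
{
    dom = $0
    sub(/^https?:\/\//, "", dom)   # drop "http://" or "https://"
    sub(/\/.*/, "", dom)           # drop the path and query string
    out = dir "/" dom
    print >> out                   # append, so the input need not be grouped
    close(out)                     # keep at most one output file open
}
' "$workdir/urls_params.txt"
```

Because the sketch appends, it should be run against an empty output directory; running it twice would duplicate every line.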
If you only want output files created for domains that have entries in
urls_params.txt (i.e. no empty output files), then just change:
{ printf "%s", dom2urls[dom] > out }
to:
dom in dom2urls { printf "%s", dom2urls[dom] > out }
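As for the error in the question: wrapping the pipeline in $( ... ) at the start of a command makes the shell run awk's output as a command, and since that awk program prints the value of u for every input line, the shell tries to execute example.com. A repaired version of that loop might look like the sketch below. It is not part of the script above, the scratch directory and out subdirectory are illustrative assumptions, and it rereads urls_params.txt once per domain, so the single-pass awk script is likely the better choice for large inputs.

```shell
# Scratch directory and sample data (copied from the question).
workdir=$(mktemp -d)
cat > "$workdir/listofdomains.txt" <<'EOF'
http://example.com
https://www.example.com
https://abc-test.example.com
EOF
cat > "$workdir/urls_params.txt" <<'EOF'
http://example.com/?param1=123
http://example.com/?param1=123&param2=456
https://www.example.com/?param1=123
https://www.example.com/?param1=123&param2=456
https://abc-test.example.com/?param1=123
https://abc-test.example.com/?param1=123&param2=456
EOF
mkdir -p "$workdir/out"

while IFS= read -r url; do
    dom=${url#*://}    # strip the scheme, e.g. "https://"
    dom=${dom%%/*}     # strip any path after the host name
    # Print only the URLs whose host is exactly this domain, so that
    # example.com does not also pick up www.example.com.
    awk -v d="$dom" '
        { host = $0; sub(/^https?:\/\//, "", host); sub(/\/.*/, "", host) }
        host == d
    ' "$workdir/urls_params.txt" > "$workdir/out/$dom"
done < "$workdir/listofdomains.txt"
```

Comparing the extracted host for equality, rather than grepping for the domain as a substring, is what keeps each file limited to its own subdomain.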
Answered By - Ed Morton