Issue
I want to mirror Wikipedia pages with the wget Linux command. I used this command:
wget --mirror -p --convert-links -P ./folder-mirror https://en.wikipedia.org/wiki/Portal:Contents/A–Z_index
but I only get this one file: robots.txt
Solution
Robot exclusion is on by default in wget to keep folks from being jerks and recursively gobbling up someone else's web pages, and their bandwidth along with them.
You can turn it off in your .wgetrc file, or use wget's -e switch, like: -e robots=off
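For example, a minimal sketch of the full command with robot exclusion disabled (reusing the URL and output folder from the question) might look like:

wget -e robots=off --mirror -p --convert-links -P ./folder-mirror https://en.wikipedia.org/wiki/Portal:Contents/A–Z_index

The equivalent .wgetrc setting is a single line:

robots = off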
This isn't to say that Wikipedia doesn't have further safeguards in place to ensure that your wget doesn't recursively download everything, but it will keep wget from honoring robots.txt and the robots meta tags.
If you still hit a wall, then perhaps try tinkering with the user-agent or something along those lines.
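One hedged variation (the user-agent string and wait values here are only illustrative, not anything Wikipedia specifically requires) would be:

wget -e robots=off --wait=1 --random-wait --user-agent="Mozilla/5.0" --mirror -p --convert-links -P ./folder-mirror https://en.wikipedia.org/wiki/Portal:Contents/A–Z_index

The --wait and --random-wait options also keep the crawl polite and make it less likely to be throttled.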
Answered By - JNevill
Answer Checked By - Senaida (WPSolving Volunteer)