Issue
Suppose I have a list of URLs in a file named URL.txt, and I only want the directories to be output, not files or anything with an extension such as .html or .php. If the script finds a file or an extension in a URL, it should stop there and move on to the next URL. For example, given these URLs:
- https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit
- https://example.com/account/signup/accounts/signin/account.html
I want results like this:
- https://example.com/tradings/
- https://example.com/tradings/trade/
- https://example.com/account/
- https://example.com/account/signup/
- https://example.com/account/signup/accounts/
- https://example.com/account/signup/accounts/signin/
I tried this command, but it only prints a single path segment instead of a complete URL. I want each complete directory URL, without any file name or extension.
cat URL.txt | rev | cut -d'/' -f 2 | sort -u | rev
Solution
Perl to the rescue!
perl -lne '@parts = split m{/}; print join "/", @parts[0 .. $_] for 3 .. $#parts - 1' < URL.txt
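- -n reads the input line by line and runs the code for each line.
- -l removes newlines from input and adds them to print.
- Each line is split on /. For each index from 3 (the first path segment after the host) up to the last but one part, we join parts 0 through that index back together with /. The last part, which is the file name or query string, is never printed.
- The printed URLs have no trailing slash. Joining one extra empty string, i.e. @parts[0 .. $_], "" adds it.
- See split and join for more details.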
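For readers who do not use Perl, the same split-and-rejoin idea can be sketched in Python. This is only an illustration, not part of the original answer: the function names directory_prefixes and unique_directories are made up for this example, it appends a trailing slash to match the output requested in the question, and it assumes the query string itself contains no / characters.

```python
def directory_prefixes(url):
    """Return every directory-level prefix of url, each with a trailing slash."""
    # Split on "/" the same way the Perl one-liner does, e.g.
    # "https://example.com/a/b/file" -> ["https:", "", "example.com", "a", "b", "file"]
    parts = url.strip().split("/")
    # Index 3 is the first path segment after the host. The last element is
    # treated as a file name or query string and is never included.
    return ["/".join(parts[: i + 1]) + "/" for i in range(3, len(parts) - 1)]


def unique_directories(urls):
    """Collect prefixes from many URLs, dropping duplicates but keeping order."""
    seen = {}  # dicts preserve insertion order, so this works as an ordered set
    for url in urls:
        for prefix in directory_prefixes(url):
            seen.setdefault(prefix, None)
    return list(seen)


if __name__ == "__main__":
    # Sample input taken from the question.
    sample = [
        "https://example.com/tradings/trade/trading?currency=usdt&dest=btc&tab=limit",
        "https://example.com/account/signup/accounts/signin/account.html",
    ]
    for directory in unique_directories(sample):
        print(directory)
```

To run it over the real file, something like unique_directories(open("URL.txt")) should work, since iterating over a file yields one line at a time and the function strips the trailing newline.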
Answered By - choroba
Answer Checked By - Terry (WPSolving Volunteer)