Issue
I have .html files in directories and subdirectories. I need to extract every string that follows "example.com/". The surrounding text can look like this:
["https://example.com/folder1",
href="https://example.com/anotherfolder2" target="
etc.
What I want to extract is:
folder1
anotherfolder2
etc.
from all files in all folders, collected into one list with each word on its own line.
I found some highly upvoted examples on StackOverflow, but they did not work. I tried this (adapted from some of those examples):
grep -Po '(?<=example.com=)[^,]*'
Thank you for your help!
Solution
grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2 | sed -e 's/https:\/\/example.com\///g'
grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2
finds the lines containing example.com and extracts the content of the first quoted string on each.
sed -e 's/https:\/\/example.com\///g'
strips the https://example.com/ prefix, leaving only the suffix.
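As an alternative sketch, and assuming GNU grep built with PCRE support (the -P option), the same extraction can be done in a single grep call. This is not the method from the answer above, just another way to get the same list. It also shows why the attempt in the question printed nothing: the lookbehind (?<=example.com=) requires a literal = right after example.com, which never occurs in the sample text. The temporary directory and the file names a.html and sub/b.html are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Build a throwaway sample tree (hypothetical file names) to run against.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf '%s\n' '["https://example.com/folder1",' > "$demo_dir/a.html"
printf '%s\n' '<a href="https://example.com/anotherfolder2" target="_blank">x</a>' > "$demo_dir/sub/b.html"

# -r recurse, -h omit file names, -o print only the matched part, -P use PCRE.
# \K drops everything matched before it, so only the path segment is printed.
# [^"/]+ stops at the closing quote or at the next slash.
result=$(grep -rhoP --include='*.html' 'https://example\.com/\K[^"/]+' "$demo_dir" | sort -u)
printf '%s\n' "$result"

# Clean up the sample tree.
rm -rf "$demo_dir"
```

To collect the list into a file, replace "$demo_dir" with your own directory and redirect the output, for example with > list.txt. The sort -u step removes duplicates; drop it if repeated names should be kept.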
Answered By - ramsay Answer Checked By - Senaida (WPSolving Volunteer)