Issue

I have .html files in directories and subdirectories. I need to extract all strings that start with "example.com". Part of a string can look like this:

["https://example.com/folder1",
href="https://example.com/anotherfolder2" target="
etc.

What I want to extract is: folder1
anotherfolder2
etc.

from all files in all folders, collected into one list with one word per line.

I found some highly upvoted examples on Stack Overflow, but they did not work. I tried this (adapted from some of those examples):

grep -Po '(?<=example.com=)[^,]*'

Thank you for your help!

Solution

grep -r "example.com" your-directory | grep -o '".*"' | cut -d \" -f2 | sed -e 's|https://example\.com/||g'

The first three stages (the recursive grep, grep -o '".*"', and cut -d \" -f2) extract the content of the first quoted string on each matching line.
The final sed stage strips the https://example.com/ prefix, leaving only the part that follows it.
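
Because the pipeline above keeps only the first quoted string on each line, a line such as <a class="x" href="https://example.com/folder1"> would yield x instead of folder1. A possible variant that matches the URL itself is sketched below. It is only a sketch under some assumptions: a grep that supports -r, -h, -o and --include (GNU grep does), URLs that end at a double quote, angle bracket, or whitespace, and duplicates that are not wanted (drop sort -u to keep them). The function name extract_paths and the sample file names are made up for illustration.

```shell
#!/bin/sh
# extract_paths DIR: print whatever follows https://example.com/ in every
# .html file under DIR, one entry per line, sorted and de-duplicated.
extract_paths() {
    # -r recurse, -h omit file names, -o print only the matched text,
    # -E extended regex; the match stops at a quote, <, > or whitespace.
    grep -rhoE --include='*.html' 'https://example\.com/[^"<>[:space:]]+' "$1" \
        | sed 's|^https://example\.com/||' \
        | sort -u
}

# Small demo on a throwaway directory tree (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf '%s\n' '["https://example.com/folder1",' > "$demo/a.html"
printf '%s\n' '<a class="x" href="https://example.com/anotherfolder2" target="_blank">' > "$demo/sub/b.html"
extract_paths "$demo"
rm -r "$demo"
```

On the two sample files this prints anotherfolder2 and folder1, each on its own line. Redirect the output to a file (for example extract_paths your-directory > list.txt) to get the single list asked for in the question.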

Answered By - ramsay

Answer Checked By - Senaida (WPSolving Volunteer)