Saturday, October 29, 2022

[SOLVED] Search for part of string with grep in all files in folder and subfolders

Issue

I have .html files in directories and subdirectories. I need to extract all strings that start with "example.com". Part of a string can look like this:

["https://example.com/folder1",
href="https://example.com/anotherfolder2" target="
etc.

What I want to extract is:

folder1
anotherfolder2
etc.

from all files in all folders, into one list with one entry per line.

I found some highly upvoted examples on Stack Overflow, but they did not work. I tried this (adapted from one of them):

grep -Po '(?<=example.com=)[^,]*'

Thank you for your help!


Solution

grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2| sed -e 's/https:\/\/example.com\///g'
  1. grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2 finds the matching lines recursively and extracts the content of the first quoted string on each line
  2. sed -e 's/https:\/\/example.com\///g' removes the https://example.com/ prefix, leaving only the suffix
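
As a possible alternative, the two steps above can be folded into a single grep call. This is only a sketch, assuming GNU grep built with PCRE support (the -P option already used in the question); the function name extract_segments is made up for illustration. Note that the attempt in the question matches nothing because its lookbehind expects a literal "=" right after example.com, while the sample input has a "/" there.

```shell
# Hypothetical helper (the name is not from the original answer):
# print the first path segment that follows example.com/ in every
# .html file under the directory given as $1.
extract_segments() {
  # -r  recurse into subdirectories
  # -h  do not prefix matches with file names
  # -o  print only the matched text, one match per line
  # -P  use Perl-compatible regular expressions (GNU grep)
  # \K  drops everything matched so far, so only the segment is printed
  grep -rhoP --include='*.html' 'https?://example\.com/\K[^"/,\s<>]+' "$1"
}

# Example call (your-directory is a placeholder, as in the answer above):
#   extract_segments your-directory | sort -u > list.txt
```

Unlike the cut-based pipeline, this does not depend on the URL being the first quoted string on the line, and it reports every match when a line contains several URLs.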


Answered By - ramsay
Answer Checked By - Senaida (WPSolving Volunteer)