Sunday, October 9, 2022

[SOLVED] How to grab URLs with .com, .org, or .net domains from a text file using a Unix command

Issue

I need a regex to extract domain names from text files. In the text files, the domains look like this:

site.com   - org name - title
site.net   - other name - another title
HTTP://target.ca - ca site - ca title

From this text file I need

site.com
site.net
target.ca

I tried sed 's/\.com\/.*/.com/' file.txt, but this command only gives me the .com domains, and I need all of the domain names. Please help me out.

Thank you.


Solution

1st solution: With your shown samples, please try the following awk code. It sets the field separator to either a space or / for every line. In the main block, if the line starts with HTTP: it prints the 3rd field (the two slashes after HTTP: leave the 2nd field empty), otherwise it prints the 1st field.

awk -F' |/' '/^HTTP:/{print $3;next} {print $1}' Input_file
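
The awk command above matches only the upper-case HTTP: prefix shown in the samples. As a hedged sketch, assuming the file may also contain lower-case or https:// prefixes (which the question does not show), the scheme can be stripped from the first whitespace-separated field instead. The file name sample_domains.txt is hypothetical.

```shell
# Recreate the sample input from the question (hypothetical file name).
printf '%s\n' \
  'site.com   - org name - title' \
  'site.net   - other name - another title' \
  'HTTP://target.ca - ca site - ca title' > sample_domains.txt

# Default field splitting on whitespace; remove any leading "scheme://"
# from the first field, then print that field.
awk_out=$(awk '{ sub(/^[A-Za-z]+:\/\//, "", $1); print $1 }' sample_domains.txt)
printf '%s\n' "$awk_out"
```

On the three sample lines this should print site.com, site.net, and target.ca, one per line.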


2nd solution: Using sed, please try the following code. The -E option enables ERE (extended regular expressions). The first capturing group matches an optional HTTP:// prefix, the second captures the run of non-space characters that follows (the domain), and the whole line is replaced with that second group.

sed -E 's/^(HTTP:\/\/)?([^[:space:]]+).*$/\2/'  Input_file
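
To try the sed command on the question's samples, a small sketch follows. It uses # as the s command delimiter so the slashes in HTTP:// need no escaping; that is a stylistic variation, not part of the original answer, and the file name sample_domains.txt is hypothetical.

```shell
# Recreate the sample input from the question (hypothetical file name).
printf '%s\n' \
  'site.com   - org name - title' \
  'site.net   - other name - another title' \
  'HTTP://target.ca - ca site - ca title' > sample_domains.txt

# Same regex as the answer's sed command, written with '#' as the delimiter.
# Group 2 (the domain) replaces the entire line.
sed_out=$(sed -E 's#^(HTTP://)?([^[:space:]]+).*$#\2#' sample_domains.txt)
printf '%s\n' "$sed_out"
```

This should print the same three domains as the expected output in the question.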


3rd solution: Using GNU grep with its -P (PCRE) option and the \K escape, which lets us match text with the regex and then discard it from the printed match. The -o option prints only the matched part of each line.

grep -oP '^(HTTP:\/\/)?\K([^[:space:]]+)'  Input_file
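
As a hedged variation on the grep command, assuming GNU grep built with PCRE support and assuming the input might also contain lower-case or https:// prefixes (not shown in the question), the prefix group can be made case-insensitive. The file name sample_domains.txt is hypothetical.

```shell
# Recreate the sample input from the question (hypothetical file name).
printf '%s\n' \
  'site.com   - org name - title' \
  'site.net   - other name - another title' \
  'HTTP://target.ca - ca site - ca title' > sample_domains.txt

# (?i:...) makes only the scheme group case-insensitive; \K drops the
# matched prefix so -o prints just the domain.
grep_out=$(grep -oP '^(?i:https?://)?\K[^[:space:]]+' sample_domains.txt)
printf '%s\n' "$grep_out"
```

Because -P is specific to GNU grep, the awk or sed solutions are the safer choice on systems that ship a different grep.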


Answered By - RavinderSingh13
Answer Checked By - Robin (WPSolving Admin)