Issue
I have the following text:
aaa rr tt zz pp
aaa pp xx yy uu zz
I need to extract all occurrences of the 'aaa', 'zz' and 'xx' patterns and print the matches from each input line on one line, like this:
aaa zz
aaa xx zz
The best I have found is grep -oP 'aaa|xx|zz',
but this returns each match on a new line:
aaa
zz
aaa
xx
zz
I tried adding something like tr -d '\n',
but then all the matches from the whole file end up on a single line, which is not what I want.
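Run on the sample text, that attempt looks roughly like this. It is a sketch in which a temporary file stands in for the real input: tr -d '\n' removes every newline, including the ones that separated the two input lines, so the per-line grouping disappears.

```shell
#!/usr/bin/env bash
# Recreate the sample text from the question in a temporary file.
sample_file=$(mktemp)
trap 'rm -f "$sample_file"' EXIT
printf '%s\n' 'aaa rr tt zz pp' 'aaa pp xx yy uu zz' > "$sample_file"

# grep -o prints every match on its own line; tr -d '\n' then deletes ALL
# newlines, so the boundary between the two input lines is lost as well.
grep -oP 'aaa|xx|zz' "$sample_file" | tr -d '\n'
echo   # tr also removed the final newline, so end the output line here
```

This prints aaazzaaaxxzz, with nothing marking where the first input line ended.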
NB: I need a solution that supports non-greedy regular expressions, as the real search patterns look like: ^.+?,|,IN:.+?\-|,OUT:.+?-|State.+?[$,]
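The -P flag is what makes such patterns usable: it switches grep to Perl-compatible regular expressions, which support lazy quantifiers like .+?, whereas POSIX basic and extended syntax (grep -G, grep -E) have no lazy form. A quick comparison on a made-up line (not data from the question):

```shell
#!/usr/bin/env bash
# Hypothetical input line, invented only for this comparison.
line='State ok, more, end,'

# Greedy: .+ consumes as much as it can, so the match ends at the LAST comma.
grep -oP 'State.+,' <<< "$line"     # State ok, more, end,

# Lazy (PCRE only): .+? stops at the FIRST comma that completes the match.
grep -oP 'State.+?,' <<< "$line"    # State ok,
```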
Solution
You may use
while IFS= read -r line; do
    echo $(grep -oP 'aaa|xx|zz' <<< "$line")
done < file
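That is,
- Read the input file line by line
- Get your matches with the grep command for each line
- The shell replaces the newlines between matches with spaces because the $(...) is not enclosed in double quotes: its output is split into words, and echo prints them separated by single spaces.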
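The effect of the missing quotes can be seen by running both forms on the second sample line. Note that unquoted output is also subject to filename globbing, so a match containing * or ? could be expanded into file names.

```shell
#!/usr/bin/env bash
line='aaa pp xx yy uu zz'

# Quoted: the command substitution keeps grep's newlines, one match per line.
echo "$(grep -oP 'aaa|xx|zz' <<< "$line")"

# Unquoted: the result is word-split on IFS (space, tab, newline by default),
# and echo prints the resulting words separated by single spaces.
echo $(grep -oP 'aaa|xx|zz' <<< "$line")
```

The first echo prints aaa, xx and zz on three lines; the second prints aaa xx zz.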
If your matches may contain whitespace that you want to preserve, consider using
while IFS= read -r line; do
    echo "$(grep -oP 'aaa|xx|zz' <<< "$line" | awk '{ printf "%s%s", (NR > 1 ? " " : ""), $0 }')"
done < file
This way, you will get each line's matches separated by single spaces, without a trailing space. You may use any custom delimiter by replacing the " " string in the awk command.
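The loop can also be wrapped in a small function so that the pattern, file and delimiter become parameters. The name matches_per_line and its argument order are made up for this sketch; it uses the same grep -oP plus awk idea, so a lazy pattern such as State.+?[$,] should be usable in place of 'aaa|xx|zz'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the matches of a PCRE pattern found on each
# input line, joined by a delimiter (default: a single space).
# Usage: matches_per_line PATTERN FILE [DELIMITER]
matches_per_line() {
    local pattern=$1 file=$2 delim=${3:-' '}
    local line
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # grep -o: one match per output line; awk joins them with $delim,
        # writing it only between matches, then ends the line in END.
        grep -oP -- "$pattern" <<< "$line" |
            awk -v d="$delim" '{ printf "%s%s", (NR > 1 ? d : ""), $0 } END { print "" }'
    done < "$file"
}

# Demo on the sample text from the question.
sample_file=$(mktemp)
trap 'rm -f "$sample_file"' EXIT
printf '%s\n' 'aaa rr tt zz pp' 'aaa pp xx yy uu zz' > "$sample_file"
matches_per_line 'aaa|xx|zz' "$sample_file"
matches_per_line 'aaa|xx|zz' "$sample_file" ','
```

The first call prints aaa zz and aaa xx zz; the second prints the same matches joined with commas. A line with no matches produces an empty output line, which keeps the output aligned with the input.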
Answered By - Wiktor Stribiżew Answer Checked By - Willingham (WPSolving Volunteer)