Issue
I have a sentence (array) and I would like to remove all words longer than 6 characters from it.
Example sentence:
var="one two three four giberish-giberish five giberish-giberish six"
I would like to get:
var="one two three four five six"
So far I'm using this:
echo $var | tr ' ' '\n' | awk 'length($1) <= 6 { print $1 }' | tr '\n ' ' '
The solution above works fine, but as you can see, I'm replacing spaces with newlines, then filtering the words, and then replacing the newlines back with spaces. I'm pretty sure there must be a better, more "elegant" solution that doesn't swap spaces and newlines.
Solution
You can use
#!/bin/bash
var="one two three four giberish-giberish five giberish-giberish six"
awk 'BEGIN{RS=ORS=" "} length($0) <= 6' <<< "$var"
# -> one two three four five six
The BEGIN{RS=ORS=" "} block sets both the input and the output record separator to a space, and length($0) <= 6 keeps only the records (here, the words) that are 6 characters long or shorter.
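One caveat with RS=" ": the here-string appends a newline, and that newline ends up inside the last record, so the final word is measured one character longer than it is and the output ends with a newline followed by a space. If that matters, a possible variant is to keep the default record separator and loop over the fields instead. This is only a sketch; the function name filter_short_words and its max_len parameter are made up for illustration and are not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): print only the words
# of "text" that are at most "max_len" characters long.
filter_short_words() {
    local max_len=$1 text=$2
    awk -v max="$max_len" '{
        out = ""
        # Default field splitting: the trailing newline is not part of any word.
        for (i = 1; i <= NF; i++)
            if (length($i) <= max)
                out = (out == "" ? $i : out " " $i)
        print out
    }' <<< "$text"
}

var="one two three four giberish-giberish five giberish-giberish six"
filter_short_words 6 "$var"
# -> one two three four five six
```

Since the words are rebuilt from fields, runs of several spaces collapse to a single space, which may or may not be what you want.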
You can also consider workarounds with GNU sed and perl:
sed -E 's/\s*\S{7,}//g' <<< "$var"
perl -pe 's/\s*\S{7,}//g' <<< "$var"
A non-GNU sed workaround could look like
sed 's/[[:space:]]*[^[:space:]]\{7,\}//g' <<< "$var"
Here, all occurrences of zero or more whitespace characters (\s*, [[:space:]]*) followed by seven or more non-whitespace characters (\S{7,}, [^[:space:]]\{7,\}) are removed.
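Because the pattern only consumes the whitespace in front of a long word, a long word at the very start of the string leaves the following space behind. A minimal sketch of one way to handle that is to add a second substitution that trims leading whitespace. The function name strip_long_words is hypothetical, and the sketch assumes a sed that supports -E (GNU and BSD sed both do).

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): remove words of seven
# or more characters, then trim whitespace left at the start of the line.
strip_long_words() {
    sed -E 's/[[:space:]]*[^[:space:]]{7,}//g; s/^[[:space:]]+//' <<< "$1"
}

strip_long_words "giberish-giberish one two giberish-giberish three"
# -> one two three
```

For the example sentence from the question, the result is the same as with the one-expression commands above.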
Answered By - Wiktor Stribiżew Answer Checked By - Gilberto Lyons (WPSolving Admin)