Issue
I need to get the number of four-letter words using the grep command in the Linux shell. My idea was to create a list of four-letter words and then use a pipe with | wc -l.
I'm pretty new to Linux, but I have tried the following:
cat your_file | grep -c '^[ \t]*[a-zA-Z]\{5\}[ \t]*$'
and
grep -o -w "\w\{5\}" your_file
Solution
Use this Perl one-liner, which prints each four-letter word on its own line (pipe its output to wc -l to get the count):
perl -lne 'print for /\b([A-Za-z]{4})\b/g' in_file
Example:
echo 'ABCD abcd abcd1 abcd_ Abcd,Abcd.' | perl -lne 'print for /\b([A-Za-z]{4})\b/g'
Output:
ABCD
abcd
Abcd
Abcd
The Perl one-liner uses these command line flags:
-e: Tells Perl to look for code in-line, instead of in a file.
-n: Loop over the input one line at a time, assigning it to $_ by default.
-l: Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
The regex is built up as follows:
[A-Za-z]{4}: Any 4-letter sequence, that is, a letter (uppercase or lowercase) repeated exactly 4 times.
([A-Za-z]{4}): The above, with parentheses used to capture the 4-letter word.
\b([A-Za-z]{4})\b: The above, flanked by a word boundary \b on both sides, which makes it a separate word.
print for /(...)/g: Iterate over the captured matches and print all occurrences.
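Because the question asked for grep specifically, here is a sketch of a grep counterpart, assuming a grep that supports -o (GNU grep does). The -w option plays the role of the \b word boundaries, and -o prints each match on its own line so that wc -l can count them. Compared with the asker's second attempt, the character class is [A-Za-z] rather than \w (which also matches digits and underscores) and the repeat count is 4 rather than 5.

```shell
sample='ABCD abcd abcd1 abcd_ Abcd,Abcd.'

# -o prints each match on its own line; -w requires the match to be a whole word.
# tr strips the padding that some wc implementations put before the number.
count=$(printf '%s\n' "$sample" |
    grep -o -w '[A-Za-z]\{4\}' |
    wc -l | tr -d ' ')

echo "$count"
```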
The regex uses this modifier:
/g: Match globally, that is, find all matches on the line instead of stopping at the first one.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes), in particular: Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
Answered By - Timur Shtatland