Issue
I'm trying to parse a couple of 2 GB+ files and want to grep on a couple of levels.
Say I want to fetch only the lines that contain both "foo" and "bar".
I could do
grep foo file.log | grep bar
but my concern is that running grep twice will be expensive.
Would it be beneficial to use something like
grep -E '(foo.*bar|bar.*foo)'
instead?
Solution
grep -E '(foo|bar)'
will find lines containing 'foo' OR 'bar'.
You want lines containing BOTH 'foo' AND 'bar'. Either of these commands will do:
sed '/foo/!d;/bar/!d' file.log
awk '/foo/ && /bar/' file.log
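To make the OR-versus-AND difference concrete, here is a minimal sketch run against a throwaway sample file. The sample contents and the variable names are made up for illustration; nothing here comes from the original log.

```shell
# Build a small sample log (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' 'foo only' 'bar only' 'foo then bar' 'bar then foo' 'neither' > "$sample"

# OR: matches any line containing foo or bar.
or_out=$(grep -E '(foo|bar)' "$sample")

# AND: keeps only lines containing both words.
sed_out=$(sed '/foo/!d;/bar/!d' "$sample")
awk_out=$(awk '/foo/ && /bar/' "$sample")

# The original two-stage pipeline, for comparison.
pipe_out=$(grep foo "$sample" | grep bar)

echo "OR matches:";   printf '%s\n' "$or_out"
echo "sed AND:";      printf '%s\n' "$sed_out"
echo "awk AND:";      printf '%s\n' "$awk_out"
echo "pipeline AND:"; printf '%s\n' "$pipe_out"

rm -f "$sample"
```

The sed and awk one-liners should each print just the two lines that contain both words, while the OR pattern also prints the 'foo only' and 'bar only' lines.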
Both commands -- in theory -- should be much more efficient than your grep | grep
construct because:
- Both sed and awk perform their own file reading; no need for pipe overhead
- The 'programs' I gave to sed and awk above use Boolean short-circuiting to quickly skip lines not containing 'foo', so only lines containing 'foo' are tested against the /bar/ regex
However, I haven't tested them. YMMV :)
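If you want to check this on your own data, a rough timing harness along these lines may help. It is only a sketch: the LOG variable, the count_* function names, and the synthetic fallback file are assumptions of mine, and whole-second timing from date is crude (though probably adequate for multi-gigabyte inputs). Results will depend on your grep, sed and awk implementations, so the pipeline is not guaranteed to lose.

```shell
# Assumption: override with LOG=/path/to/your.log; defaults to file.log.
log=${LOG:-file.log}

if [ ! -f "$log" ]; then
    # No real log at hand: generate a synthetic one so the script still runs.
    tmp=$(mktemp)
    trap 'rm -f "$tmp"' EXIT
    awk 'BEGIN { for (i = 0; i < 200000; i++)
                     printf "%s%s%d\n", (i % 3 ? "foo " : ""), (i % 5 ? "bar " : ""), i }' > "$tmp"
    log=$tmp
fi

# Each variant prints the number of lines containing both foo and bar.
count_pipe()  { grep foo "$log" | grep -c bar; }
count_regex() { grep -cE '(foo.*bar|bar.*foo)' "$log"; }
count_sed()   { sed '/foo/!d;/bar/!d' "$log" | wc -l | tr -d ' '; }
count_awk()   { awk '/foo/ && /bar/ { n++ } END { print n + 0 }' "$log"; }

for variant in count_pipe count_regex count_sed count_awk; do
    start=$(date +%s)
    n=$("$variant")
    end=$(date +%s)
    echo "$variant: $n matching lines in $((end - start))s"
done
```

All four variants should report the same count; if they do not, compare the outputs before trusting any of the timings. Run the loop more than once, since the first pass also pays for reading the file into the page cache.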
Answered By - pepoluan