Issue
I'm trying to find an easier way to search through my data. I currently use a grep command for this, but it isn't perfect and I'd like to know whether it can be improved.
Say some files in the directory I'm grepping contain the following lines, made up of random alphanumeric strings that may or may not include spaces:
2001 abc20abcdef
abcd2012 a20abcdef abcdefg
2006 21abcdef
abc2021 abcde abc18abcd
ab2015ababcd20ababcd
Let's also assume that the numbers in these strings only ever come as two digits, except when a year is included in the string. So a string can be 100 characters long, for example, but it will contain only two digit characters, unless it also contains a year, in which case it will contain six. The year will never be directly next to the target number, so a string will never contain abc201820abc, for example.
For the sake of this example, I then want to return the lines that have 20 in them unless they look like a year. If there is both a year and a 20 in the same line, then I do want to return that line, but not if there is only a year without a 20. So for example, I'd like to return:
2001 abc20abcdef
abcd2012 a20abcdef abcdefg
ab2015ababcd20ababcd
but NOT return:
2006 21abcdef
abc2021 abcde abc18abcd
My current grep is very basic and will simply return all lines that have a 20 in them, which is technically what I want but gives me useless lines as well as useful ones. How can I narrow this down?
Current grep:
grep -rn 20 .
This would return all 5 lines, which is 3 lines that I wanted and 2 that I didn't want.
I have some pseudocode logic below that would give me what I want, but I don't know how to turn it into a grep command or script:
for each line in files {
    if (line contains the number 20 three times) // for example abc2020abcde20abc
        add line to results;
    if (line contains the number 20 twice and both 20s are not immediately next to each other) // This will avoid a false hit of the year 2020
        add line to results;
    else if (line contains the number 20 once) {
        if (an alphabetic character or whitespace follows the 20)
            add line to results;
        else
            do not add line to results;
    }
}
Any thoughts? All help/opinions would be appreciated!
EDIT: I thought of an even better pseudocode, but still don't know how to turn it into a grep:
for each line in files {
    if (line contains an instance where the number 20 has only alphabetic characters or whitespace on either side of it)
        add line to results;
    else
        do not add line to results;
}
Solution
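The condition from your edited pseudocode,
line contains an instance where the number 20 has only alphabetic characters or whitespace on either side of it
translates into
grep -Ei '[a-z[:blank:]]20[a-z[:blank:]]'
Here [[:blank:]] matches a space or a tab. Writing \t inside a bracket expression would not work, because there it matches a literal backslash or the letter t rather than a tab.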
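To see where this first pattern falls short, here is a small sketch run against a few made-up lines (the sample text is hypothetical and not taken from the question). It uses [[:blank:]] for "space or tab", which is an assumption about what the whitespace part of the pattern is meant to cover.

```shell
# Hypothetical sample: 20 at line start, at line end, next to punctuation,
# embedded between letters, and as part of the year-like run 2020.
sample='20 abc
abc 20
abc-20-abc
abc20abc
abc2020abc'

# The strict pattern requires a letter or blank on BOTH sides of the 20.
strict=$(printf '%s\n' "$sample" | grep -Ei '[a-z[:blank:]]20[a-z[:blank:]]')
printf '%s\n' "$strict"
# prints: abc20abc
```

Only the embedded case matches. The 20 at the start or end of a line has no neighbouring character on one side, and the hyphen is neither a letter nor a blank, so those lines are skipped.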
But you may want to use the following instead, which also prints lines that contain 20
at the start or end of the line or next to a punctuation symbol.
grep -E '(^|[^0-9])20([^0-9]|$)'
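Combined with the -rn flags from the question, this second pattern can be checked against the five sample lines. This is a minimal sketch, assuming a grep that supports -r (GNU and BSD grep both do); the scratch directory and the file name sample.txt are made up for the demo.

```shell
# Hypothetical scratch directory holding the question's five sample lines.
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.txt" <<'EOF'
2001 abc20abcdef
abcd2012 a20abcdef abcdefg
2006 21abcdef
abc2021 abcde abc18abcd
ab2015ababcd20ababcd
EOF

# Match a 20 with no digit directly before or after it, searching
# recursively (-r) and printing line numbers (-n) as in the question.
matches=$(cd "$demo_dir" && grep -rnE '(^|[^0-9])20([^0-9]|$)' .)
printf '%s\n' "$matches"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Lines 1, 2 and 5 are printed. The 20 inside 2006 and 2021 is always followed by another digit, so lines 3 and 4 are not.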
Answered By - Socowi
Answer Checked By - Timothy Miller (WPSolving Admin)