Issue
I need to count the number of occurrences of a string inside of a log file using bash and execute a command once the string repeats itself more than 5 times.
I have the following sample data from the log file:
[10:35:56] world_log_event: kick (starrr)(NormieBL)@Arca from srv 192.168.1.6(21)
[10:39:17] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:23] world_log_event: kick (chrisxJ02)(Delaon)@Arca from srv 192.168.1.39(7)
[10:39:17] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39
Some examples of how the script should behave:
if the count of string "is already connected on srv 21" is >= 5 then "exec command telnet 192.168.1.6"
if the count of string "is already connected on srv 7" is >= 5 then "exec command telnet 192.168.1.39"
Solution
Assumptions and collection of OP's comments:
- user provides a string to search on (eg, Lorem ipsum dolor sit amet)
- of interest is the field that follows this string (eg, 21), aka ##; sample data shows this will always be a single field
- keep count of the number of times each of these ## shows up in the input
- user provides a threshold on the number of matches to report on (eg, OP has mentioned 5 in the question), aka threshold
- the only occurrences of the string IPaddr_ are in the lines of interest ...
- keep track of IPaddr_? fields and the associated ## (eg, IPaddr_?(##))
- there will always be at least one IPaddr_?(##) in the input
- per OP's (updated) sample data the IPaddr_?(##) entry is always the last field in a (white) space delimited input line; for sake of completeness we'll assume there could be other (white) space delimited fields after the IPaddr_?(##) we're interested in
- if ## has multiple matching IPaddr_? records [NOTE: OP has stated this scenario does not occur], the proposed awk solution (below) will report the last IPaddr_? read from the input
- at the end of processing, if a ## shows up at least threshold times then print the ## and the associated IPaddr_?; OP hasn't provided a desired output format so for now we'll assume "## IPaddr_?" is sufficient for the calling process to parse
Input parameters set by user:
search_string='Lorem ipsum dolor sit amet'
threshold=5
One awk idea:
awk -v ss="${search_string}" -v threshold="${threshold}" '
$0 ~ ss    { counter[$NF]++ }                  # counter[##]++
/ IPaddr_/ { for (i=2; i<=NF; i++)             # loop through fields ...
                 if ($(i) ~ "IPaddr_") {       # looking for string "IPaddr_"
                     split($(i),arr,"[()]")    # split "IPaddr_?(##)" on parens
                     ip[arr[2]]=arr[1]         # ip[##]=IPaddr_?
                     next                      # skip to next input line
                 }
           }
END        { for (i in counter)                # for every "##" encountered ...
                 if (counter[i] >= threshold)  # if the count is at least threshold then ...
                     print i,ip[i]             # print "## IPaddr_?"
           }
' ipsum.log
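To try the script without OP's log, the sketch below builds a small throwaway input file and runs the same awk logic against it. The line layout (the search string followed by ## as the last field, and IPaddr_?(##) as the last field of other lines) is an assumption inferred from the list of assumptions above, not OP's actual data; the file contents and the threshold of 2 are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical input: the layout is assumed, not taken from OP's log.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
[10:00:01] Lorem ipsum dolor sit amet 21
[10:00:02] Lorem ipsum dolor sit amet 21
[10:00:03] Lorem ipsum dolor sit amet 17
[10:00:04] some other text IPaddr_A(17)
[10:00:05] some other text IPaddr_B(21)
EOF

search_string='Lorem ipsum dolor sit amet'
threshold=2   # illustrative value, lower than the 5 mentioned by OP

result=$(awk -v ss="${search_string}" -v threshold="${threshold}" '
$0 ~ ss    { counter[$NF]++ }                  # count each ## (last field)
/ IPaddr_/ { for (i=2; i<=NF; i++)
                 if ($(i) ~ "IPaddr_") {
                     split($(i),arr,"[()]")    # arr[1]=IPaddr_?, arr[2]=##
                     ip[arr[2]]=arr[1]
                     next
                 }
           }
END        { for (i in counter)
                 if (counter[i] >= threshold)
                     print i,ip[i]
           }
' "$logfile")

echo "$result"
rm -f "$logfile"
```

With this made-up input only 21 reaches the threshold (17 is seen once), so the single line printed is "21 IPaddr_B". Note that "$0 ~ ss" treats the search string as a regular expression, so a search string containing characters such as "." or "(" would need escaping.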
Using OP's sample input this generates:
21 IPaddr_B
For threshold=3 this generates:
21 IPaddr_B
For threshold=2 this generates:
17 IPaddr_A
21 IPaddr_B
22 IPaddr_C
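The same idea can be adapted to the log format shown in the question. What follows is only a sketch under stated assumptions: the srv number is taken to be the last field of the "is already connected on srv" lines, and the address is taken from "kick ... from srv IP(srv)" lines, which matches the two examples OP gave (srv 21 with 192.168.1.6, srv 7 with 192.168.1.39). The repeated log lines, the lowered threshold and the run_for_server helper are hypothetical, and the helper only echoes the command instead of running telnet.

```shell
#!/usr/bin/env bash
# Sample lines modelled on the excerpt in the question; the repeated
# "already connected" lines are made up so that the threshold is reached.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
[10:35:56] world_log_event: kick (starrr)(NormieBL)@Arca from srv 192.168.1.6(21)
[10:39:17] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:18] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:19] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:23] world_log_event: kick (chrisxJ02)(Delaon)@Arca from srv 192.168.1.39(7)
[10:39:17] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39
EOF

threshold=3   # lowered from 5 to keep the sample short

matches=$(awk -v threshold="${threshold}" '
/ is already connected on srv / { count[$NF]++ }          # count[srv]++
/ from srv /                    { split($NF,arr,"[()]")   # arr[1]=IP, arr[2]=srv
                                  ip[arr[2]]=arr[1] }
END { for (s in count)
          if (count[s] >= threshold && (s in ip))         # skip srv with no known IP
              print s, ip[s]
    }
' "$logfile")

# Hypothetical helper: replace the echo with the real command once verified.
run_for_server() {
    echo "srv $1: would run telnet $2"
}

printf '%s\n' "$matches" | while read -r srv addr; do
    [ -n "$srv" ] || continue          # nothing reached the threshold
    run_for_server "$srv" "$addr"
done

rm -f "$logfile"
```

Here srv 7 is seen three times and has a known address, so the loop reports it once; srv 39 is seen only once and srv 21 has no "already connected" lines, so neither is reported.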
Answered By - markp-fuso Answer Checked By - Willingham (WPSolving Volunteer)