Issue
I have the following sample data from the log file:
[10:35:52] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:53] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:54] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:54] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:54] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:56] world_log_event: kick (starrr)(NormieBL)@Arca from srv 192.168.1.6(21)
[10:39:17] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:19] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:23] world_log_event: kick (chrisxJ02)(Delaon)@Arca from srv 192.168.1.39(7)
[10:39:17] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39
[10:39:19] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39
[10:39:23] world_log_event: kick (test01)(testDW)@Arca from srv 192.168.1.100(39)
I need to count the number of occurrences of a string using bash and execute a command once the string repeats itself more than 5 times.
- the string to search for in the log file is "is already connected on srv"
- the number that follows the string (eg, 21) can change after the process is restarted, so the complete string will be something like "is already connected on srv #", where # is the ID number of a PC and can randomly be anywhere from 1 to 100
- a given number cannot have more than one IP address assigned (eg, 21 can only have IP_A(21); it cannot have both IP_A(21) and IP_B(21))
- the IP address can change after process restart, so the log file may register a different IP address for number 21 next time it restarts (eg, 192.168.1.6(21) can become 192.168.0.72(21))
- the code (awk or grep would be the best, I guess) should find the lines containing the string + number and count them; if more than 5 occurrences are the same it should execute the command
- if multiple entries show up 5+ times the code should flag all of them; however, this shouldn't be necessary as I plan to truncate the .log file right after detection
- the point of interest is parsing the IP address after the script detects multiple occurrences of the mentioned string and then telnet into that IP address
- both the IP address and the ID number associated with it will change, so two variables will be needed; the only static string the script should look for is "is already connected on srv", followed by the ID number
Some examples of how the script should behave:
if string "is already connected on srv 21" count is =>5 times then "exec command telnet 192.168.1.6"
if string "is already connected on srv 7" count is =>5 times then "exec command telnet 192.168.1.39"
Here is an attempt using awk, but the IP address is not correctly displayed in the regular dotted format such as 127.0.0.1. Probably grep would be a better choice.
#!/bin/bash
search_string='is already connected on srv'
threshold=5
awk -v ss="${search_string}" -v threshold="${threshold}" '
$0 ~ ss    { counter[$NF]++ }                  # counter[##]++
/ IPaddr_/ { for (i=2; i<=NF; i++)             # loop through fields ...
                 if ($(i) ~ "IPaddr_") {       # looking for string "IPaddr_"
                     split($(i),arr,"[()]")    # split "IPaddr_?(##)" on parens
                     ip[arr[2]]=arr[1]         # ip[##]=IPaddr_?
                     next                      # skip to next input line
                 }
           }
END        { for (i in counter)                # for every "##" encountered ...
                 if (counter[i] >= threshold)  # if the count reaches the threshold then ...
                     print i,ip[i]             # print "## IPaddr_?"
           }
' world.log
Solution
Assumptions:
- we only have to deal with IPv4 formats
- the only strings like <space><IPv4-address>( in the file are the ones we're interested in
One idea is to modify the current awk code to look for <space><IPv4-address>( :
awk -v ss="${search_string}" -v threshold="${threshold}" '
$0 ~ ss { counter[$NF]++ }                       # count matching lines per srv number
/ [0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[(]/ {        # [.] is a literal dot; a bare . matches any character
    for (i=2; i<=NF; i++)
        if ( $(i) ~ "[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[(]" ) {
            split($(i),arr,"[()]")               # "IP(##)" -> arr[1]=IP, arr[2]=##
            ip[arr[2]]=arr[1]
            next
        }
}
END { for (i in counter)
          if (counter[i] >= threshold)
              print i,ip[i]
}
' world.log
Run against the sample input with threshold=5, this generates:
21 192.168.1.6
With threshold=2 this generates:
7 192.168.1.39
21 192.168.1.6
39 192.168.1.100
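One caveat worth noting as an aside: awk's for (i in counter) loop visits keys in an unspecified order, so the rows may not come out in the order shown. If a stable order matters, one option is to pipe the output through sort. A minimal sketch, where the printf only stands in for the awk output and the variable name sorted is made up for illustration:

```shell
#!/bin/bash
# printf stands in for the awk output, deliberately out of order
sorted=$(printf '%s\n' '21 192.168.1.6' '39 192.168.1.100' '7 192.168.1.39' |
         sort -k1,1n)        # numeric sort on the srv number (first field)

echo "$sorted"
```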
At this point OP can manipulate the awk output as appropriate (eg, load into array(s), load each row into while loop variables, etc).
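As a rough illustration of the "load into array(s)" option, the sketch below reads "srv ip" rows into a bash associative array keyed by srv number. It assumes bash 4 or later; the array name srv_ip is made up for this example, and the here-document simply stands in for the awk output.

```shell
#!/bin/bash
declare -A srv_ip                  # associative array: srv number -> IP (bash 4+)

# The here-document stands in for the awk output ("srv ip" per row).
while read -r srv ip
do
    srv_ip["$srv"]="$ip"
done <<'EOF'
7 192.168.1.39
21 192.168.1.6
39 192.168.1.100
EOF

echo "srv 21 -> ${srv_ip[21]}"     # look up a single srv number
```

With the rows above, the lookup prints "srv 21 -> 192.168.1.6". Looping over "${!srv_ip[@]}" walks all keys, in no particular order.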
For while loop processing it will be a bit cleaner if we move the awk code into a function, eg:
parse_srv_ip ()
{
    awk -v ss="${search_string}" -v threshold="${threshold}" '
    ... snip ...
    ' "$1"        # name of log file passed as only arg to function
}
We can then process the awk output via a while loop like so:
logfile='world.log'
while read -r srv ip
do
echo "srv = $srv : ip = $ip"
done < <(parse_srv_ip "${logfile}")
With threshold=2 this generates:
srv = 7 : ip = 192.168.1.39
srv = 21 : ip = 192.168.1.6
srv = 39 : ip = 192.168.1.100
NOTE: OP would obviously replace the echo "srv ..." with the desired code (eg, "exec command telnet...", etc).
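Putting the pieces together, a self-contained sketch might look like the following. Several things here are assumptions for illustration only: the handle_flagged function is hypothetical and just echoes the command it would run, the sample log is written to a temporary file via mktemp, and the dots in the IPv4 pattern are written as [.] so they match literal dots.

```shell
#!/bin/bash
search_string='is already connected on srv'
threshold=5

# Print "<srv> <ip>" for every srv number seen at least $threshold times.
parse_srv_ip ()
{
    awk -v ss="${search_string}" -v threshold="${threshold}" '
    $0 ~ ss { counter[$NF]++ }
    / [0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[(]/ {
        for (i=2; i<=NF; i++)
            if ( $(i) ~ "[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[(]" ) {
                split($(i),arr,"[()]")
                ip[arr[2]]=arr[1]
                next
            }
    }
    END { for (i in counter)
              if (counter[i] >= threshold)
                  print i,ip[i]
    }
    ' "$1"
}

# Hypothetical action: only echoes; replace with the real command when ready.
handle_flagged ()
{
    echo "would run: telnet $2 (srv $1)"
}

# Sample log in a temp file (assumption: stands in for world.log).
logfile=$(mktemp)
trap 'rm -f "$logfile"' EXIT
cat > "$logfile" <<'EOF'
[10:35:52] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:53] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:54] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:54] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:54] world_log_data: user (starrr)(NormieBL)@Arca is already connected on srv 21
[10:35:56] world_log_event: kick (starrr)(NormieBL)@Arca from srv 192.168.1.6(21)
[10:39:17] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:19] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:23] world_log_event: kick (chrisxJ02)(Delaon)@Arca from srv 192.168.1.39(7)
EOF

while read -r srv ip
do
    [ -n "$ip" ] || continue       # skip srv numbers with no IP seen yet
    handle_flagged "$srv" "$ip"
done < <(parse_srv_ip "${logfile}")
```

With threshold=5 and the sample lines above, this prints "would run: telnet 192.168.1.6 (srv 21)". The empty-IP guard is there because a srv number can reach the threshold before any kick line carrying its IP address has been logged.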
Answered By - markp-fuso