Wednesday, February 2, 2022

[SOLVED] Count occurrences of strings using bash

Issue

I need to count the number of occurrences of a string in a log file using bash, and execute a command once the string has appeared 5 or more times.

I have the following sample data from the log file:

[10:35:56] world_log_event: kick (starrr)(NormieBL)@Arca from srv 192.168.1.6(21)  
[10:39:17] world_log_data: user (chrisxJ02)(Delaon)@Arca is already connected on srv 7
[10:39:23] world_log_event: kick (chrisxJ02)(Delaon)@Arca from srv 192.168.1.39(7)
[10:39:17] world_log_data: user (test01)(testDW)@Arca is already connected on srv 39

Some examples of how the script should behave:

if the string "is already connected on srv 21" occurs >= 5 times, then execute the command "telnet 192.168.1.6"
if the string "is already connected on srv 7" occurs >= 5 times, then execute the command "telnet 192.168.1.39"

Solution

Assumptions, including points collected from OP's comments:

  • the user provides a string to search for (e.g., Lorem ipsum dolor sit amet)
  • of interest is the field that follows this string (e.g., 21), aka ##; the sample data shows this will always be a single field
  • keep count of the number of times each ## shows up in the input
  • the user provides a threshold on the number of matches to report on (e.g., OP has mentioned 5 in the question), aka threshold
  • the only occurrences of the string IPaddr_ are in the lines of interest ...
  • keep track of the IPaddr_? fields and the associated ## (e.g., IPaddr_?(##))
  • there will always be at least one IPaddr_?(##) in the input
  • per OP's (updated) sample data, the IPaddr_?(##) entry is always the last field in a whitespace-delimited input line; for the sake of completeness we'll assume there could be other whitespace-delimited fields after the IPaddr_?(##) we're interested in
  • if a ## has multiple matching IPaddr_? records [NOTE: OP has stated this scenario does not occur], the proposed awk solution (below) will report the last IPaddr_? read from the input
  • at the end of processing, if a ## shows up at least threshold times, print the ## and the associated IPaddr_?; OP hasn't provided a desired output format, so for now we'll assume "## IPaddr_?" is sufficient for the calling process to parse
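The counting step described in the list above can also be done with standard tools instead of an awk associative array. This is a minimal sketch that swaps in a grep | sort | uniq -c pipeline; it assumes, as above, that ## is the last field of each matching line. The function name count_trailing_fields is hypothetical, and it only reports counts, not the associated IPaddr_? values:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "## COUNT" for every trailing field (##) that
# follows the search string at least THRESHOLD times.
# Assumption: ## is the last whitespace-delimited field of each matching line.
count_trailing_fields() {
    local search_string=$1 threshold=$2
    shift 2                                          # remaining args: files (stdin if none)
    grep -F -- "$search_string" "$@" |               # literal (non-regex) match
        awk '{ print $NF }' |                        # keep only the trailing ## field
        sort | uniq -c |                             # one "COUNT ##" line per distinct ##
        awk -v t="$threshold" '$1 >= t { print $2, $1 }'   # keep counts at/over threshold
}

# Example (ipsum.log as in the answer):
#   count_trailing_fields "$search_string" "$threshold" ipsum.log
```

Unlike the awk version, this needs a second pass (or a join) to attach the IPaddr_? value, which is one reason a single awk script is convenient here.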

Input parameters set by user:

search_string='Lorem ipsum dolor sit amet'
threshold=5

One awk idea:

awk -v ss="${search_string}" -v threshold="${threshold}" '

index($0, ss) { counter[$NF]++ }                 # literal (non-regex) match; counter[##]++

/ IPaddr_/ { for (i=2; i<=NF; i++)               # loop through fields ...
                 if ($(i) ~ "IPaddr_") {         # looking for string "IPaddr_"
                    split($(i),arr,"[()]")       # split "IPaddr_?(##)" on parens
                    ip[arr[2]]=arr[1]            # ip[##]=IPaddr_?
                    next}                        # skip to next input line
           }

END        { for (i in counter)                  # for every "##" encountered ...
                 if (counter[i] >= threshold)    # if the count is at least threshold then ...
                    print i,ip[i]                # print "## IPaddr_?"
           }
' ipsum.log
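The split() call is what pulls ## out of an IPaddr_?(##) field: splitting on either paren leaves the name in arr[1] and the number in arr[2] (plus an empty element after the closing paren). A small sketch to try that step in isolation; split_ip_field is a hypothetical helper and the field values are only illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the same split(field, arr, "[()]") used in the
# answer to one field shaped like "IPaddr_?(##)" and print "IPaddr_? ##".
split_ip_field() {
    printf '%s\n' "$1" |
        awk '{ split($1, arr, "[()]"); print arr[1], arr[2] }'   # arr[3] is the empty text after ")"
}

split_ip_field 'IPaddr_B(21)'      # prints: IPaddr_B 21
```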

Using OP's (updated) sample input, this generates:

21 IPaddr_B

For threshold=3 this generates:

21 IPaddr_B

For threshold=2 this generates:

17 IPaddr_A
21 IPaddr_B
22 IPaddr_C
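The log lines in the question itself carry a srv number and a real address (e.g., kick ... from srv 192.168.1.39(7)) rather than IPaddr_ placeholders. A minimal sketch of the same count-then-map approach for that format, assuming the lines look exactly like the question's sample; the function name srv_over_threshold and the file name world.log are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "SRV IP" for every srv number whose
# "is already connected on srv N" count reaches THRESHOLD, using the address
# from a matching "kick ... from srv IP(N)" line.
# Assumption: log lines match the question's sample format.
srv_over_threshold() {
    local threshold=$1
    shift                                            # remaining args: log files (stdin if none)
    awk -v threshold="$threshold" '
        /is already connected on srv/ { count[$NF]++ }     # $NF is the srv number
        / from srv / {
            # last field looks like "192.168.1.39(7)"; split it on the parens
            if (split($NF, parts, "[()]") >= 2)
                ip[parts[2]] = parts[1]                     # ip[srv] = address
        }
        END {
            for (srv in count)
                if (count[srv] >= threshold && (srv in ip))
                    print srv, ip[srv]
        }
    ' "$@"
}

# Example use; telnet reads from the terminal so it does not swallow the
# rest of the loop's input:
#   srv_over_threshold 5 world.log | while read -r srv addr; do
#       telnet "$addr" </dev/tty
#   done
```

Because both arrays are keyed by the srv number and only compared in END, the kick line may appear before or after the "already connected" lines.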


Answered By - markp-fuso
Answer Checked By - Willingham (WPSolving Volunteer)