Issue
To start, I have done a lot of googling and tried grep, sed and awk, but was not able to find a solution with any of them.
My goal is to find 2 patterns in a log file, but only if each first pattern has a matching second pattern (with no matching first patterns between them). After that I will compare the timestamps on both to calculate the time between, but that part is not where I am stuck.
Almost every solution I found via Google left me either with all of the lines between the start and the end (which I do not need) or with the first start match and the first end match (with multiple starts in between).
Example of the kind of text I am working with:
2022-09-10 20:17:05.552 [INFO] Starting process
2022-09-10 20:17:05.554 [INFO] junk here
2022-09-10 20:24:02.664 [INFO] junk here
2022-09-10 20:24:02.666 [INFO] Starting process
2022-09-10 20:30:57.526 [INFO] Starting process
2022-09-10 20:30:57.529 [INFO] Ending process
2022-09-10 20:37:55.122 [INFO] Starting process
2022-09-10 20:37:55.126 [INFO] Ending process
2022-09-10 20:44:50.352 [INFO] junk here
I want to find the lines with "Starting process" and then "Ending process" but with no "Starting process" between them (multiple starts without an end are failed attempts and I only need the ones that completed). The example has multiple failed starts, but only 2 starts that completed: Lines 5-6 and 7-8
Expected output:
2022-09-10 20:30:57.526 [INFO] Starting process
2022-09-10 20:30:57.529 [INFO] Ending process
2022-09-10 20:37:55.122 [INFO] Starting process
2022-09-10 20:37:55.126 [INFO] Ending process
Actually, the only output I really need would be:
2022-09-10 20:30:57.526
2022-09-10 20:30:57.529
2022-09-10 20:37:55.122
2022-09-10 20:37:55.126
(because my only need for these lines is to get the start and end time to calculate average time for this task when it completes)
I am willing to use most command line methods available via bash on Ubuntu (this is for a windows machine with WSL), so sed/awk/grep and possibly even perl are fine.
Solution
Here's a solution with awk for getting the matching dates:
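awk -F ' \\[[^[]*] ' '
    # Remember the latest start; an earlier unfinished start is overwritten.
    $2 == "Starting process" { d = $1 }
    # An end with a pending start completes a pair: print both, then reset.
    $2 == "Ending process" && d != "" { print d, $1 ; d = "" }
'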
Output:
2022-09-10 20:30:57.526 2022-09-10 20:30:57.529
2022-09-10 20:37:55.122 2022-09-10 20:37:55.126
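The question asks for one timestamp per line rather than start and end side by side. A minimal sketch of that variant, using the same field separator as the command above, could look like this. The function name completed_pairs is hypothetical, and the here-document simply repeats the sample log from the question so the example runs on its own.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a log on stdin and prints the start and end
# timestamps of each completed run, one timestamp per line.
completed_pairs() {
    awk -F ' \\[[^[]*] ' '
        # Keep only the most recent start; earlier unfinished starts are dropped.
        $2 == "Starting process" { start = $1 }
        # An end with a pending start completes the pair.
        $2 == "Ending process" && start != "" {
            print start
            print $1
            start = ""
        }
    '
}

# Sample data copied from the question.
completed_pairs <<'EOF'
2022-09-10 20:17:05.552 [INFO] Starting process
2022-09-10 20:17:05.554 [INFO] junk here
2022-09-10 20:24:02.664 [INFO] junk here
2022-09-10 20:24:02.666 [INFO] Starting process
2022-09-10 20:30:57.526 [INFO] Starting process
2022-09-10 20:30:57.529 [INFO] Ending process
2022-09-10 20:37:55.122 [INFO] Starting process
2022-09-10 20:37:55.126 [INFO] Ending process
2022-09-10 20:44:50.352 [INFO] junk here
EOF
```

On the sample data this should print the four timestamps listed in the question. An "Ending process" line that has no pending start is ignored, because start is empty at that point.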
If you're using GNU awk, which provides mktime(), then you can even calculate the time difference in seconds:
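awk -F ' \\[[^[]*] ' '
    # Convert "YYYY-MM-DD HH:MM:SS.mmm" to epoch seconds, keeping the fraction.
    function date2time(d,    _d) {
        _d = d
        gsub(/[T:-]/, " ", _d)    # mktime() expects "YYYY MM DD HH MM SS"
        return mktime(_d) substr(d, index(d, "."))    # re-append ".mmm" as text
    }
    $2 == "Starting process" {
        t = date2time($1)    # a later start overwrites an unfinished one
    }
    $2 == "Ending process" && t != "" {
        printf "%.03f\n", date2time($1) - t
        t = ""    # wait for the next start
    }
'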
Output:
0.003
0.004
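Since the stated goal is the average time per completed run, the per-run durations can be averaged with a second, portable awk step. This is only a sketch: the function name average_duration is hypothetical, and it assumes its input is one duration in seconds per line, as printed by the GNU awk command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads one duration (in seconds) per line on stdin
# and prints the number of runs and their mean.
average_duration() {
    awk '
        NF { sum += $1; n++ }    # skip blank lines
        END {
            if (n) printf "runs=%d average=%.4f\n", n, sum / n
            else   print "no completed runs"
        }
    '
}

# The two durations from the sample output above.
printf '%s\n' 0.003 0.004 | average_duration
# prints: runs=2 average=0.0035
```

In practice the output of the GNU awk command would be piped straight into average_duration instead of the printf line.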
Answered By - Fravadona Answer Checked By - Marilyn (WPSolving Volunteer)