Issue
I have a log file on a Linux system (Red Hat) to which a database writes its events. The file looks like this:
2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
I want to get only the line with the latest datetime for each user (x, y, z). The output should look like this:
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
Solution
We can use awk to print only the first line it sees for each distinct value in the last column (the user).
See also: "print unique lines based on field"
To make sure those are the latest lines (by datetime), I'm assuming the following:
- The file is always sorted from old to new
Therefore, if we:
- Reverse the file (to go from new -> old)
- Get the unique user rows
- Reverse it again (to go from old -> new)
we get the last failed attempt for each user:
tac log.txt | awk -F" " '!_[$9]++' | tac
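For anyone unfamiliar with the idiom: `_` is just an awk array, keyed here by field 9 (the user). The expression `!_[$9]++` is true only the first time a given key appears, and awk prints the line whenever the expression is true. Below is a minimal sketch of that behaviour on made-up inline data; the array name `seen`, the helper name `first_per_key`, and the use of `$NF` (last field) instead of `$9` are my own choices for readability, not part of the command above.

```shell
# Keep only the first line seen for each value of the last field.
# seen[key]++ evaluates to 0 (false) the first time a key appears,
# so !seen[key]++ is true only then, and awk prints that line by default.
first_per_key() {
  awk '!seen[$NF]++'
}

# Tiny demo: lines 'b x' and 'd x' are dropped because key x was already seen.
printf '%s\n' 'a x' 'b x' 'c y' 'd x' | first_per_key
```

Because the file is reversed before it reaches awk, that "first" occurrence is each user's newest line.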
Example on my local machine:
$
$ cat log.txt
2021-08-04 09:35:00.212 +03 [100] FATAL: password fail for x
2021-08-04 09:35:20.276 +03 [101] FATAL: password fail for x
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:36:20.823 +03 [305] FATAL: password fail for y
2021-08-04 09:37:00.299 +03 [322] FATAL: password fail for y
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:38:20.822 +03 [340] FATAL: password fail for z
2021-08-04 09:38:22.500 +03 [370] FATAL: password fail for z
2021-08-04 09:38:50.210 +03 [420] FATAL: password fail for z
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$
$ tac log.txt | awk -F" " '!_[$9]++' | tac
2021-08-04 09:36:05.223 +03 [104] FATAL: password fail for x
2021-08-04 09:37:50.350 +03 [328] FATAL: password fail for y
2021-08-04 09:39:01.372 +03 [423] FATAL: password fail for z
$
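If the file might not be strictly sorted from old to new, a single-pass awk variant that compares the timestamps itself could be used instead. This is only a sketch under two assumptions: the user is always the last field, and every line uses the same UTC offset, so that the date and time in fields 1 and 2 compare correctly as plain strings. The function name `latest_per_user` is hypothetical.

```shell
# Print, for each user, the line with the greatest timestamp.
# Assumes: user is the last field; fields 1 and 2 hold date and time
# in the same UTC offset on every line.
latest_per_user() {
  awk '
    {
      user  = $NF          # last field is the user
      stamp = $1 " " $2    # date and time, compared as a string
      if (!(user in newest)) order[++n] = user   # remember first-seen order
      if (stamp >= ts[user]) { ts[user] = stamp; newest[user] = $0 }
    }
    END { for (i = 1; i <= n; i++) print newest[order[i]] }
  ' "$1"
}
```

Usage would be `latest_per_user log.txt`. Note that users are printed in the order they first appear in the file, which can differ from the `tac` pipeline's ordering when entries for different users are interleaved.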
Answered By - 0stone0 Answer Checked By - Mary Flores (WPSolving Volunteer)