Wednesday, April 27, 2022

[SOLVED] Grep sorting dates

Issue

Each day I have to manually identify each stuck call by looking for calls with a date older than the current day. I have managed to grep the required fields to identify the calls in question.

grep -e start -e instance stuckcdr.txt


CDR instance 153 [] > :
                start                    [12:04:2022][10:07:09:968]
CDR instance 200 [] > :
                start                    [12:04:2022][10:05:56:991]
CDR instance 209 [] > :
                start                    [12:04:2022][09:55:55:358]
CDR instance 216 [] > :
                start                    [12:04:2022][10:05:40:443]
CDR instance 218 [] > :
                start                    [12:04:2022][10:07:44:084]
CDR instance 221 [] > :
                start                    [12:04:2022][10:08:11:690]
CDR instance 222 [] > :
                start                    [12:04:2022][09:52:47:846]
CDR instance 223 [] > :
                start                    [07:04:2022][12:28:03:858]
CDR instance 225 [] > :
                start                    [12:04:2022][10:02:40:345]
CDR instance 226 [] > :
                start                    [12:04:2022][10:07:58:530]
CDR instance 227 [] > :
                start                    [03:04:2022][17:53:16:771]
CDR instance 231 [] > :
                start                    [12:04:2022][10:06:19:830]
CDR instance 234 [] > :
                start                    [12:04:2022][10:06:06:937]
CDR instance 237 [] > :
                start                    [04:04:2022][08:55:03:575]
CDR instance 238 [] > :
                start                    [07:04:2022][12:28:15:537]
CDR instance 242 [] > :
                start                    [12:04:2022][10:05:18:753]
CDR instance 243 [] > :
                start                    [07:04:2022][12:23:38:303]
CDR instance 244 [] > :
                start                    [12:04:2022][10:01:40:195]
CDR instance 245 [] > :
                start                    [12:04:2022][10:08:33:821]
CDR instance 246 [] > :
                start                    [12:04:2022][09:53:03:281]
CDR instance 247 [] > :
                start                    [12:04:2022][09:42:06:561]
CDR instance 248 [] > :
                start                    [12:04:2022][10:04:49:953]
CDR instance 249 [] > :
                start                    [12:04:2022][10:07:29:250]
CDR instance 250 [] > :
                start                    [12:04:2022][10:01:33:905]
CDR instance 253 [] > :
                start                    [12:04:2022][09:55:48:996]
CDR instance 254 [] > :
                start                    [07:04:2022][12:27:55:402]
CDR instance 255 [] > :
                start                    [12:04:2022][10:04:38:088]
CDR instance 256 [] > :
                start                    [12:04:2022][09:42:47:932]
CDR instance 258 [] > :
                start                    [12:04:2022][09:57:16:372]
CDR instance 259 [] > :
                start                    [12:04:2022][09:46:35:323]
CDR instance 260 [] > :
                start                    [12:04:2022][10:05:19:144]
CDR instance 262 [] > :
                start                    [12:04:2022][09:52:56:531]
CDR instance 263 [] > :
                start                    [12:04:2022][10:07:50:331]

Can I also filter the data by start date older than the current date?

It would massively speed up the process :)

Thanks


Solution

Using awk:

Edit: This only works if the date units are ordered largest to smallest (e.g. %Y:%m:%d). That is not the case with the OP's data, which is day-first. See the revised solution below.

LC_ALL=C awk -v date="$(date '+[%d:%m:%Y][00:00:00:000]')" '
$2=="instance"{instance=$0}
$1=="start" && $2<date {$1=$1; print instance,$0}' stuckcdr.txt

This prints entries from before the start of the current day.

We compare the date strings lexically. This works only when every field (month, hour, etc.) is zero-padded to the same width and the fields run from the largest unit to the smallest. LC_ALL=C makes the comparison byte-wise, so it behaves the same in every locale.
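As a small illustration of why field order matters for a string comparison, the sketch below compares the same two dates written day-first and year-first. The two dates (12 April and 3 May 2022) are made-up sample values, not taken from the log.

```shell
# Day-first: 12 April vs 3 May. The comparison looks at "12" vs "03" first,
# so 12 April is NOT reported as earlier (prints 0).
day_first=$(LC_ALL=C awk 'BEGIN { print ("[12:04:2022]" < "[03:05:2022]") }')

# Year-first: the same two dates rearranged. Now the month decides,
# and 12 April is reported as earlier (prints 1).
year_first=$(LC_ALL=C awk 'BEGIN { print ("[2022:04:12]" < "[2022:05:03]") }')

echo "day-first  comparison result: $day_first"
echo "year-first comparison result: $year_first"
```

The parentheses around the comparison keep awk from reading the operator as a redirection inside print.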

To print entries older than 'now' (when the script is run, to the nearest second, as opposed to older than 'today'), use this date syntax: date="$(date '+[%d:%m:%Y][%H:%M:%S:000]')".

$1=$1 just trims the whitespace. I also print instance number and date on the same line, for readability.
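To see the effect of $1=$1 in isolation, here is a minimal sketch run on one line copied from the log. Assigning to any field makes awk rebuild the record with single spaces between fields, which drops the leading indentation and the padding before the timestamp.

```shell
line='                start                    [12:04:2022][10:07:09:968]'

# Assigning a field to itself forces awk to rebuild $0 using the output
# field separator (a single space by default).
trimmed=$(printf '%s\n' "$line" | awk '{ $1 = $1; print }')

printf '%s\n' "$trimmed"
```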

Example output:

CDR instance 223 [] > : start [07:04:2022][12:28:03:858]
CDR instance 227 [] > : start [03:04:2022][17:53:16:771]
CDR instance 237 [] > : start [04:04:2022][08:55:03:575]
CDR instance 238 [] > : start [07:04:2022][12:28:15:537]
CDR instance 243 [] > : start [07:04:2022][12:23:38:303]
CDR instance 254 [] > : start [07:04:2022][12:27:55:402]

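You can also sort the output by piping it to sort -k 8,8 (sort by the date field; because the format is day-first, this is chronological only within a single month) or sort -n -k 3,3 (sort by instance number).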
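If the output needs to be in true chronological order across months, one possible approach is to give sort character offsets inside the date field, so that year, month and day are compared in that order. This is only a sketch: it assumes the single-space layout of the example output, and the variable names sample and sorted are made up for the illustration.

```shell
# Sample lines copied from the example output above.
sample='CDR instance 223 [] > : start [07:04:2022][12:28:03:858]
CDR instance 227 [] > : start [03:04:2022][17:53:16:771]
CDR instance 237 [] > : start [04:04:2022][08:55:03:575]
CDR instance 238 [] > : start [07:04:2022][12:28:15:537]
CDR instance 243 [] > : start [07:04:2022][12:23:38:303]
CDR instance 254 [] > : start [07:04:2022][12:27:55:402]'

# Field 8 is the stamp "[DD:MM:YYYY][hh:mm:ss:mmm]".
# Keys, in order: year (chars 8-11), month (5-6), day (2-3), then the time
# (char 14 to the end of the line).
sorted=$(printf '%s\n' "$sample" |
    LC_ALL=C sort -t ' ' -k 8.8,8.11 -k 8.5,8.6 -k 8.2,8.3 -k 8.14)

printf '%s\n' "$sorted"
```

Setting the separator with -t ' ' keeps leading blanks out of the field, so the character offsets count from the opening bracket.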

Edit:

As tshiono noted in a comment, the date is in the wrong format to be compared directly. This revised solution splits and rearranges the date for a correct comparison.

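LC_ALL=C awk -v date="$(date '+[%Y:%m:%d][00:00:00:000]')" '
# Remember the most recent "CDR instance" line.
$2=="instance"{instance=$0}
# Split the stamp on runs of "[", "]" and ":"; a[1] is empty, then
# a[2..8] = day, month, year, hour, minute, second, millisecond.
$1=="start" && split($2, a, /[]:[]+/) &&
date>"["a[4]":"a[3]":"a[2]"]["a[5]":"a[6]":"a[7]":"a[8]"]" {
    $1=$1    # collapse the padding whitespace
    print instance,$0
}
' stuckcdr.txt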
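To check what the split-and-rearrange step produces, the sketch below runs the same split() call on a single timestamp copied from the log and prints the year-first string. The variable names stamp and rearranged are made up for the illustration.

```shell
stamp='[07:04:2022][12:28:03:858]'

rearranged=$(printf '%s\n' "$stamp" | LC_ALL=C awk '{
    # Splitting on runs of "[", "]" and ":" leaves a[1] empty (the text
    # before the leading "["), followed by day, month, year, hour, minute,
    # second and millisecond in a[2] to a[8].
    split($1, a, /[]:[]+/)
    print "[" a[4] ":" a[3] ":" a[2] "][" a[5] ":" a[6] ":" a[7] ":" a[8] "]"
}')

printf '%s\n' "$rearranged"
```

The rebuilt string has the same layout as the value passed in with -v date, which is what makes the plain string comparison meaningful.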


Answered By - dan
Answer Checked By - Dawn Plyler (WPSolving Volunteer)