Saturday, January 27, 2024

[SOLVED] Checking the date and time and filling the missing data

Issue

I have a data file in 10 minute interval with some missing data.

cat << EOF > ifile.txt
2001/1/1 0:00 3
2001/1/1 0:10 32
2001/1/1 0:40 17
2001/1/1 0:50 4.402
2001/1/1 1:00 3.95
2001/1/1 1:50 1.42
2001/1/1 2:00 0.34
2001/1/1 2:10 0
2001/1/1 2:20 9
2001/1/1 2:30 57
EOF

If you check the above file, there are some missing data, e.g. at 0:20, 0:30 etc.

I would like to check the data with date and time from 2001/1/1 0:00 until 2001/1/1 3:00 and fill the missing data as "nan".

My script is as below.

awk 'BEGIN {
    FS=OFS=" ";
    start_time = mktime("200101010000");
      end_time = mktime("200101010300");
current_minutes = 0;
}
{
    entry_time = mktime(gensub(/[:\/]/, " ", "g", $1" "$2));
    while (current_minutes < (entry_time - start_time) / 60) {
        printf "%s %02d:%02d nan\n", strftime("%Y/%m/%d", start_time), int(current_minutes / 60), current_minutes % 60;
        current_minutes += 10;
    }
    print $0;
    current_minutes += 10;
}
END {
    while (current_minutes <= (end_time - start_time) / 60) {
        printf "%s %02d:%02d nan\n", strftime("%Y/%m/%d", start_time), int(current_minutes / 60), current_minutes % 60;
        current_minutes += 10;
    }
}' ifile.txt

But it is not printing the missing data. It is just printing my input file.

Desired output:

# ofile.txt content
2001/1/1 0:00 3
2001/1/1 0:10 32
2001/1/1 0:20 nan
2001/1/1 0:30 nan
2001/1/1 0:40 17
2001/1/1 0:50 4.402
2001/1/1 1:00 3.95
2001/1/1 1:10 nan
2001/1/1 1:20 nan
2001/1/1 1:30 nan
2001/1/1 1:40 nan
2001/1/1 1:50 1.42
2001/1/1 2:00 0.34
2001/1/1 2:10 0
2001/1/1 2:20 9
2001/1/1 2:30 57
2001/1/1 2:40 nan
2001/1/1 2:50 nan
2001/1/1 3:00 nan

Solution

You can use the below GNU awk(I am using gawk4.2.1 and gawk5.3.0) to achieve your result:

BEGIN {
    start_time = mktime("2001 01 01 0 0 0")
    end_time = mktime("2001 01 01 3 0 0")
    # interval 600s
    interval = 10 * 60
    FS = OFS = " "
}
{
   # combines the first two fields (date and time) of the input line.
    current_time = $1 " " $2
   # timestamp for the current_time in seconds since the epoch.
    current_timestamp = mktime(gensub("/", " ", "g", $1) " " gensub(":", " ", "g", $2) " 0")
   # prints the missing timestamps with "nan" until the start_time >= current_timestamp.
    while (start_time < current_timestamp) {
        # formatting matters here. No zeros would be appended at the start suing this.
        print strftime("%Y/%-m/%-d %-H:%M", start_time), "nan"
        start_time += interval
    }
    print $0
    start_time += interval
}
# This block is executed after processing the input file. 
# It prints any remaining missing timestamps until the end_time is reached.
END {
    while (start_time <= end_time) {
        print strftime("%Y/%-m/%-d %-H:%M", start_time), "nan"
        start_time += interval
    }
}

demo here



Answered By - mandy8055
Answer Checked By - David Goodson (WPSolving Volunteer)