Issue
I have a data file in 10 minute interval with some missing data.
cat << EOF > ifile.txt
2001/1/1 0:00 3
2001/1/1 0:10 32
2001/1/1 0:40 17
2001/1/1 0:50 4.402
2001/1/1 1:00 3.95
2001/1/1 1:50 1.42
2001/1/1 2:00 0.34
2001/1/1 2:10 0
2001/1/1 2:20 9
2001/1/1 2:30 57
EOF
If you check the above file, there are some missing data, e.g. at 0:20, 0:30 etc.
I would like to check the data with date and time from 2001/1/1 0:00 until 2001/1/1 3:00 and fill the missing data as "nan".
My script is as below.
awk 'BEGIN {
FS=OFS=" ";
start_time = mktime("200101010000");
end_time = mktime("200101010300");
current_minutes = 0;
}
{
entry_time = mktime(gensub(/[:\/]/, " ", "g", $1" "$2));
while (current_minutes < (entry_time - start_time) / 60) {
printf "%s %02d:%02d nan\n", strftime("%Y/%m/%d", start_time), int(current_minutes / 60), current_minutes % 60;
current_minutes += 10;
}
print $0;
current_minutes += 10;
}
END {
while (current_minutes <= (end_time - start_time) / 60) {
printf "%s %02d:%02d nan\n", strftime("%Y/%m/%d", start_time), int(current_minutes / 60), current_minutes % 60;
current_minutes += 10;
}
}' ifile.txt
But it is not printing the missing data. It is just printing my input file.
Desired output:
# ofile.txt content
2001/1/1 0:00 3
2001/1/1 0:10 32
2001/1/1 0:20 nan
2001/1/1 0:30 nan
2001/1/1 0:40 17
2001/1/1 0:50 4.402
2001/1/1 1:00 3.95
2001/1/1 1:10 nan
2001/1/1 1:20 nan
2001/1/1 1:30 nan
2001/1/1 1:40 nan
2001/1/1 1:50 1.42
2001/1/1 2:00 0.34
2001/1/1 2:10 0
2001/1/1 2:20 9
2001/1/1 2:30 57
2001/1/1 2:40 nan
2001/1/1 2:50 nan
2001/1/1 3:00 nan
Solution
You can use the below GNU awk
(I am using gawk4.2.1
and gawk5.3.0
) to achieve your result:
BEGIN {
start_time = mktime("2001 01 01 0 0 0")
end_time = mktime("2001 01 01 3 0 0")
# interval 600s
interval = 10 * 60
FS = OFS = " "
}
{
# combines the first two fields (date and time) of the input line.
current_time = $1 " " $2
# timestamp for the current_time in seconds since the epoch.
current_timestamp = mktime(gensub("/", " ", "g", $1) " " gensub(":", " ", "g", $2) " 0")
# prints the missing timestamps with "nan" until the start_time >= current_timestamp.
while (start_time < current_timestamp) {
# formatting matters here. No zeros would be appended at the start suing this.
print strftime("%Y/%-m/%-d %-H:%M", start_time), "nan"
start_time += interval
}
print $0
start_time += interval
}
# This block is executed after processing the input file.
# It prints any remaining missing timestamps until the end_time is reached.
END {
while (start_time <= end_time) {
print strftime("%Y/%-m/%-d %-H:%M", start_time), "nan"
start_time += interval
}
}
Answered By - mandy8055 Answer Checked By - David Goodson (WPSolving Volunteer)