Issue
The input file which is shown below is generated by performing results of 2 other files
i.e awk 'BEGIN{FS=OFS=","} FNR==NR{arr[$0];next} {print $1,$2,$3,$5,($4 in arr)?1:0}' $NGW_REG_RESP_FILE $NGW_REG_REQ_FILE >> $NGW_REG_FILE
$NGW_REG_FILE file contains below data based on that i have to create a new file
2020-12-21,18,1,1,1
2020-12-21,18,1,1,0
2020-12-21,18,1,2,1
2020-12-21,18,1,2,1
2020-12-21,18,2,1,1
2020-12-21,18,2,1,1
2020-12-21,18,2,1,0
2020-12-21,18,3,2,1
2020-12-21,18,3,2,1
2020-12-21,18,4,2,0
2020-12-21,18,4,2,1
2020-12-21,18,3,2,0
What this data indicates is:
Date,Hour,Quarter,ReqType,Success/failed
Reqtype there were 2 possibilities: 1-> incoming 2-> outgoing
last field: 1->success 0-> failed
Quarter -> 1,2,3,4
I want to read this file and generate a new file that contains data like below (MY OUTPUT FILE):
2020-12-21,18,1,1,1,1
2020-12-21,18,1,2,2,0
2020-12-21,18,2,1,2,1
.....
Explanation:
heading: date,hour,quarter,reqType,Success_count,Failure_count (for reference to understand o/p file)
Date H Q ReqID SuccessCnt Fail Count
2020-12-21,18,1,1 ,1 ,1
Explanation: in input file for quarter 1 both reqTypes(1&2) were present
there will be at max 2 entry in each quarter.
in quarter 1 for reqid 1 there were 2 requests, 1 got success and other got failed
so 1 as success cnt and 1 as failure cnt
2020-12-21,18,1,2,2,0
here quarter 1 ,for req ID 2 there were 2 requests both got success so success count be 2 and failure count be 0
**UPDATE
The answer which is given in the comment is worked exactly what I was looking for.
I have some updates in the sample input file, i.e one more columns gets added before the last column i.e the STATUS CODE
which you can see in the below input i,e 200,400,300
2020-12-21,18,1,1,200,1
2020-12-21,18,2,1,400,0
2020-12-21,18,2,1,300,0
The existing code gives the below result in the output file: i.e Total count of success/failed in that quarter. Which is Correct.
What I want to do is add one more column to the output file, next to the total failed count i.e the array holding those status codes.
2020-12-21,18,1,1,1,0,[] //empty array in end bcs there is no failed req,1,success req
2020-12-21,18,2,1,0,2,[400,300] // here 2 failed req,0 success request
<DATE>,<HOUR>,<QUARTER>,<REQ_TYPE>,<SUCCESS_COUNT>,<FAIL_CNT>,<ARRAY_HOLDING_STATUSCODE>
I have added below changes to the code , Bu not getting how to iterate in side the same for loop
`cat $input_file | grep -v Orig | awk -F, '{
if ($NF==1) {
map[$1][$2][$3][$4]["success"]++
}
else {
map[$1][$2][$3][$4]["fail"]++
harish[$1][$2][$3][$4][$5]++ //ADDED THIS
}
}
END {
PROCINFO["sorted_in"]="@ind_num_asc";
for (i in map) {
for (j in map[i]) {
for (k in map[i][j]) {
for (l in map[i][j][k]) {
print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"])
}
}
}
}
}' >> OUTPUT_FILE.txt`
Solution
With awk (GNU awk for array sorting):
awk -F, '{ if ($NF==1) { map[$1][$2][$3][$4]["success"]++ } else { map[$1][$2][$3][$4]["fail"]++ } } END { PROCINFO["sorted_in"]="@ind_num_asc";for (i in map) { for (j in map[i]) { for (k in map[i][j]) { for (l in map[i][j][k]) { print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"]) } } } } }' $NGW_REG_FILE
Explanation:
awk -F, '{
if ($NF==1) {
map[$1][$2][$3][$4]["success"]++ # If last field is 1, increment a success index in array map with other fields as further indexes
}
else {
map[$1][$2][$3][$4]["fail"]++ # Otherwise increment a fail index
}
}
END {
PROCINFO["sorted_in"]="@ind_num_asc"; # Set the array ordering
for (i in map) {
for (j in map[i]) {
for (k in map[i][j]) {
for (l in map[i][j][k]) {
print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"]) # Loop through the array and print the data in the format required. If there is no entry in the success or fail index, print 0.
}
}
}
}
}' $NGW_REG_FILE
Answered By - Raman Sailopal