Issue
I'd like to configure a cloudwatch alarm that can turn adjust the desired capacity of an autoscaling group down when one of the instances in the autoscaling group is detected to be "idle". Idleness could be defined, for example, by having CPU Utilization below 1% for a sustained period of time.
I'm using terraform, so right now I have the alarm congfigured as follows.
resource "aws_cloudwatch_metric_alarm" "backend-x86_64" {
alarm_name = "backend-x86_64-is-idle"
comparison_operator = "LessThanOrEqualToThreshold"
evaluation_periods = "4"
period = "120"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
statistic = "Average"
threshold = "1"
treat_missing_data = "notBreaching"
dimensions = {
AutoScalingGroupName = aws_autoscaling_group.backend-x86_64.name
}
alarm_description = "This metric monitors ec2 cpu utilization"
alarm_actions = [aws_autoscaling_policy.backend-x86_64-shutdown.arn]
}
However, it's not working. Below is a picture of the graph of CPU Utilization over time -- it's at 0% for 2 hours, and yet it doesn't fire. I suspect this is because I'm misunderstanding how "evaluation_periods" and "period" interact.
Can anyone explain how to configure terraform /cloudwatch to do what I want here?
Solution
The gaps indicate that your instance only has basic monitoring where data is available in 5 minute periods.
This, with your current monitoring policy, which checks for CPU utilization less than 1% for 4 data points in 8 minutes, meaning one data point every 2 minutes plus with missing data treated as good means that your alert will never get triggered.
Enabling Detailed Monitoring gets you data in 1 min intervals and you should be able to get this alarm right.
Answered By - Sathyajith Bhat