Issue
I tried autoscaling groups and alternatively just a bunch of EC2 instances tied by load balancer. Both configs are working fine at first glance.
But, when the EC2 is a part of autoscaling group it goes down sometimes. Actually it happens very often, almost once a day. And they go down in a "hard reset" way. The ec2 monitoring graphs show that CPU usage goes up to 100%, then the instance become not responsive and then it is terminated by autoscaling group.
And it has nothing to do with my processes on these instances.
When the instance is not a part of Autoscaling groups, it can work without the CPU usage spikes for years.
The "hard reset" on autoscaling group instances are braking my cron jobs. As much as I like the autoscaling groups I cannot use it.
It there a standard way to deal with the "hard resets"?
PS.
The cron jobs are running PHP scripts on Ubuntu in my case. I managed to make only one instance running the job.
Solution
It sounds like you have a health check that is failing when your cron is running, as as a result the instance is being taken out of service.
If you look at the ASG, there should be a reason listed for why the instance was taken out. This will usually be a health check failure, but there could be other reasons as well.
There are a couple things you can do to fix this.
First, determine why your cron is taking 100% of CPU, and how long it generally takes.
Review your health check settings. Are you using HTTP or TCP? What is the interval, and how many checks have to fail before it is taken out of service?
Between those two items, you should be able to adjust the health checks so that it doesn't take it out of service during the cron running time. It is possible that the instance is failing, typically this would be because it runs out of memory. If that is the case, you may want to consider going to a large instance type and/or enabling swap.
Answered By - chris Answer Checked By - David Marino (WPSolving Volunteer)