Issue
As the title says, I'm using capacity providers to scale my instances when a service updates its desired count. I'm new to autoscaling, so I'm probably missing something. The problem is that once I update the desired count, the autoscaling alarm triggers correctly and the instances scale out fine, but meanwhile the service creates new tasks that never change from "PROVISIONING" to "RUNNING".
First I'll show how I configured the launch template, the autoscaling group, and the capacity provider (with some variables replaced by their values for easier reading):
locals {
  ami_id = jsondecode(data.aws_ssm_parameter.ecs_optimized_ami.value)["image_id"]

  # Launch template user_data must be base64-encoded. It runs as root, and
  # ECS-optimized AMIs are Amazon Linux based, so the package manager is
  # yum, not apt-get. The echo registers the instance with the cluster.
  cluster_user_data = base64encode(<<-EOF
    #!/bin/bash
    yum update -y
    echo "ECS_CLUSTER=${aws_ecs_cluster.cluster.name}" >> /etc/ecs/ecs.config
  EOF
  )
}
resource "aws_launch_template" "ecs-public" {
name_prefix = "ecs-${var.cluster_name}-public"
image_id = local.ami_id
instance_type = "t3.small"
iam_instance_profile {
name = aws_iam_instance_profile.cluster.name
}
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_size = var.cluster_instance_root_block_device_size
volume_type = var.cluster_instance_root_block_device_type
}
}
# TODO use Dynamic here
network_interfaces {
device_index = 0
security_groups = var.security_groups_external
delete_on_termination = true
subnet_id = var.public_subnet_ids[0]
}
network_interfaces {
device_index = 1
security_groups = var.security_groups_external
delete_on_termination = true
subnet_id = var.public_subnet_ids[1]
}
user_data = local.cluster_user_data
key_name = aws_key_pair.generated_key[0].key_name
lifecycle {
create_before_destroy = true
}
depends_on = [
null_resource.iam_wait
]
}
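As an aside, the "TODO use dynamic" comment above refers to collapsing the two hard-coded network_interfaces blocks. Here is a sketch of what that would look like (the resource name is illustrative, and note that the solution below removes subnet_id entirely, so this only documents the original intent):

resource "aws_launch_template" "ecs-public-sketch" {
  name_prefix   = "ecs-${var.cluster_name}-public"
  image_id      = local.ami_id
  instance_type = "t3.small"

  # One interface per public subnet instead of two hard-coded blocks.
  # network_interfaces.key is the list index, so it doubles as device_index.
  dynamic "network_interfaces" {
    for_each = var.public_subnet_ids
    content {
      device_index          = network_interfaces.key
      subnet_id             = network_interfaces.value
      security_groups       = var.security_groups_external
      delete_on_termination = true
    }
  }

  user_data = local.cluster_user_data
}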
resource "aws_autoscaling_group" "cluster_public" {
name_prefix = "asg-public-${var.cluster_name}"
vpc_zone_identifier = var.public_subnet_ids
launch_template {
id = aws_launch_template.ecs-public.id
version = "$Latest"
}
min_size = 1
max_size = 5
desired_capacity = 1
protect_from_scale_in = false
tag {
key = "Name"
value = "worker-public-${var.cluster_name}"
propagate_at_launch = true
}
tag {
key = "ClusterName"
value = var.cluster_name
propagate_at_launch = true
}
tag {
key = "AmazonECSManaged"
value = true
propagate_at_launch = true
}
dynamic "tag" {
for_each = var.tags
content {
key = tag.key
value = tag.value
propagate_at_launch = true
}
}
lifecycle {
create_before_destroy = true
}
}
resource "aws_ecs_capacity_provider" "autoscaling_group_public" {
name = "cp-${var.cluster_name}-public"
auto_scaling_group_provider {
auto_scaling_group_arn = aws_autoscaling_group.cluster_public.arn
managed_termination_protection = "DISABLED"
managed_scaling {
status = "ENABLED"
target_capacity = 100
minimum_scaling_step_size = 1
maximum_scaling_step_size = 100
}
}
}
resource "aws_ecs_cluster_capacity_providers" "cluster_capacity_providers" {
cluster_name = aws_ecs_cluster.cluster.name
capacity_providers = [aws_ecs_capacity_provider.autoscaling_group_private[0].name, aws_ecs_capacity_provider.autoscaling_group_public[0].name]
}
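One thing worth knowing about this resource: it can also carry a default_capacity_provider_strategy, so services that don't declare their own strategy still land on a capacity provider. My setup doesn't use it, but as a sketch, the resource above could be extended like this:

resource "aws_ecs_cluster_capacity_providers" "cluster_capacity_providers" {
  cluster_name = aws_ecs_cluster.cluster.name
  capacity_providers = [
    aws_ecs_capacity_provider.autoscaling_group_public.name,
  ]

  # Services with no explicit strategy fall back to this
  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.autoscaling_group_public.name
    weight            = 1
    base              = 0
  }
}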
That is how the autoscaling group and capacity providers are configured on the cluster. Here is the module I'm using for the service and task:
module "container_definition" {
source = "cloudposse/ecs-container-definition/aws"
version = "0.58.1"
container_name = local.container_name
container_image = "${module.global_settings.aws_account_id}.dkr.ecr.${module.global_settings.region}.amazonaws.com/${local.project_name}:Staging-latest"
container_memory = 512
container_memory_reservation = 256
container_cpu = 256
essential = true
readonly_root_filesystem = false
environment = local.task_environment_variables
port_mappings = local.port_mappings
log_configuration = local.container_log_configuration
}
module "ecs_alb_service_task" {
source = "cloudposse/ecs-alb-service-task/aws"
version = "0.66.2"
namespace = var.cluster_name
stage = "Staging"
name = local.project_name
attributes = []
container_definition_json = module.container_definition.sensitive_json_map_encoded_list
#Load Balancer
alb_security_group = var.security_group_id
ecs_load_balancers = local.ecs_load_balancer_config
#VPC
vpc_id = var.vpc_id
subnet_ids = var.subnet_ids
network_mode = "awsvpc"
#Capacity Provider Strategy
capacity_provider_strategies =
[
{
capacity_provider = var.capacity_provider_name
weight = 1
base = 0
}
]
desired_count = 2
launch_type = "EC2"
ignore_changes_desired_count = true
ecs_cluster_arn = var.cluster_arn
security_group_ids = [var.security_group_id]
ignore_changes_task_definition = true
health_check_grace_period_seconds = 200
deployment_minimum_healthy_percent = 100
deployment_maximum_percent = 200
deployment_controller_type = "ECS"
task_memory = 512
task_cpu = 256
force_new_deployment = true
ordered_placement_strategy =
[
{
type = "spread"
field = "attribute:ecs.availability-zone"
},{
type = "spread"
field = "instanceId"
}
]
label_order = local.label_order
labels_as_tags = local.labels_as_tags
propagate_tags = local.propagate_tags
tags = merge(var.tags, local.tags)
task_exec_role_arn = [module.task_excecution_role.task_excecution_role_arn]
task_role_arn = [module.task_excecution_role.task_excecution_role_arn]
}
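For reference, the module's capacity_provider_strategies input maps onto the capacity_provider_strategy blocks of a plain aws_ecs_service. A minimal sketch with illustrative names, in case you're not using the cloudposse module:

resource "aws_ecs_service" "sketch" {
  name            = "my-service"                       # illustrative
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.sketch.arn # illustrative
  desired_count   = 2

  # When a capacity provider strategy is set, launch_type must be
  # omitted: the ECS API rejects specifying both
  capacity_provider_strategy {
    capacity_provider = var.capacity_provider_name
    weight            = 1
    base              = 0
  }

  # Required because the task definition uses network_mode = "awsvpc"
  network_configuration {
    subnets         = var.subnet_ids
    security_groups = [var.security_group_id]
  }
}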
EDIT: I found that the instances are not registering in the cluster at all. That usually happens when user_data is misconfigured, in which case the instance registers in the default cluster instead, but that isn't happening here either. Things I tried:
- I tried changing the desired count multiple times
- I tried deleting the whole service and recreating it
- I set "ignore_changes_desired_count" to true
- I tried setting a base of 1 and a weight of 3 in "capacity_provider_strategies"
- I changed the instance_type from t3.micro to t3.small
Solution
Finally, after all the investigation, I found out that launch templates don't let you set "subnet_id" when the template is used by an EC2 Auto Scaling group: the ASG chooses subnets from its own vpc_zone_identifier, so a subnet pinned in the template's network interface conflicts with it. I'll show how I found the answer first: go to EC2, click Launch Templates, and edit the template you are using.
Click the button "Provide guidance to help me set up a template that I can use with EC2 Auto Scaling" and it will show you whether you have mistakes.
In my case the mistake was that Terraform had filled in the "subnet_id" field, which is part of the advanced network configuration. The code ends up this way:
resource "aws_launch_template" "ecs-public" {
name_prefix = "ecs-${var.cluster_name}-public"
image_id = local.ami_id
instance_type = "t3.small"
iam_instance_profile {
name = aws_iam_instance_profile.cluster.name
}
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_size = var.cluster_instance_root_block_device_size
volume_type = var.cluster_instance_root_block_device_type
}
}
network_interfaces {
associate_public_ip_address = true
device_index = 0
security_groups = var.security_groups_external
delete_on_termination = true
}
user_data = local.cluster_user_data
key_name = aws_key_pair.generated_key[0].key_name
lifecycle {
create_before_destroy = true
}
depends_on = [
null_resource.iam_wait
]
}
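Note the added associate_public_ip_address = true: once the template no longer pins a public subnet, this keeps the instances publicly reachable the way the old per-subnet configuration was, while the ASG spreads them across var.public_subnet_ids on its own.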