Issue
I'm having S3 endpoint grief. When my instances initialize they cannot install Docker. Details:
I have ASG instances sitting in a VPC with public and private subnets. Appropriate routing and EIP/NAT is all stitched up. Instances in private subnets have outbound 0.0.0.0/0 routed to the NAT in their respective public subnets. NACLs for the public subnets allow internet traffic in and out; the NACLs around the private subnets allow traffic from the public subnets in and out, traffic out to the internet, and traffic from S3 CIDRs in and out. I want it pretty locked down.
- I have DNS and hostnames enabled in my VPC
- I understand NACLs are stateless and have enabled IN and OUTBOUND rules for s3 amazon IP cidr blocks on ephemeral port ranges (yes I have also enabled traffic between pub and private subnets)
- yes I have checked a route was provisioned for my s3 endpoint in my private route tables
- yes I know for sure it is the s3 endpoint causing me grief and not another blunder -> when I delete it and open up my NACLs I can yum update and install docker (as expected). I am not looking for suggestions that require opening up my NACLs; I'm using a VPC gateway endpoint because I want to keep things locked down in the private subnets. I mention this because similar discussions seem to say 'I opened 0.0.0.0/0 on all ports and now x works'
- Should I just bake an AMI with docker installed? That's what I'll do if I can't resolve this. I really wanted to set up my networking so everything is nicely locked down, and I feel like it should be pretty straightforward using endpoints. Largely this is a networking exercise, so I would rather not do this because it avoids solving and understanding the problem.
- I know my other VPC endpoints work perfectly -> the Auto Scaling service interface endpoint is performing (I can see it scaling down instances as per the policy), the SSM interface endpoint is allowing me to use Session Manager, and the ECR endpoint(s) are working in conjunction with the s3 gateway endpoint (the s3 gateway endpoint is required because image layers are in s3) -> I know this works because if I open up the NACLs, delete my s3 endpoint and install docker, then lock everything down again and bring back my s3 gateway endpoint, I can successfully pull my ECR images. So the s3 gateway endpoint is fine for accessing ECR image layers, but not the amazon-linux-extras repos.
- SGs attached to instances are not the problem (instances have default outbound rule)
- I have tried adding increasingly generous policies to my s3 endpoint as I have seen in this 7 year old thread and thought this had to do the trick (yes I subbed my region correctly)
- I strongly feel the solution lies with the s3 gateway policy as discussed in this thread, however have had little luck with my increasingly desperate policies.
Amazon EC2 instance can't update or use yum
another s3 struggle with resolution:
I have tried:
S3Endpoint:
  Type: 'AWS::EC2::VPCEndpoint'
  Properties:
    PolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal: '*'
          Action:
            - 's3:GetObject'
          Resource:
            - 'arn:aws:s3:::prod-ap-southeast-2-starport-layer-bucket/*'
            - 'arn:aws:s3:::packages.*.amazonaws.com/*'
            - 'arn:aws:s3:::repo.*.amazonaws.com/*'
            - 'arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/*'
            - 'arn:aws:s3:::amazonlinux.*.amazonaws.com/*'
            - 'arn:aws:s3:::*.amazonaws.com'
            - 'arn:aws:s3:::*.amazonaws.com/*'
            - 'arn:aws:s3:::*.ap-southeast-2.amazonaws.com/*'
            - 'arn:aws:s3:::*.ap-southeast-2.amazonaws.com/'
            - 'arn:aws:s3:::*repos.ap-southeast-2-.amazonaws.com'
            - 'arn:aws:s3:::*repos.ap-southeast-2.amazonaws.com/*'
            - 'arn:aws:s3:::repo.ap-southeast-2-.amazonaws.com'
            - 'arn:aws:s3:::repo.ap-southeast-2.amazonaws.com/*'
    RouteTableIds:
      - !Ref PrivateRouteTableA
      - !Ref PrivateRouteTableB
    ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
    VpcId: !Ref BasicVpc
    VpcEndpointType: Gateway
(as you can see, very desperate) The first rule is required for the ECR interface endpoints to pull the image layers from s3, all of the others are attempts to reach amazon-linux-extras repos.
Below is the behavior happening on initialization I have recreated by connecting with session manager using SSM endpoint:
https://aws.amazon.com/premiumsupport/knowledge-center/connect-s3-vpc-endpoint/
I can not yum install or update
[root@ip-10-0-3-120 bin]# yum install docker -y
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Could not retrieve mirrorlist https://amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/2/core/latest/x86_64/mirror.list error was
14: HTTPS Error 403 - Forbidden
One of the configured repositories failed (Unknown), and yum doesn't have enough cached data to continue. At this point the only safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=<repoid> ...
4. Disable the repository permanently, so yum won't use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable <repoid>
or
subscription-manager repos --disable=<repoid>
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn2-core/2/x86_64
and can not:
amazon-linux-extras install docker
Catalog is not reachable. Try again later.
catalogs at https://amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/2/extras-catalog-x86_64-v2.json, https://amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/2/extras-catalog-x86_64.json
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/amazon_linux_extras/software_catalog.py", line 131, in fetch_new_catalog
    request = urlopen(url)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
Any gotchas I've missed? I'm very stuck here. I am familiar with basic VPC networking, NACLs and VPC endpoints (the ones I've used at least), and I have followed the troubleshooting steps (although I already had everything set up as outlined).
I feel the s3 policy is the problem here OR the mirror list. Many thanks if you bothered to read all that! Thoughts?
Solution
By the looks of it, you are well aware of what you are trying to achieve. Even though you are saying that it is not the NACLs, I would check them one more time, as sometimes one can easily overlook something minor. Take into account the snippet below taken from this AWS troubleshooting article and make sure that you have the right S3 CIDRs in your rules for the respective region:
Make sure that the network ACLs associated with your EC2 instance's subnet allow the following:
- Egress on port 80 (HTTP) and 443 (HTTPS) to the Regional S3 service.
- Ingress on ephemeral TCP ports from the Regional S3 service. Ephemeral ports are 1024-65535.
The Regional S3 service is the CIDR for the subnet containing your S3 interface endpoint. Or, if you're using an S3 gateway, the Regional S3 service is the public IP CIDR for the S3 service. Network ACLs don't support prefix lists. To add the S3 CIDR to your network ACL, use 0.0.0.0/0 as the S3 CIDR. You can also add the actual S3 CIDRs into the ACL. However, keep in mind that the S3 CIDRs can change at any time.
Your S3 endpoint policy looks good to me on first look, but you are right that it is very likely that the policy or the endpoint configuration in general could be the cause, so I would re-check it one more time too.
One additional thing that I have observed before is that, depending on the AMI you use and your VPC settings (DHCP options set, DNS, etc.), sometimes the EC2 instance cannot properly set its default region in the yum config. Please check whether the files awsregion and awsdomain exist within the /etc/yum/vars directory and what their contents are. In your use case, awsregion should contain:
$ cat /etc/yum/vars/awsregion
ap-southeast-2
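If you would rather check both files in one go (for example from user data), here is a minimal Python sketch. The function name read_yum_vars and its parameters are my own and purely illustrative; the only things taken from above are the /etc/yum/vars directory and the awsregion and awsdomain file names.

```python
from pathlib import Path


def read_yum_vars(names=("awsregion", "awsdomain"), vars_dir="/etc/yum/vars"):
    """Return {name: stripped file content, or None if the file is missing}."""
    result = {}
    for name in names:
        path = Path(vars_dir) / name
        # A missing file is reported as None rather than raising.
        result[name] = path.read_text().strip() if path.is_file() else None
    return result


if __name__ == "__main__":
    for name, value in read_yum_vars().items():
        print(f"{name}: {value if value is not None else '<missing>'}")
```

A missing file, or an awsregion value other than ap-southeast-2, would suggest the instance did not pick up its region correctly.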
You can check whether the DNS resolving on your instance is working properly with:
dig amazonlinux.ap-southeast-2.amazonaws.com
If DNS seems to be working fine, you can compare whether the IP in the output resides within the ranges you have allowed in your NACLs.
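That comparison can be done by hand, but a few lines of Python with the standard ipaddress module make it less error-prone. This is only a sketch: the CIDRs and IPs below are placeholder documentation ranges, not real S3 ranges, so substitute the address from your dig output and the CIDRs from your own NACL rules.

```python
import ipaddress
from typing import Iterable


def ip_allowed(ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if `ip` falls inside at least one of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical values for illustration only (RFC 5737 documentation ranges).
nacl_cidrs = ["198.51.100.0/24", "203.0.113.0/25"]

print(ip_allowed("203.0.113.10", nacl_cidrs))   # inside 203.0.113.0/25
print(ip_allowed("203.0.113.200", nacl_cidrs))  # outside both ranges
```

If the resolved IP is not covered, the NACL is a likely culprit; if it is covered, the 403 points back at the endpoint policy.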
EDIT:
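After having a second look, this line in your endpoint policy does not look right. The resource part of an S3 ARN is the bucket name, not the S3 hostname, so this entry never matches the repo bucket: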
arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/*
According to the docs it should be something like:
arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2/*
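To illustrate the difference, here is a rough Python sketch that derives the ARN from the URL in your yum error and checks which policy patterns would cover the request. Both helper names are mine, and fnmatchcase is only an approximation of how IAM evaluates * wildcards, so treat the result as a sanity check rather than an authoritative policy evaluation. It also assumes a virtual-hosted-style URL of the form <bucket>.s3.<region>.amazonaws.com.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def s3_object_arn_from_url(url: str) -> str:
    """Build the object ARN that a virtual-hosted-style S3 URL refers to."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Everything before ".s3." is taken to be the bucket name (assumption).
    bucket, sep, rest = host.partition(".s3.")
    if not sep or not rest.endswith("amazonaws.com"):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {url!r}")
    return f"arn:aws:s3:::{bucket}{parsed.path}"


def resource_matches(pattern: str, arn: str) -> bool:
    """Approximate IAM '*' wildcard matching with shell-style matching."""
    return fnmatchcase(arn, pattern)


url = ("https://amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2"
       ".amazonaws.com/2/core/latest/x86_64/mirror.list")
arn = s3_object_arn_from_url(url)
print(arn)

# Hostname-based pattern from the question versus the bucket-name pattern.
hostname_pattern = ("arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2"
                    ".s3.ap-southeast-2.amazonaws.com/*")
bucket_pattern = "arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2/*"
print(resource_matches(hostname_pattern, arn))  # False
print(resource_matches(bucket_pattern, arn))    # True
```

The same check against 'arn:aws:s3:::*.amazonaws.com/*' also fails, because the object ARN never contains the amazonaws.com domain, which would explain why the broader wildcards in your policy did not help either.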
Answered By - Nick Answer Checked By - Candace Johnson (WPSolving Volunteer)