Issue
Use case: We are deploying virtual machines into a cloud with a default Linux image (Ubuntu 22.04 at the moment). After deploying a machine, we configure our default users and change the SSH port from 22 to 2222 with Ansible.
Side note: We connect through a jump host over the internet: Ansible Automation Platform / AWS => internet => SSH jump host => target host
To keep the possibility for Ansible to connect to the new machine after changing the SSH port, I found multiple Stack Overflow / blog entries that check and set ansible_ssh_port, basically by running wait_for on ports 22 and 2222 and setting the SSH variable depending on the result (code below).
Right now this works fine for the first SSH host (jumphost), but always fails for the second host due to issues with establishing the SSH connection.
Side note: The SSH daemon is running. If I use my user from the jump host, I can get an SSH response from 22/2222 (depending on the current state of deployment).
Edit from questions:
The deployment tasks should only be run on the target host, not on the jumphost as well.
I run the deployment on the jumphost first and make sure it is up, running and configured.
After that, I run the deployment on all machines behind the jumphost to configure them.
This also ensures that if I ever need to reboot, I don't kill all tunneled SSH sessions by accident.
Ansible inventory example
all:
  hosts:
  children:
    jumphosts:
      hosts:
        example_jumphost:
          ansible_host: 123.123.123.123
    cloud_hosts:
      hosts:
        example_cloud_host01: #local DNS is resolved on the jumphost - no ansible_host here (yet)
          ansible_ssh_common_args: '-oProxyCommand="ssh -W %h:%p -oStrictHostKeyChecking=no -q [email protected] -p 2222"' #Tunnel through the appropriate jumphost
          delegation_host: "[email protected]" #delegate jobs to the jumphost in each project if needed
  vars:
    ansible_ssh_port: 2222
SSH check_port role
- name: Set SSH port to 2222
  set_fact:
    ansible_ssh_port: 2222

- name: "Check backend port 2222"
  wait_for:
    port: 2222
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  # delegate_to: "{{ delegation_host }}"
  # vars:
  #   ansible_ssh_port: 2222
  ignore_errors: true
  register: ssh_port

- name: "Check backend port 22"
  wait_for:
    port: "22"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  # delegate_to: "{{ delegation_host }}"
  # vars:
  #   ansible_ssh_port: 2222
  ignore_errors: true
  register: ssh_port_default
  when:
    - ssh_port is defined
    - ssh_port.state is undefined

- name: Set backend SSH port to 22
  set_fact:
    ansible_ssh_port: 22
  when:
    - ssh_port_default.state is defined
The playbook itself
- hosts: "example_cloud_host01"
  gather_facts: false
  roles:
    - role: check_port #check if we already have the correct port or need 22
    - role: sshd #Set Port to 2222 and restart sshd
    - role: check_port #check the port again, after it has been changed
    - role: install_apps
    - role: configure_apps
Error message with delegate_to for the task Check backend port 2222:
fatal: [example_cloud_host01 -> [email protected]]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 123.123.123.123 port 22: Connection refused", "unreachable": true}
This confuses me, because I expect the delegation host to use the same ansible_ssh_port as the target host.
Without delegate_to for the tasks Check backend port 2222 and Check backend port 22:
fatal: [example_cloud_host01]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "elapsed": 5, "msg": "Timeout when waiting for example_cloud_host01:2222"}
fatal: [example_cloud_host01]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "elapsed": 5, "msg": "Timeout when waiting for example_cloud_host01:22"}
I have no idea why this happens. If I try the connection manually, it works fine.
What I tried so far:
- I played around with delegate_to, vars, ... as mentioned above.
- I wanted to see if I can provide delegate_to with the proper port 2222 for the jump host.
- I wanted to see if I can run this without delegate_to (since it should automatically use the proxy command to run on the jump host anyway).
Neither way gave me a solution on how to connect to my second tier servers after changing the SSH port.
Right now, I split the playbook into two:
- deploy sshd config with port 22
- run our full deploy afterwards on port 2222
Solution
I would do the following. (I somewhat tested this with fake values in the inventory, using localhost as a jumphost to check ports on localhost as well.)
Edit: I modified my examples to try to show you a possible way forward after your comments on your question and on this answer.
Inventory
---
all:
  vars:
    ansible_ssh_port: 2222

proxies:
  vars:
    ansible_user: ansible
  hosts:
    example_jumphost1:
      ansible_host: 123.123.123.123
    example_jumphost2:
      ansible_host: 231.231.231.231
    # ... and more jump hosts ...

cloud_hosts:
  vars:
    jump_vars: "{{ hostvars[jump_host] }}"
    ansible_ssh_common_args: '-oProxyCommand="ssh -W %h:%p -oStrictHostKeyChecking=no -q {{ jump_vars.ansible_user }}@{{ jump_vars.ansible_host }} -p {{ jump_vars.ansible_ssh_port | d(22) }}"'
  children:
    cloud_hosts_north:
      vars:
        jump_host: example_jumphost1
      hosts:
        example_cloud_host01:
        example_cloud_host02:
        # ... and more ...
    cloud_hosts_south:
      vars:
        jump_host: example_jumphost2
      hosts:
        example_cloud_host03:
        example_cloud_host04:
        # ... and more ...
    # ... and more cloud groups ...
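To check that the templated proxy command resolves to the expected jump host for each cloud host, a small throwaway play along these lines might help. It is only a sketch: the play and the task name are made up for illustration, and it relies on the debug module running on the controller, so it should not need an SSH connection to the targets.

```yaml
# Hypothetical verification play, not part of the deployment itself.
- hosts: cloud_hosts
  gather_facts: false   # fact gathering would try to connect to the hosts
  tasks:
    - name: "Show the proxy arguments resolved for each cloud host"
      ansible.builtin.debug:
        var: ansible_ssh_common_args
```

Each host should print a ProxyCommand pointing at the user, address and port of the jump host assigned to its group.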
Tasks to check ports.
- name: "Check backend inventory configured port {{ ansible_ssh_port }}"
  wait_for:
    port: "{{ ansible_ssh_port }}"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  delegate_to: "{{ jump_host }}"
  ignore_errors: true
  register: ssh_port

- name: "Check backend default ssh port if relevant"
  wait_for:
    port: "22"
    state: "started"
    host: "{{ inventory_hostname }}"
    connect_timeout: "5"
    timeout: "5"
  delegate_to: "{{ jump_host }}"
  ignore_errors: true
  register: ssh_port_default
  when: ssh_port is failed

- name: "Set backend SSH port to 22 if we did not change it yet"
  set_fact:
    ansible_ssh_port: 22
  when:
    - ssh_port_default is not skipped
    - ssh_port_default is success
Please note that if the checks for ports 22 and 2222 both fail, your configured port will still be 2222, but any later task will obviously fail. You might want to fail fast after the checks for those relevant hosts:
- name: "Fail host if no port is available"
  fail:
    msg:
      - "Host {{ inventory_hostname }} does not have"
      - "any ssh port available (tested 22 and 2222)"
  when:
    - ssh_port is failed
    - ssh_port_default is failed
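As a variation on the tasks above, the two probes could be collapsed into a single looped task. This is an untested sketch rather than the approach I used: ssh_probe is a hypothetical variable name, the port list is hard-coded, and unlike the tasks above it always probes both ports.

```yaml
# Sketch only: ssh_probe is a hypothetical register name.
- name: "Probe candidate SSH ports from the jump host"
  ansible.builtin.wait_for:
    host: "{{ inventory_hostname }}"
    port: "{{ item }}"
    state: started
    connect_timeout: 5
    timeout: 5
  delegate_to: "{{ jump_host }}"
  loop:
    - 2222   # preferred port, tried first
    - 22     # image default
  ignore_errors: true
  register: ssh_probe

- name: "Fail host if no port is available"
  ansible.builtin.fail:
    msg: "Host {{ inventory_hostname }} does not have any ssh port available (tested 2222 and 22)"
  when: ssh_probe.results | select('succeeded') | list | length == 0

- name: "Use the first port that answered"
  ansible.builtin.set_fact:
    ansible_ssh_port: "{{ ssh_probe.results | select('succeeded') | map(attribute='item') | first }}"
```

Because 2222 comes first in the loop, it wins whenever both ports answer.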
With this in place, you can use different targets on your play to reach the relevant hosts:
- For jump hosts
  - Run on a single bastion host: e.g. hosts: example_jumphost1
  - Run on all bastion hosts: hosts: proxies
- For cloud hosts
  - Run on all cloud hosts: hosts: cloud_hosts
  - Run on a single child group: e.g. hosts: cloud_hosts_north
  - Run on all cloud hosts except a subgroup: e.g. hosts: cloud_hosts:!cloud_hosts_south
For more, see the Ansible documentation on patterns.
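Put together with the role list from the question, a play targeting one of these patterns might look like the sketch below. The role names are the ones from the question, and I am assuming the port-check tasks above live in the check_port role.

```yaml
# Sketch: run the full deployment on every cloud host except the south group.
- hosts: "cloud_hosts:!cloud_hosts_south"
  gather_facts: false   # avoid connecting before the SSH port is known
  roles:
    - role: check_port      # pick 2222 or fall back to 22
    - role: sshd            # set Port to 2222 and restart sshd
    - role: check_port      # re-check after the port change
    - role: install_apps
    - role: configure_apps
```

Keeping gather_facts disabled matters here, because the implicit fact gathering would otherwise try to connect on the inventory port before the first check has run.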
Answered By - Zeitounator Answer Checked By - Senaida (WPSolving Volunteer)