I am trying to SSH into a newly provisioned EC2 instance with Ansible. However, Ansible always fails to connect to the remote machine over SSH.

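Unfortunately, I have exhausted every resource and option I could think of for this problem. I have tried: different configurations that people suggested for similar issues; different versions of Ansible; restarting my machine; and reviewing my files a million times to make sure there are no typos.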


Playbook:

- name: Configure EC2 instance
  hosts: "localhost"
  connection: "local"
  gather_facts: no
  vars:
    REGION: "us-east-2"
  vars_files:
    - secrets.yml
  tasks:

  - name: Provision EC2 instance
    ec2:  # module name assumed; other module parameters are not shown
      aws_access_key: "{{ AWS_ACCESS_KEY_ID }}"
      aws_secret_key: "{{ AWS_SECRET_ACCESS_KEY }}"
    register: ec2

  - name: Wait for SSH to come up
    wait_for:
      # host and port arguments are not shown
      delay: 10
      timeout: 120
    loop: "{{ ec2.instances }}"

  - name: Add new instance public DNS to host group
    add_host:
      hostname: "{{ ec2.instances[0].public_dns_name }}"
      groups: "ec2"

- name: SSH into EC2
  hosts: "ec2"
  connection: "ssh"
  remote_user: "ubuntu"
  gather_facts: yes
  tasks:

  - name: Wait for user data script to complete execution
    wait_for:
      path: /var/log/cloud-init-output.log
      search_regex: AMI BUILD COMPLETE
      delay: 15
      timeout: 120


ansible.cfg:

[defaults]
host_key_checking = False
private_key_file = /Users/dev/Projects/aws/keys/private-key.pem
stdout_callback = debug
log_path = /var/log/ansible/ansible.log

[ssh_connection]
transfer_method = scp
ssh_args = -C -o ControlMaster=auto -o ControlPersist=200 -o ConnectTimeout=30 -o ServerAliveInterval=50
scp_if_ssh = True

[persistent_connection]
connect_timeout = 300


Command:

sudo ANSIBLE_DEBUG=1 ansible-playbook infra/aws/ansible/ec2-provisioning.yml -vvvvv --ask-vault-pass

The error occurs in the "SSH into EC2" play, specifically in the Gathering Facts task. Here is the entire log block for that play/task:

2020-06-05 12:16:43,795 p=root u=9572 | PLAY [SSH into EC2] ****************************************************************************************************************************************************
2020-06-05 12:16:43,805 p=root u=9572 | TASK [Gathering Facts] *************************************************************************************************************************************************
2020-06-05 12:16:43,817 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> ESTABLISH SSH CONNECTION FOR USER: ubuntu
2020-06-05 12:16:43,818 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: ansible.cfg set ssh_args: (-C)(-o)(ControlMaster=auto)(-o)(ControlPersist=200)(-o)(ConnectTimeout=30)(-o)(ServerAliveInterval=50)
2020-06-05 12:16:43,818 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
2020-06-05 12:16:43,819 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: ANSIBLE_PRIVATE_KEY_FILE/private_key_file/ansible_ssh_private_key_file set: (-o)(IdentityFile="/Users/dev/Projects/aws/keys/private-key.pem")
2020-06-05 12:16:43,819 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: ansible_password/ansible_ssh_password not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
2020-06-05 12:16:43,820 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User="ubuntu")
2020-06-05 12:16:43,820 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=10)
2020-06-05 12:16:43,820 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: PlayContext set ssh_common_args: ()
2020-06-05 12:16:43,821 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: PlayContext set ssh_extra_args: ()
2020-06-05 12:16:43,822 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/Users/dev/.ansible/cp/4a306014bf)
2020-06-05 12:16:43,822 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=200 -o ConnectTimeout=30 -o ServerAliveInterval=50 -o StrictHostKeyChecking=no -o 'IdentityFile="/Users/dev/Projects/aws/keys/private-key.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o ControlPath=/Users/dev/.ansible/cp/4a306014bf ec2-3-23-59-101.us-east-2.compute.amazonaws.com '/bin/sh -c '"'"'echo ~ubuntu && sleep 0'"'"''
2020-06-05 12:16:45,132 p=root u=9623 | <ec2-3-23-59-101.us-east-2.compute.amazonaws.com> (255, b'', b'OpenSSH_8.1p1, LibreSSL 2.7.3\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 47: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug1: Control socket "/Users/dev/.ansible/cp/4a306014bf" does not exist\r\ndebug2: resolving "ec2-3-23-59-101.us-east-2.compute.amazonaws.com" port 22\r\ndebug2: ssh_connect_direct\r\ndebug1: Connecting to ec2-3-23-59-101.us-east-2.compute.amazonaws.com [] port 22.\r\ndebug2: fd 5 setting O_NONBLOCK\r\ndebug1: connect to address port 22: Connection refused\r\nssh: connect to host ec2-3-23-59-101.us-east-2.compute.amazonaws.com port 22: Connection refused\r\n')
2020-06-05 12:16:45,137 p=root u=9572 | fatal: [ec2-3-23-59-101.us-east-2.compute.amazonaws.com]: UNREACHABLE! => {
    "changed": false,
    "unreachable": true
}

Failed to connect to the host via ssh: OpenSSH_8.1p1, LibreSSL 2.7.3
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Control socket "/Users/dev/.ansible/cp/4a306014bf" does not exist
debug2: resolving "ec2-3-23-59-101.us-east-2.compute.amazonaws.com" port 22
debug2: ssh_connect_direct
debug1: Connecting to ec2-3-23-59-101.us-east-2.compute.amazonaws.com [] port 22.
debug2: fd 5 setting O_NONBLOCK
debug1: connect to address port 22: Connection refused
ssh: connect to host ec2-3-23-59-101.us-east-2.compute.amazonaws.com port 22: Connection refused

2020-06-05 12:16:45,139 p=root u=9572 | PLAY RECAP *************************************************************************************************************************************************************
2020-06-05 12:16:45,140 p=root u=9572 | ec2-3-23-59-101.us-east-2.compute.amazonaws.com : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
2020-06-05 12:16:45,140 p=root u=9572 | localhost                  : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

The part that stands out to me here is: Control socket "/Users/dev/.ansible/cp/4a306014bf" does not exist.
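
One thing worth checking: with ControlMaster=auto, ssh prints the "Control socket ... does not exist" line whenever no multiplexed master connection exists yet, so it is probably not the failure itself. The last line of the ssh output is "Connection refused", which suggests that nothing was accepting connections on port 22 at that moment. A minimal sketch of a stricter readiness check in the first play follows, assuming the `wait_for` module and assuming each registered instance exposes `public_dns_name` (as the add-host task above does). It waits until the port actually returns an OpenSSH banner, not only until it opens. If sshd is restarted later during boot, this check alone may still not be enough.

```yaml
- name: Wait for SSH to come up
  wait_for:
    host: "{{ item.public_dns_name }}"  # assumed: same attribute the add-host task reads
    port: 22
    search_regex: OpenSSH               # require the SSH banner, not just an open port
    delay: 10
    timeout: 120
  loop: "{{ ec2.instances }}"
```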

I can see the command that Ansible is trying to execute, which is:

ssh -vvv -C -o ControlMaster=auto -o ControlPersist=200 -o ConnectTimeout=30 -o ServerAliveInterval=50 -o StrictHostKeyChecking=no -o 'IdentityFile="/Users/dev/Projects/aws/keys/private-key.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o ControlPath=/Users/dev/.ansible/cp/4a306014bf ec2-3-23-59-101.us-east-2.compute.amazonaws.com '/bin/sh -c '"'"'echo ~ubuntu && sleep 0'"'"''

If I run it directly from the command line with SSH debugging, SSH returns:

debug3: send packet: type 97
debug2: channel 2: is dead
debug2: channel 2: gc: notify user
debug3: mux_master_session_cleanup_cb: entering for channel 2
debug2: channel 1: rcvd close
debug2: channel 1: output open -> drain
debug2: channel 1: chan_shutdown_read (i0 o1 sock 3 wfd 3 efd -1 [closed])
debug2: channel 1: input open -> closed
debug2: channel 2: gc: user detached
debug2: channel 2: is dead
debug2: channel 2: garbage collecting
debug1: channel 2: free: client-session, nchannels 3
debug3: channel 2: status: The following connections are open:
#1 mux-control (t16 nr0 i3/0 o1/16 e[closed]/0 fd 3/3/-1 sock 3 cc -1)
#2 client-session (t4 r0 i3/0 o3/0 e[write]/0 fd -1/-1/9 sock -1 cc -1)

debug2: channel 1: obuf empty
debug2: channel 1: chan_shutdown_write (i3 o1 sock 3 wfd 3 efd -1 [closed])
debug2: channel 1: output drain -> closed
debug2: channel 1: is dead (local)
debug2: channel 1: gc: notify user
debug3: mux_master_control_cleanup_cb: entering for channel 1
debug2: channel 1: gc: user detached
debug2: channel 1: is dead (local)
debug2: channel 1: garbage collecting
debug1: channel 1: free: mux-control, nchannels 2
debug3: channel 1: status: The following connections are open:
#1 mux-control (t16 nr0 i3/0 o3/0 e[closed]/0 fd 3/3/-1 sock 3 cc -1)

debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 0
debug2: set_control_persist_exit_time: schedule exit in 200 seconds

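Again, the part that stands out is the line debug3: mux_client_read_packet: read header failed: Broken pipe. If I remove the '/bin/sh -c '"'"'echo ~ubuntu && sleep 0'"'"'' part from the end of the command and run it again from the command line, it connects successfully. Unfortunately, I cannot tell Ansible to drop that part of the command.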


I have not been able to figure out the root cause of this issue (I will update if I do), but adding a short wait before the SSH play is a usable workaround for now:

  - name: Hard wait (30 seconds) before SSHing into EC2 instance
    pause:
      seconds: 30
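
A possible alternative to a fixed pause is to let the SSH play itself retry until the host is reachable. The sketch below is untested against this setup. It turns off automatic fact gathering, uses the `wait_for_connection` module to keep retrying the connection, and then gathers facts explicitly with `setup`. The `delay` and `timeout` values are assumptions, not values from the original playbook.

```yaml
- name: SSH into EC2
  hosts: "ec2"
  connection: "ssh"
  remote_user: "ubuntu"
  gather_facts: no   # facts are gathered below, once the host is reachable
  tasks:
    - name: Wait until the instance accepts SSH connections
      wait_for_connection:
        delay: 10      # assumed: seconds to wait before the first attempt
        timeout: 300   # assumed: give up after this many seconds

    - name: Gather facts now that the connection works
      setup:
```

Because the automatic Gathering Facts task is where the play currently fails, moving fact gathering after the connection check means the first SSH attempt is one that tolerates a refused connection and retries.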

