AWS ECS container won't start - EC2MetadataError in ecs-agent.log

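Problem description

I am trying to use a custom AMI in AWS Batch. The AMI has been configured to be Batch-compatible, but the ECS container will not start. When I use the AMI for a Batch job, the job gets stuck in the RUNNABLE state. When I log in to my container instance and look at /var/log/ecs-agent.log, I see the messages below. This is my first attempt at using a custom AMI with Batch, so I'm really not sure where the error comes from, and I couldn't find any answers online.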

level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Loading configuration" module=agent.go
level=warn time=2021-08-05T20:35:31Z msg="Unable to fetch user data: EC2MetadataError: failed to make EC2Metadata request\n\tstatus code: 404, request id: \ncaused by: <?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>404 - Not Found</title>\n </head>\n <body>\n  <h1>404 - Not Found</h1>\n </body>\n</html>\n" module=config.go
level=info time=2021-08-05T20:35:31Z msg="Amazon ECS agent Version: 1.54.1, Commit: 3e20420f" module=agent.go
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-agent:latest" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Creating root ecs cgroup: /ecs" module=init_linux.go
level=info time=2021-08-05T20:35:31Z msg="Creating cgroup /ecs" module=cgroup_controller_linux.go
level=warn time=2021-08-05T20:35:31Z msg="Disabling TaskCPUMemLimit because agent is unabled to setup '/ecs' cgroup: cgroup create: unable to create controller: mkdir /sys/fs/cgroup/systemd/ecs: read-only file system" module=agent_unix.go
level=info time=2021-08-05T20:35:31Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2021-08-05T20:35:31Z msg="Loading state!" module=state_manager.go
level=info time=2021-08-05T20:35:32Z msg="Registering Instance with ECS" module=agent.go
level=info time=2021-08-05T20:35:32Z msg="Remaining mem: 7455" module=client.go
level=error time=2021-08-05T20:35:52Z msg="Unable to register as a container instance with ECS: RequestError: send request failed\ncaused by: Post \"https://ecs.us-east-1.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" module=client.go
level=error time=2021-08-05T20:35:52Z msg="Error registering: RequestError: send request failed\ncaused by: Post \"https://ecs.us-east-1.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" module=agent.go

Tags: amazon-web-services, amazon-ecs, amazon-ami, aws-batch

Solution


Solved: the ECS agent was not installed correctly in my custom AMI.
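
One way to confirm a diagnosis like this on the instance is to ask the agent itself. The sketch below assumes the agent's introspection endpoint is listening on its default port (51678) and that its response contains a ContainerInstanceArn field once registration has succeeded. The function names agent_registered and check_agent are made up for illustration, and the JSON check is a plain grep rather than a real parser.

```shell
#!/bin/bash
# Sketch only: decide from the agent's introspection response whether it registered.
# Assumption: the agent listens on its default introspection port, 51678.

# Reads JSON on stdin; succeeds only if ContainerInstanceArn holds an ARN string.
agent_registered() {
  grep -Eq '"ContainerInstanceArn"[[:space:]]*:[[:space:]]*"arn:'
}

# Hypothetical wrapper: query the local agent and print a one-line verdict.
check_agent() {
  if curl -s --max-time 5 http://localhost:51678/v1/metadata | agent_registered; then
    echo "agent registered"
  else
    echo "agent not running or not registered"
  fi
}
```

Running check_agent on the instance should print "agent not running or not registered" in a situation like the one in the log above, where registration timed out.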

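The final solution for running my custom AMI in Batch was to create a launch template with the following script in its user data section. The script performs the Batch compatibility setup at boot. The launch template can then be specified in the Batch compute environment.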

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
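# User data runs without a terminal, so refresh the package index and suppress all prompts.
sudo apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y iptables-persistent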
sudo iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
sudo iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679

mkdir -p /etc/ecs && sudo touch /etc/ecs/ecs.config
mkdir -p /var/log/ecs /var/lib/ecs/data

cat <<EOF >>/etc/ecs/ecs.config
ECS_DATADIR=/data
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
ECS_LOGFILE=/log/ecs-agent.log
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","awslogs"]
ECS_LOGLEVEL=info
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true

EOF

cat <<EOF >>/etc/systemd/system/ecs-agent.service

[Unit]
Description=AWS ECS Agent
Requires=docker.service
After=docker.service

[Service]
TimeoutStartSec=0
RestartSec=10
Restart=always
KillMode=none


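# Remove any leftover container with the same name so a restart does not hit a name conflict.
ExecStartPre=-/usr/bin/docker rm -f %n
ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent:latest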
ExecStart=/usr/bin/docker run --name %n \
--restart=on-failure:10 \
--volume=/var/run/docker.sock:/var/run/docker.sock \
--volume=/var/log/ecs:/log \
--volume=/var/lib/ecs/data:/data \
--net=host \
--env-file=/etc/ecs/ecs.config \
--env=ECS_LOGFILE=/log/ecs-agent.log \
--env=ECS_DATADIR=/data/ \
--env=ECS_ENABLE_TASK_IAM_ROLE=true \
--env=ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
--env=ECS_IMAGE_CLEANUP_INTERVAL=10m \
--env=ECS_IMAGE_MINIMUM_CLEANUP_AGE=20m \
--env=ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1h \
--env=ECS_NUM_IMAGES_DELETE_PER_CYCLE=10 \
amazon/amazon-ecs-agent:latest

[Install]
WantedBy=multi-user.target
EOF

systemctl enable --now --no-block ecs-agent.service
--==MYBOUNDARY==--
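
For anyone who prefers the CLI over the console, the launch template step can be sketched roughly as follows. This is not the exact procedure used above, only a minimal illustration: the file name userdata.mime, the template name batch-custom-ami, and the helper names build_template_data and create_template are all placeholders. The one firm requirement it reflects is that EC2 expects the UserData field of a launch template to be base64-encoded.

```shell
#!/bin/bash
# Sketch only: wrap a user-data file into launch template JSON and register it.
# All names here (userdata.mime, batch-custom-ami) are placeholders.

# Build the JSON for --launch-template-data; UserData must be base64-encoded.
build_template_data() {
  local userdata_file="$1"
  local encoded
  # Strip newlines so the encoded value is a single JSON string.
  encoded=$(base64 < "$userdata_file" | tr -d '\n')
  printf '{"UserData":"%s"}' "$encoded"
}

# Create the launch template. Needs AWS CLI credentials with EC2 permissions.
create_template() {
  local template_name="$1"
  local userdata_file="$2"
  aws ec2 create-launch-template \
    --launch-template-name "$template_name" \
    --launch-template-data "$(build_template_data "$userdata_file")"
}

# Example invocation (not run automatically):
#   create_template batch-custom-ami userdata.mime
```

The resulting template name can then be referenced from the launchTemplate setting of the compute environment's compute resources, alongside the custom AMI ID.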
