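amazon-web-services - AWS ECS container fails to start - EC2MetadataError in ecs-agent.log
Problem description
I'm trying to use a custom AMI in AWS Batch. The AMI has been configured to be Batch-compatible, but the ECS container won't start. When I use the AMI for a Batch job, the job gets stuck in the RUNNABLE state. When I log in to my container and look at /var/log/ecs-agent.log, I see the messages below. This is my first time using a custom AMI with Batch, so I'm really not sure where the error is coming from, and I haven't been able to find any answers online.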
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Loading configuration" module=agent.go
level=warn time=2021-08-05T20:35:31Z msg="Unable to fetch user data: EC2MetadataError: failed to make EC2Metadata request\n\tstatus code: 404, request id: \ncaused by: <?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n <title>404 - Not Found</title>\n </head>\n <body>\n <h1>404 - Not Found</h1>\n </body>\n</html>\n" module=config.go
level=info time=2021-08-05T20:35:31Z msg="Amazon ECS agent Version: 1.54.1, Commit: 3e20420f" module=agent.go
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-agent:latest" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Creating root ecs cgroup: /ecs" module=init_linux.go
level=info time=2021-08-05T20:35:31Z msg="Creating cgroup /ecs" module=cgroup_controller_linux.go
level=warn time=2021-08-05T20:35:31Z msg="Disabling TaskCPUMemLimit because agent is unabled to setup '/ecs' cgroup: cgroup create: unable to create controller: mkdir /sys/fs/cgroup/systemd/ecs: read-only file system" module=agent_unix.go
level=info time=2021-08-05T20:35:31Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2021-08-05T20:35:31Z msg="Loading state!" module=state_manager.go
level=info time=2021-08-05T20:35:32Z msg="Registering Instance with ECS" module=agent.go
level=info time=2021-08-05T20:35:32Z msg="Remaining mem: 7455" module=client.go
level=error time=2021-08-05T20:35:52Z msg="Unable to register as a container instance with ECS: RequestError: send request failed\ncaused by: Post \"https://ecs.us-east-1.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" module=client.go
level=error time=2021-08-05T20:35:52Z msg="Error registering: RequestError: send request failed\ncaused by: Post \"https://ecs.us-east-1.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" module=agent.go
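Most of the lines above are informational, so it can help to filter the log down to its warn and error entries first. The following is only a minimal sketch, assuming the `key=value` / `key="quoted value"` layout seen in the excerpt; the helper names `parse_agent_line` and `problems` are made up for this illustration and are not part of any AWS tooling.

```python
import re
from typing import Dict, Iterable, List

# A field is key=value, where the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_agent_line(line: str) -> Dict[str, str]:
    """Split one ecs-agent.log line into its key/value fields."""
    fields = {}
    for key, value in _PAIR.findall(line):
        # Drop the surrounding quotes from quoted values such as msg="...".
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def problems(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return only the warn and error entries, in log order."""
    parsed = (parse_agent_line(line) for line in lines)
    return [f for f in parsed if f.get("level") in ("warn", "error")]


# Two lines copied from the log excerpt above.
SAMPLE = [
    r'level=info time=2021-08-05T20:35:32Z msg="Registering Instance with ECS" module=agent.go',
    r'level=error time=2021-08-05T20:35:52Z msg="Error registering: RequestError: send request failed\ncaused by: Post \"https://ecs.us-east-1.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" module=agent.go',
]

if __name__ == "__main__":
    for entry in problems(SAMPLE):
        print(entry["time"], entry["level"], entry["module"], entry["msg"][:60])
```

Run against the full log, this would leave the user data 404 warning, the cgroup warning, and the two registration timeouts, which are the entries worth investigating.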
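Solution
Solved: the ECS agent was not installed correctly in my custom AMI.
What finally got my custom AMI running in Batch was creating a launch template with the following script in its user data section. The script performs the Batch-compatibility setup at boot. The launch template can then be specified in the Batch compute environment.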
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
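# Install non-interactively: user data runs unattended, so a confirmation prompt would stall or abort the install.
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y iptables-persistent
# Redirect task credential requests for 169.254.170.2:80 to the ECS agent on local port 51679.
sudo iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
sudo iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679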
mkdir -p /etc/ecs && sudo touch /etc/ecs/ecs.config
mkdir -p /var/log/ecs /var/lib/ecs/data
cat <<EOF >>/etc/ecs/ecs.config
ECS_DATADIR=/data
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
ECS_LOGFILE=/log/ecs-agent.log
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","awslogs"]
ECS_LOGLEVEL=info
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true
EOF
cat <<EOF >>/etc/systemd/system/ecs-agent.service
[Unit]
Description=AWS ECS Agent
Requires=docker.service
After=docker.service
[Service]
TimeoutStartSec=0
RestartSec=10
Restart=always
KillMode=none
ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent:latest
ExecStart=/usr/bin/docker run --name %n \
--restart=on-failure:10 \
--volume=/var/run/docker.sock:/var/run/docker.sock \
--volume=/var/log/ecs:/log \
--volume=/var/lib/ecs/data:/data \
--net=host \
--env-file=/etc/ecs/ecs.config \
--env=ECS_LOGFILE=/log/ecs-agent.log \
--env=ECS_DATADIR=/data/ \
--env=ECS_ENABLE_TASK_IAM_ROLE=true \
--env=ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
--env=ECS_IMAGE_CLEANUP_INTERVAL=10m \
--env=ECS_IMAGE_MINIMUM_CLEANUP_AGE=20m \
--env=ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1h \
--env=ECS_NUM_IMAGES_DELETE_PER_CYCLE=10 \
amazon/amazon-ecs-agent:latest
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now --no-block ecs-agent.service
--==MYBOUNDARY==--
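For anyone who would rather script the launch template than paste the user data into the console, here is a rough sketch of one way to do it from Python. It is not what I actually ran. It assumes boto3 is installed and credentials are configured, uses `us-east-1` only because that is the region in the log, and the function names and the template name are placeholders. The EC2 API expects `UserData` base64-encoded, and the standard `email` package can produce the multipart wrapper.

```python
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(shell_script: str) -> str:
    """Wrap a shell script in a MIME multipart message and base64-encode it."""
    message = MIMEMultipart("mixed", boundary="==MYBOUNDARY==")
    # Produces a part with Content-Type: text/x-shellscript; charset="us-ascii"
    message.attach(MIMEText(shell_script, "x-shellscript", "us-ascii"))
    return base64.b64encode(message.as_bytes()).decode("ascii")


def launch_template_request(name: str, shell_script: str) -> dict:
    """Build the keyword arguments for EC2 create_launch_template."""
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {"UserData": build_user_data(shell_script)},
    }


def create_launch_template(name: str, shell_script: str, region: str = "us-east-1"):
    """Create the launch template; the region default is an assumption."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_launch_template(**launch_template_request(name, shell_script))


if __name__ == "__main__":
    # Placeholder script body; in practice this would be the shell part shown above.
    request = launch_template_request("batch-ecs-agent-bootstrap", "#!/bin/bash\necho hello\n")
    print(base64.b64decode(request["LaunchTemplateData"]["UserData"]).decode("ascii"))
```

Running the file only prints the decoded user data so the MIME structure can be checked; `create_launch_template` is the only part that talks to AWS. The resulting template still has to be referenced from the Batch compute environment.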