Refresh EC2 Instance Tags failed: SharedCredsLoad

Problem description

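I have been struggling to get basic metrics from the CloudWatch agent. I keep getting the error below. I don't know what it means, and I can't find any online resources that discuss it:

refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed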

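I followed the linked installation instructions and read the documentation carefully. Again, the goal is simply to get some basic metrics from my EC2 instance into CloudWatch. Here are the steps I followed: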

root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# tail -n 4 common-config.toml 
#### BEGIN ANSIBLE MANAGED BLOCK ####
[credentials]
shared_credential_file = "/home/cwagent/.aws/credentials"
#### END ANSIBLE MANAGED BLOCK ####

This is the error I now see in the logs, which I assume is why I am not seeing any metrics:

root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/logs# tail -n 20 amazon-cloudwatch-agent.log 
2019/10/29 22:41:08 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2019/10/29 22:41:08 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2019/10/29 22:41:08 I! Detected runAsUser: cwagent
2019/10/29 22:41:08 I! Change ownership to cwagent:cwagent
2019/10/29 22:41:08 I! Set HOME: /home/cwagent
2019-10-29T22:41:08Z I! will use file based credentials provider 
2019-10-29T22:41:08Z I! cloudwatch: get unique roll up list []
2019-10-29T22:41:08Z I! Starting AmazonCloudWatchAgent (version 1.230621.0)
2019-10-29T22:41:08Z I! Loaded outputs: cloudwatch
2019-10-29T22:41:08Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 37s
2019-10-29T22:41:08Z I! Loaded inputs: disk mem
2019-10-29T22:41:08Z I! Tags enabled: host=ip-172-31-71-5
2019-10-29T22:41:08Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"ip-172-31-71-5", Flush Interval:1s 
2019-10-29T22:41:08Z I! will use file based credentials provider 
2019-10-29T22:41:08Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:42:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:43:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:46:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:49:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:52:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed

And this is the config.json I am using:

root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/bin# cat config.json
{
    "agent": {
        "metrics_collection_interval": 10,
        "run_as_user": "cwagent"
    },
    "metrics": {
        "namespace": "TestNamespace",
        "append_dimensions": {
            "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            "ImageId": "${aws:ImageId}",
            "InstanceId": "${aws:InstanceId}",
            "InstanceType": "${aws:InstanceType}"
        },
        "metrics_collected": {
            "disk": {
                "measurement": [
                    "used_percent"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ]
            },
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 60
            }
        }
    }
}

Edit

I got it working after removing the credentials modification:

root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# tail -n 4 common-config.toml 
#### BEGIN ANSIBLE MANAGED BLOCK ####
#[credentials]
#shared_credential_file = "/home/cwagent/.aws/credentials"
#### END ANSIBLE MANAGED BLOCK ####

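This was after I went ahead and copied the config file to the default location the agent checks (even though the documentation says you can pass a file name, as I did):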

root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/bin# cp config.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/bin# cd ../etc/
root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# chown cwagent:cwagent amazon-cloudwatch-agent.json 
root@ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# ls -l
total 16
drwxr-xr-x 2 cwagent cwagent 4096 Oct 30 22:05 amazon-cloudwatch-agent.d
-rwxr-xr-x 1 cwagent cwagent  611 Oct 30 22:11 amazon-cloudwatch-agent.json
-rw-rw-r-- 1 cwagent cwagent 1144 Oct 30 22:05 amazon-cloudwatch-agent.toml
-rw-r--r-- 1 cwagent cwagent 1073 Oct 30 22:05 common-config.toml

Tags: amazon-web-services, amazon-cloudwatch

Solution


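The error appears to be related to accessing the tags associated with the Amazon EC2 instance.

The installation instructions you linked recommend creating an IAM role with the CloudWatchAgentServerPolicy policy attached. That policy includes permission to describe tags: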

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}
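
To check that a policy document like this one really grants ec2:DescribeTags, a small script can scan its Allow statements. The following is a minimal sketch, not an IAM evaluator: the function name policy_allows is hypothetical, the embedded policy is an abridged copy of the one above, and the check ignores Deny statements, NotAction, conditions, and resources. The permissions an instance actually has depend on every policy attached to its role.

```python
import json
from fnmatch import fnmatchcase


def policy_allows(policy, action):
    """Return True if any Allow statement lists a pattern matching `action`.

    Simplified on purpose: Deny, NotAction, Condition and Resource are ignored.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        # IAM action names are case-insensitive and may use * and ? wildcards
        if any(fnmatchcase(action.lower(), a.lower()) for a in actions):
            return True
    return False


# Abridged copy of the policy shown above
POLICY = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": ["ssm:GetParameter"],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}
""")

print(policy_allows(POLICY, "ec2:DescribeTags"))
print(policy_allows(POLICY, "ec2:TerminateInstances"))
```

The same function can be pointed at a policy document exported from the IAM console to see whether the tag permission is present at all.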

The CloudWatch agent on that server does not appear to have received these permissions, so it cannot list the tags.

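So:

  • Confirm that the IAM role has been created and that it includes the CloudWatchAgentServerPolicy policy
  • Confirm that this IAM role has been assigned to the Amazon EC2 instance running the CloudWatch agent
  • If it still fails, check whether any credentials are stored locally on the instance that the agent could be using instead of the IAM role assigned to the instance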
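
The last check can be partly scripted. As far as I know, the "SharedCredsLoad: failed to get profile" message is produced by the AWS SDK when the credentials file can be read but has no section for the requested profile. The sketch below lists the profiles in a shared credentials file and tests for one by name. The helper names list_profiles and has_profile are hypothetical, the demo file is a throwaway with placeholder keys, and the profile name AmazonCloudWatchAgent is an assumption: it is my understanding of what the agent looks for when shared_credential_profile is not set in common-config.toml, so verify it against the agent documentation.

```python
import configparser
import os
import tempfile


def list_profiles(credentials_path):
    """Return the profile names (INI section headers) in a credentials file."""
    parser = configparser.RawConfigParser()
    # open() raises OSError if the file is missing or unreadable
    with open(credentials_path) as fp:
        parser.read_file(fp)
    return parser.sections()


def has_profile(credentials_path, profile):
    """Return True if the file contains a [profile] section (case-sensitive)."""
    return profile in list_profiles(credentials_path)


# Demo with a throwaway file that only defines [default]; keys are placeholders
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "credentials")
    with open(demo_path, "w") as fp:
        fp.write(
            "[default]\n"
            "aws_access_key_id = EXAMPLEKEY\n"
            "aws_secret_access_key = EXAMPLESECRET\n"
        )
    demo_profiles = list_profiles(demo_path)
    # Assumed profile name, see the note above
    demo_has_agent_profile = has_profile(demo_path, "AmazonCloudWatchAgent")

print(demo_profiles)
print(demo_has_agent_profile)
```

On the instance, the path to inspect would be the one configured in common-config.toml, for example /home/cwagent/.aws/credentials in the question. If the file exists but lacks the expected profile, either add that profile or comment out the [credentials] block so the agent falls back to the instance's IAM role, which is what resolved the problem in the question's edit.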
