Trigger an SSM Run Command action when my CloudWatch alarm enters the "ALARM" state

Problem description

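I am trying to trigger an SSM Run Command action when my CloudWatch alarm enters the "ALARM" state. I am trying to achieve this with a CloudWatch rule that uses an event pattern matching AWS CloudTrail API logs.

I set the service to monitoring, the event name to "DescribeAlarms", and stateValue to "ALARM". As a check, I added my SNS topic as the target (instead of SSM:RunCommand) to verify that an email is sent to me when an alarm enters the ALARM state, but no luck.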

```
{
  "source": [
    "aws.monitoring"
  ],
  "detail-type": [
    "AWS API Call via CloudTrail"
  ],
  "detail": {
    "eventSource": [
      "monitoring.amazonaws.com"
    ],
    "eventName": [
      "DescribeAlarms"
    ],
    "requestParameters": {
      "stateValue": [
        "ALARM"
      ]
    }
  }
}
```

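My expectation is that when this condition is met, that is, whenever any alarm enters the ALARM state, the rule invokes the target, which is my SNS topic.

Update:

Thanks for the clarification, @John. As you suggested, I am now trying SNS -> Lambda -> SSM Run Command. But I cannot get the instance ID from the SNS event. It fails with [Records - KeyError]. I read some of your posts and tried everything in them, but could not get it to work. Could you please help?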

Received event: {
"Records": [
{
"EventSource": "aws:sns",
"EventVersion": "1.0",
"EventSubscriptionArn": "arn:aws:sns:eu-west-1:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
"Sns": {
"Type": "Notification",
"MessageId": "********************c",
"TopicArn": "arn:aws:sns:eu-west-1:*******************************",
"Subject": "ALARM: \"!!! Critical Alert !!! Disk Space is going to be full in Automation Testing\" in EU (Ireland)",
"Message": "{\"AlarmName\":\"!!! Critical Alert !!! Disk Space is going to be full in Automation Testing\",\"AlarmDescription\":\"Disk Space is going to be full in Automation Testing\",\"AWSAccountId\":\"***********\",\"NewStateValue\":\"ALARM\",\"NewStateReason\":\"Threshold Crossed: 1 out of the last 1 datapoints [**********] was less than or equal to the threshold (70.0) (minimum 1 datapoint for OK -> ALARM transition).\",\"StateChangeTime\":\"******************\",\"Region\":\"EU (Ireland)\",\"OldStateValue\":\"OK\",\"Trigger\":{\"MetricName\":\"disk_used_percent\",\"Namespace\":\"CWAgent\",\"StatisticType\":\"Statistic\",\"Statistic\":\"AVERAGE\",\"Unit\":null,\"Dimensions\":[{\"value\":\"/\",\"name\":\"path\"},{\"value\":\"i-****************\",\"name\":\"InstanceId\"},{\"value\":\"ami-****************\",\"name\":\"ImageId\"},{\"value\":\"t2.micro\",\"name\":\"InstanceType\"},{\"value\":\"xvda1\",\"name\":\"device\"},{\"value\":\"xfs\",\"name\":\"fstype\"}],\"Period\":300,\"EvaluationPeriods\":1,\"ComparisonOperator\":\"LessThanOrEqualToThreshold\",\"Threshold\":70.0,\"TreatMissingData\":\"- TreatMissingData: missing\",\"EvaluateLowSampleCountPercentile\":\"\"}}",
"Timestamp": "2019-06-29T19:23:43.829Z",
"SignatureVersion": "1",
"Signature": "XXXXXXXXXXXX",
"SigningCertUrl": "https://sns.eu-west-1.amazonaws.com/XXXXXXXX.pem",
"UnsubscribeUrl": "https://sns.eu-west-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:eu-west-1XXXXXXXXXXXXXXXXXXXXX",
"MessageAttributes":
{}

}
}
]
}

Here is my Lambda function:

from __future__ import print_function
import boto3
import json
ssm = boto3.client('ssm')
ec2 = boto3.resource('ec2')

print('Loading function')

def lambda_handler(event, context):
    # Dump the event to the log, for debugging purposes
    print("Received event: " + json.dumps(event, indent=2))

    message = event['Records']['Sns']['Message']
    msg = json.loads(message)
    InstanceId = msg['InstanceId']['value']
    print ("Instance: %s" % InstanceId)

Tags: json, python-3.x, amazon-web-services, aws-lambda, amazon-cloudwatch

Solution


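This will most likely not work, because AWS CloudTrail only captures API calls made to AWS, and a CloudWatch alarm entering the ALARM state is an internal state change that is not caused by an API call.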
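
If an event rule is still preferred over SNS, one option may be to match the alarm state change event that CloudWatch itself publishes, rather than a CloudTrail API call. The sketch below builds such a pattern in Python and prints it as JSON for pasting into a rule. It assumes the "CloudWatch Alarm State Change" event type is available in your account and region; the variable name is made up for illustration.

```python
import json

# Event pattern matching alarms that transition into the ALARM state.
# Assumption: the rule is evaluated against CloudWatch's own state-change
# events (source "aws.cloudwatch"), not against CloudTrail API call events.
alarm_state_change_pattern = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
        "state": {
            "value": ["ALARM"]
        }
    },
}

# Print the pattern as JSON so it can be pasted into the rule definition.
print(json.dumps(alarm_state_change_pattern, indent=2))
```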

I would recommend:

  • The Amazon CloudWatch alarm triggers an AWS Lambda function
  • The Lambda function calls SSM Run Command (for example, send_command())
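
As for the KeyError in the update: in the event shown in the question, "Records" is a list, and the instance ID sits inside the "Trigger" -> "Dimensions" list of the decoded message rather than under a top-level "InstanceId" key. A minimal sketch of a handler along those lines follows. The helper name extract_instance_id, the AWS-RunShellScript document, and the df -h command are assumptions chosen for illustration, not something the original post specifies.

```python
import json


def extract_instance_id(event):
    """Return the InstanceId dimension from an SNS-delivered alarm, or None."""
    # 'Records' is a list, so index the first record before reading 'Sns'.
    raw_message = event["Records"][0]["Sns"]["Message"]
    # The alarm payload arrives as a JSON string and must be decoded first.
    alarm = json.loads(raw_message)
    # Dimensions is a list of {"name": ..., "value": ...} pairs.
    for dimension in alarm["Trigger"]["Dimensions"]:
        if dimension["name"] == "InstanceId":
            return dimension["value"]
    return None


def lambda_handler(event, context):
    instance_id = extract_instance_id(event)
    if instance_id is None:
        print("No InstanceId dimension found in the alarm message")
        return None
    print("Instance: %s" % instance_id)

    # Imported here so the parsing helper above has no AWS dependency.
    import boto3

    ssm = boto3.client("ssm")
    # Assumption: a shell command run through the AWS-RunShellScript document;
    # replace the document name and command with whatever you need to run.
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["df -h"]},
    )
    return response["Command"]["CommandId"]
```

The Lambda execution role would also need permission to call ssm:SendCommand, and the target instance has to be managed by Systems Manager for the command to be delivered.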
