AWS CloudWatch alarms: simplifying an alarm so it covers the whole Glue job cluster

Problem description

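I am trying to create a CloudFormation template that creates CloudWatch alarms based on AWS Glue metrics. Right now the template is very long, because I have to list every worker used by the Glue job in order to check the job's CPU load.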

CpuLoadAlarm:
      Type: AWS::CloudWatch::Alarm
      Properties:
        AlarmName: !Ref CampaignCpuLoadAlarmName
        ActionsEnabled: true
        AlarmActions:
          - !Ref AssistGlueJobsMonitoringSNSTopic
        EvaluationPeriods: !Ref AlarmEvaluationPeriod
        DatapointsToAlarm: !Ref AlarmDatapointsToAlarm
        Threshold: !Ref AlarmThreshold
        ComparisonOperator: GreaterThanOrEqualToThreshold
        TreatMissingData: missing
        Metrics:
          - Id: e1
            Label: Expression1
            ReturnData: true
            Expression: !Ref AlarmExpression
          - Id: m1
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.1.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m2
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.2.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m3
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.3.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m4
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.4.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m5
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.5.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m6
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.6.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m7
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.7.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m8
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.8.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m9
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.9.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average
          - Id: m10
            ReturnData: false
            MetricStat:
              Metric:
                Namespace: Glue
                MetricName: glue.driver.system.cpuSystemLoad
                Dimensions:
                  - Name: Type
                    Value: gauge
                  - Name: JobRunId
                    Value: ALL
                  - Name: JobName
                    Value: !Ref JobName
              Period: !Ref AlarmPeriod
              Stat: Average

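Is it possible to simplify this so that the threshold applies to the whole Glue job cluster, without having to specify the driver and every worker individually?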

Tags: amazon-web-services, amazon-cloudwatch, aws-glue

Solution
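No answer was recorded for this question. What follows is one possible approach, offered as a sketch rather than a confirmed fix. It assumes the job publishes the aggregate metric glue.ALL.system.cpuSystemLoad, which the AWS Glue metrics documentation lists next to the per-executor and driver variants, with the same Type, JobRunId and JobName dimensions used in the question. Because that aggregate appears to cover executors only, the driver metric is kept as a second input, and the metric math function MAX combines the two into a single time series for the alarm. The ids executors and driver and the label MaxCpuLoad are illustrative names, and the hard-coded expression replaces the AlarmExpression parameter from the question.

```yaml
CpuLoadAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: !Ref CampaignCpuLoadAlarmName
    ActionsEnabled: true
    AlarmActions:
      - !Ref AssistGlueJobsMonitoringSNSTopic
    EvaluationPeriods: !Ref AlarmEvaluationPeriod
    DatapointsToAlarm: !Ref AlarmDatapointsToAlarm
    Threshold: !Ref AlarmThreshold
    ComparisonOperator: GreaterThanOrEqualToThreshold
    TreatMissingData: missing
    Metrics:
      # Alarm on the higher of the two inputs at each datapoint.
      - Id: e1
        Label: MaxCpuLoad            # illustrative label
        ReturnData: true
        Expression: MAX([executors, driver])
      # Assumption: aggregate CPU load across all executors of the job.
      - Id: executors
        ReturnData: false
        MetricStat:
          Metric:
            Namespace: Glue
            MetricName: glue.ALL.system.cpuSystemLoad
            Dimensions:
              - Name: Type
                Value: gauge
              - Name: JobRunId
                Value: ALL
              - Name: JobName
                Value: !Ref JobName
          Period: !Ref AlarmPeriod
          Stat: Average
      # Driver load, taken unchanged from the original template.
      - Id: driver
        ReturnData: false
        MetricStat:
          Metric:
            Namespace: Glue
            MetricName: glue.driver.system.cpuSystemLoad
            Dimensions:
              - Name: Type
                Value: gauge
              - Name: JobRunId
                Value: ALL
              - Name: JobName
                Value: !Ref JobName
          Period: !Ref AlarmPeriod
          Stat: Average
```

If this works for your job, the template no longer depends on the number of workers, so scaling the job up or down would not require editing the alarm. Before relying on it, check in the CloudWatch console that glue.ALL.system.cpuSystemLoad actually has datapoints for the job. Also confirm that an average across executors is an acceptable substitute for the per-worker checks, since a single overloaded worker can be hidden by the average.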

