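google-cloud-platform - Google Cloud Monitoring and the number of notifications per incident
Problem description
I'm trying to set up a Google Cloud Composer monitor with Terraform. Here is my "hello world" code (it works, but it doesn't meet my acceptance criteria):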
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}

provider "google" {
  credentials = "some_credentials"
  project     = "some_project"
  region      = "some_region"
  zone        = "some_zone"
}

resource "google_monitoring_notification_channel" "basic" {
  display_name = "Test name"
  type         = "email"
  labels = {
    email_address = "some@email.com"
  }
}

resource "google_monitoring_alert_policy" "cloud_composer_job_fail_monitor" {
  combiner              = "OR"
  display_name          = "Fails testing on cloud composer tasks"
  notification_channels = [google_monitoring_notification_channel.basic.id]

  conditions {
    display_name = "Failures count"
    condition_threshold {
      filter          = "resource.type=\"cloud_composer_workflow\" AND metric.type=\"composer.googleapis.com/workflow/task/run_count\" AND resource.label.\"project_id\"=\"some_project\" AND metric.label.\"state\"=\"failed\" AND resource.label.\"location\"=\"some_region\""
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      aggregations {
        alignment_period   = "3600s"
        per_series_aligner = "ALIGN_COUNT"
      }
    }
  }

  documentation {
    content = "Please checkout current incident"
  }
}
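The problem: by default, a notification is sent only when an alert policy is triggered or resolved (Google docs).
My question: I want to receive an alert notification every 30 minutes (for example) when a Cloud Composer job fails, until I or someone else resolves the incident. Alternatively, I need to understand why the incident is not resolved automatically once the job stops failing.
Can anyone help with this?
Thanks for your help!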
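Solution
The fix is to change these fields:
- per_series_aligner
- duration
- alignment_period
With these changes you get alert notifications for Cloud Composer tasks in the failed state, and the trigger condition is met sooner: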
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}

provider "google" {
  credentials = "some_credentials"
  project     = "some_project"
  region      = "some_region"
  zone        = "some_zone"
}

resource "google_monitoring_notification_channel" "basic" {
  display_name = "Test name"
  type         = "email"
  labels = {
    email_address = "some@email.com"
  }
}

resource "google_monitoring_alert_policy" "cloud_composer_job_fail_monitor" {
  combiner              = "OR"
  display_name          = "Fails testing on cloud composer tasks"
  notification_channels = [google_monitoring_notification_channel.basic.id]

  conditions {
    display_name = "Failures count"
    condition_threshold {
      filter          = "resource.type=\"cloud_composer_workflow\" AND metric.type=\"composer.googleapis.com/workflow/task/run_count\" AND resource.label.\"project_id\"=\"some_project\" AND metric.label.\"state\"=\"failed\" AND resource.label.\"location\"=\"some_region\""
      duration        = "0s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_DELTA"
      }
    }
  }

  documentation {
    content = "Please checkout current incident"
  }
}
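As a reading aid, here are the three changed fields in isolation. This is an excerpt of the condition_threshold block, not a standalone configuration (the filter is replaced by a comment). The comments reflect my understanding of how Cloud Monitoring aligners behave; the original answer does not state them, so treat the explanation of the old behaviour as a likely cause rather than a confirmed one.

```hcl
condition_threshold {
  # filter, comparison and threshold_value are unchanged from the full configuration

  # Was "60s": the condition had to hold for a full minute before an incident opened.
  # With "0s", a single aligned point above the threshold is enough.
  duration = "0s"

  aggregations {
    # Was "3600s": each aligned point summarised a whole hour, so one failure
    # could keep the condition true for up to an hour after it happened.
    alignment_period = "60s"

    # Was "ALIGN_COUNT", which counts the data points written in the period
    # regardless of their value. "ALIGN_DELTA" reports the change in the metric
    # over the period, so it drops back to 0 once tasks stop failing.
    per_series_aligner = "ALIGN_DELTA"
  }
}
```

If the ALIGN_COUNT reading is right, it would also explain why the incident in the question did not close on its own when the job stopped failing.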
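I found no information about repeated notifications (for example, every 30 minutes) with this setup.
You are notified only when your condition is met.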
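For the repeated-notification part of the question, one possible direction is sketched below. It assumes a release of the hashicorp/google provider newer than the 3.5.0 pinned above, one that exposes the alert_strategy block with notification_channel_strategy and renotify_interval; this is not part of the original answer, so check the provider documentation for the version you use before relying on it. The resource name cloud_composer_job_fail_renotify is hypothetical, and the sketch reuses the google_monitoring_notification_channel.basic resource defined earlier.

```hcl
# Sketch only: requires a provider version that supports alert_strategy
# (an assumption; the 3.5.0 pin in the configuration above is too old for this).
resource "google_monitoring_alert_policy" "cloud_composer_job_fail_renotify" {
  combiner              = "OR"
  display_name          = "Fails testing on cloud composer tasks"
  notification_channels = [google_monitoring_notification_channel.basic.id]

  conditions {
    display_name = "Failures count"
    condition_threshold {
      filter          = "resource.type=\"cloud_composer_workflow\" AND metric.type=\"composer.googleapis.com/workflow/task/run_count\" AND resource.label.\"project_id\"=\"some_project\" AND metric.label.\"state\"=\"failed\" AND resource.label.\"location\"=\"some_region\""
      duration        = "0s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_DELTA"
      }
    }
  }

  alert_strategy {
    notification_channel_strategy {
      # Channels that should be re-notified while the incident stays open.
      notification_channel_names = [google_monitoring_notification_channel.basic.name]
      # 1800s = 30 minutes, the interval asked for in the question.
      renotify_interval = "1800s"
    }
  }
}
```

Repeated notifications only continue while the incident is open, so this works best together with the aligner change above, which lets the incident close once failures stop.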