Splunk: How do I calculate event duration records?

Question

I have the following events in Splunk:

_time                           Agent_Hostname      alarm               status
2020-08-23T03:04:05.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
2020-08-23T03:08:17.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:09:24.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:10:31.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
2020-08-23T03:10:32.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
2020-08-23T03:11:12.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:19:13.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
2020-08-23T03:20:10.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:21:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:22:22.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:23:29.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
2020-08-23T03:25:09.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
2020-08-23T03:25:58.000-0700    m50-ups.a_domain    upsAlarmOnBypass    cleared

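My question is how to calculate an event duration record for each host and each alarm type. For example, from the events above I would like to derive the following by algorithm, rather than by hardcoding the values of this particular example: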

start                        end                          Agent_Hostname   alarm
2020-08-23T03:04:05.000-0700 2020-08-23T03:25:58.000-0700 m50-ups.a_domain upsAlarmOnBypass
2020-08-23T03:07:16.000-0700                              m50-ups.a_domain upsTrapOnBattery
2020-08-23T03:07:16.000-0700 2020-08-23T03:24:28.000-0700 m50-ups.a_domain upsAlarmInputBad
2020-08-23T03:07:39.000-0700 2020-08-23T03:25:09.000-0700 m50-ups.a_domain upsAlarmLowBattery

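Here start is the earliest time the alarm was raised for the host, and end is the time the same alarm was cleared for that host.

My second question is how to find the longest duration among the closed spans, ignoring those that have no end time.

How can this be done within Splunk?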

Tags: splunk, splunk-query, splunk-formula, splunk-calculation

Answer

The transaction command can handle most of this. The only thing I could not get it to do is show alarms that have not been cleared yet.

| makeresults 
| eval _raw="time                            Agent_Hostname      alarm               status
2020-08-23T03:04:05.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:07:16.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmOnBypass    raised
2020-08-23T03:07:39.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
2020-08-23T03:08:17.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:09:24.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:10:31.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
2020-08-23T03:10:32.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
2020-08-23T03:11:12.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsAlarmInputBad    raised
2020-08-23T03:19:06.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:19:13.000-0700    m50-ups.a_domain    upsAlarmLowBattery  raised
2020-08-23T03:20:10.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:21:16.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:22:22.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:23:29.000-0700    m50-ups.a_domain    upsTrapOnBattery    raised
2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmInputBad    cleared
2020-08-23T03:24:28.000-0700    m50-ups.a_domain    upsAlarmOnBattery   cleared
2020-08-23T03:25:09.000-0700    m50-ups.a_domain    upsAlarmLowBattery  cleared
2020-08-23T03:25:58.000-0700    m50-ups.a_domain    upsAlarmOnBypass    cleared" 
| multikv forceheader=1 
| eval _time=strptime(time,"%Y-%m-%dT%H:%M:%S.%3N%z")
| fields _time Agent_Hostname alarm status 
```Everything above just defines test data - Remove Before Flight```
```Omit the reverse command if events are in descending order (the default)```
| reverse
```Set the start and end times based on status```
| eval start=if(status="raised",_time, null()), end=if(status="cleared",_time, null())
```Define transactions based on "raised/cleared" pairs within host and alarm names```
| transaction Agent_Hostname alarm startswith="raised" endswith="cleared"
```Change duration display to hh:mm:ss```
| fieldformat duration=tostring(duration,"duration")
| table start end Agent_Hostname alarm duration
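
For the part transaction could not cover, a stats-based variant might work. This is a minimal sketch that has not been run against a live Splunk instance. It assumes each host/alarm pair should collapse into a single span from its earliest raised event to its latest cleared event, which is how the expected table in the question reads. It would replace everything from | reverse onward in the search above. max_duration is a field name introduced here for illustration only.

```spl
| stats min(eval(if(status="raised", _time, null()))) AS start
        max(eval(if(status="cleared", _time, null()))) AS end
        BY Agent_Hostname alarm
| where isnotnull(start)
| eval duration=end - start
| eventstats max(duration) AS max_duration
| fieldformat start=strftime(start, "%Y-%m-%dT%H:%M:%S.%3N%z")
| fieldformat end=strftime(end, "%Y-%m-%dT%H:%M:%S.%3N%z")
| fieldformat duration=tostring(duration, "duration")
| fieldformat max_duration=tostring(max_duration, "duration")
| table start end Agent_Hostname alarm duration max_duration
```

The where clause drops alarms that were only ever cleared (upsAlarmOnBattery in the sample). Alarms that were never cleared keep an empty end and an empty duration, and max() skips those, so max_duration reflects closed spans only. In the question's expected table the longest closed span is upsAlarmOnBypass, from 03:04:05 to 03:25:58, or 00:21:53, so that is the value max_duration should carry if the sketch behaves as intended. The trade-off is that repeated raise/clear cycles of the same alarm are merged into one span. If per-cycle spans are needed, the transaction search is the better starting point, and appending | stats max(duration) to it is one way to get the longest span, since as far as I know transactions without a matching endswith event are dropped unless keepevicted=true is set.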
