首页 > 解决方案 > 按时间限制重试,而不是尝试

问题描述

我有一份处理队列中项目的工作。当队列为空时,它会在几秒钟内完成。但是,如果队列中有项目,则有时需要几分钟(或几小时)才能完成。

我希望它在失败后自动重试(等待 5 分钟),除非最后一次成功的作业是 1 多小时前。如果它失败并且最后一次成功是在 1 小时前,我希望它通知我们(通过通知插件)

我怎样才能做到这一点?

标签: rundeck

解决方案


您可以创建一个“监控作业”,使用内联脚本步骤中 “嵌入”的Rundeck API来验证“主要作业”的状态。

该作业有一个内联脚本步骤,用于验证主作业的最新状态并计算最近一次成功执行与当前时间之间的分钟数,全部通过 API,脚本需要jqdateutils。如果发生这种情况,请禁用主作业的计划并使用“失败时”通知发送通知电子邮件

HelloWorld 作业(“主作业”,如您所说的要监控的作业,有 5 分钟的重试时间,我想这是一个预定的作业)。

<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>3fc55b67-b2a1-4e92-b949-7ed464041993</id>
    <loglevel>INFO</loglevel>
    <name>HelloWorld</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <plugins />
    <retry delay='5m'>1</retry>
    <schedule>
      <month month='*' />
      <time hour='17' minute='15' seconds='0' />
      <weekday day='*' />
      <year year='*' />
    </schedule>
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <exec>eho "hello world"</exec>
      </command>
    </sequence>
    <uuid>3fc55b67-b2a1-4e92-b949-7ed464041993</uuid>
  </job>
</joblist>

还有“监控”作业,该作业使用 Rundeck API 验证内联脚本上的所有条件:

<joblist>
  <job>
    <defaultTab>nodes</defaultTab>
    <description></description>
    <executionEnabled>true</executionEnabled>
    <id>7a4b117c-75cc-4b49-a504-703e27941170</id>
    <loglevel>INFO</loglevel>
    <name>Monitor</name>
    <nodeFilterEditable>false</nodeFilterEditable>
    <notification>
      <onfailure>
        <email attachLog='true' attachLogInFile='true' recipients='devops@example.net' subject='job failure' />
      </onfailure>
    </notification>
    <notifyAvgDurationThreshold />
    <plugins />
    <schedule>
      <month month='*' />
      <time hour='*' minute='0/15' seconds='0' />
      <weekday day='*' />
      <year year='*' />
    </schedule>
    <scheduleEnabled>true</scheduleEnabled>
    <sequence keepgoing='false' strategy='node-first'>
      <command>
        <description>Monitor the main job</description>
        <fileExtension>.sh</fileExtension>
        <script><![CDATA[# get the latest execution
latest_execution_status=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[].status')

# now get the latest successful execution date
latest_succeeded_execution_date=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?status=succeeded&max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[]."date-ended".date' | sed 's/.$//')

# the current date (to compare with the latest execution date)
current_date=$(date --iso-8601=seconds)

# comparing the two dates using dateutils
time_between_latest_succeeded_and_current_time=$(dateutils.ddiff $latest_succeeded_execution_date $current_date -f "%M")

# just for debug
echo "LATEST EXECUTION STATE: $latest_execution_status"
echo "LATEST SUCCEEDED EXECUTION DATE: $latest_succeeded_execution_date"
echo "CURRENT DATE: $current_date"
echo "MINUTES SINCE LATEST SUCCEEDED EXECUTION: $time_between_latest_succeeded_and_current_time minutes"

# If it fails and the last success was more than 1 hour ago, we want it to notify us (via notification plugin)
if [ $latest_execution_status = 'failed' ] && [ $time_between_latest_succeeded_and_current_time -gt 60 ]; then
  # disable main job schedule
  curl -s --location --request POST 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/schedule/disable' --header 'Accept: application/json' --header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' --header 'Content-Type: application/json' 

  # with exit 1 this job fails and the notification "on failure" is triggered
  exit 1 
else
    echo "all ok!"
fi

# all done.]]></script>
        <scriptargs />
        <scriptinterpreter>/bin/bash</scriptinterpreter>
      </command>
    </sequence>
    <uuid>7a4b117c-75cc-4b49-a504-703e27941170</uuid>
  </job>
</joblist>

如果您需要单独的脚本,请在此处:

# get the latest execution
latest_execution_status=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[].status')

# now get the latest successful execution date
latest_succeeded_execution_date=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?status=succeeded&max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[]."date-ended".date' | sed 's/.$//')

# the current date (to compare with the latest execution date)
current_date=$(date --iso-8601=seconds)

# comparing the two dates using dateutils
time_between_latest_succeeded_and_current_time=$(dateutils.ddiff $latest_succeeded_execution_date $current_date -f "%M")

# just for debug
echo "LATEST EXECUTION STATE: $latest_execution_status"
echo "LATEST SUCCEEDED EXECUTION DATE: $latest_succeeded_execution_date"
echo "CURRENT DATE: $current_date"
echo "MINUTES SINCE LATEST SUCCEEDED EXECUTION: $time_between_latest_succeeded_and_current_time minutes"

# If it fails and the last success was more than 1 hour ago, we want it to notify us (via notification plugin)
if [ $latest_execution_status = 'failed' ] && [ $time_between_latest_succeeded_and_current_time -gt 60 ]; then
  # disable main job schedule
  curl -s --location --request POST 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/schedule/disable' --header 'Accept: application/json' --header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' --header 'Content-Type: application/json' 

  # with exit 1 this job fails and the notification "on failure" is triggered
  exit 1 
else
    echo "all ok!"
fi

# all done.

当然,可以使用选项参数化作业 ID、主机等参数。


推荐阅读