Failing an AWS Step Functions execution after a Catch

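Problem description

My AWS Step Function has 3 stages:

  1. Stage 1 - Lambda
  2. Stage 2 - AWS Batch
  3. Stage 3 - AWS Batch (mandatory cleanup)

Everything works: if Stage 1 fails, execution moves to the Cleanup stage. However, because the Cleanup stage always passes, the final result of the Step Function is always Pass. If Stage 1 or Stage 2 fails, I still need Cleanup to run, but the final result of the Step Function should be Fail.

Options I have investigated:

  1. One way to solve this is to keep a flag in a cache that records whether an error occurred, but I would like to know whether there is a built-in way to handle this.
  2. Another option is to use ResultPath to check for the error, but I am not sure how to access that result from AWS Batch.

Any suggestions would be appreciated, thanks.

I added the following Catch block to Stage 1 and Stage 2: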

"Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "Cleanup"
        }
]

The Cleanup stage looks like this:

"Cleanup": {
  "Type": "Task",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/MyCleanupJob",
    "JobName": "cleanup",
    "JobQueue": "arn:aws:batch:<region>:<account>:job-queue/MyCleanupQueue",
    "ContainerOverrides": {
      "Command": [
        "java",
        "-jar",
        "cleanup.jar"   #### need to pass whether an error occurred as a command-line parameter
      ]
    }
  },
  "End": true
}

Tags: amazon-web-services, aws-step-functions, aws-batch

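Solution

I solved it with the following mechanism, thanks to @LRutten for pointing me down this path.

  1. For every stage that succeeds, write its response to its own ResultPath; otherwise the results of the previous stages are overwritten.
  2. In each Catch block, set ResultPath so that the error is stored under an "error" element.
  3. Use a Choice state to decide whether the Step Function should fail, based on whether the "error" element is present.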
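To illustrate steps 2 and 3, here is a sketch of the state data that could reach the Choice state when the Lambda stage throws and Cleanup then succeeds. It assumes the execution started with an empty input. The error name and cause text are hypothetical, and the "cleanup" object is heavily abbreviated (a real Batch job result carries many more fields). The "Error" and "Cause" keys are the fields Step Functions writes for a caught error.

```json
{
  "error": {
    "Error": "HypotheticalLambdaError",
    "Cause": "hypothetical cause text reported by the failed stage"
  },
  "cleanup": {
    "Status": "SUCCEEDED"
  }
}
```

Because "error" is present, the Choice state routes to the Fail state. On a fully successful run the data would instead contain "mylambda", "mybatch" and "cleanup" with no "error" element, so the Default branch is taken.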

Here is the final state machine definition:

"MyLambda": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:<region>:<account>:function:MyLambda",
  "ResultPath": "$.mylambda",   #### All results from the lambda are added to "mylambda" in the JSON
  "Catch": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "ResultPath": "$.error",  #### If an error occurs it is appended to the result path as an "error" element
      "Next": "Cleanup"
    }
  ],
  "Next": "MyBatch"
},

"MyBatch": {
  "Type": "Task",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/MyBatchJob",
    "JobName": "mybatch",
    "JobQueue": "arn:aws:batch:<region>:<account>:job-queue/MyBatchQueue",
    "ContainerOverrides": {
      "Command": [
        "java",
        "-jar",
        "mybatch.jar"
      ]
    }
  },
  "ResultPath": "$.mybatch",
  "Catch": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "ResultPath": "$.error",
      "Next": "Cleanup"
    }
  ],
  "Next": "Cleanup"
},
"Cleanup": {
  "Type": "Task",
  "ResultPath": "$.cleanup",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/MyCleanupJob",
    "JobName": "cleanup",
    "JobQueue": "arn:aws:batch:<region>:<account>:job-queue/MyCleanupQueue",
    "ContainerOverrides": {
      "Command": [
        "java",
        "-jar",
        "cleanup.jar"
      ]
    }
  },
  "Next": "Should Fail"
},
"Should Fail" :{
  "Type" : "Choice",
  "Choices" : [
    {
      "Variable" : "$.error",   #### If an error element is present it means it is a Failure
      "IsPresent": true,
      "Next" : "Fail"
    }
  ],
  "Default" : "Pass"
},
"Fail" : {
  "Type" : "Fail",
  "Cause": "Step function failed"
},
"Pass" : {
  "Type" : "Pass",
  "Result": "Step function passed",
  "End" : true
}
 
}
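The Fail state above reports a fixed Cause, so the original error is not visible in the execution result. As a possible refinement, and only as a sketch that assumes your state machine can use the ErrorPath and CausePath fields of the Fail state, the caught error stored under "$.error" could be surfaced directly. It is shown here as a standalone object; in the definition it would replace the existing "Fail" entry.

```json
{
  "Fail": {
    "Type": "Fail",
    "ErrorPath": "$.error.Error",
    "CausePath": "$.error.Cause"
  }
}
```

Note that ErrorPath cannot be combined with Error, and CausePath cannot be combined with Cause, in the same Fail state.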
