AWS Glue job fails with error "ERROR Client: Application diagnostics message: User application exited with status 1"

Problem description

I have recently been using an AWS Glue job to test-run some Spark Python code. It ran successfully yesterday, and this morning, with no changes on my side, all three of my runs failed. The logs are strange and I don't understand them:

Here is what I copied from the error log:

kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
awk: /tmp/parse_yarn_logs.awk:6: warning: escape sequence `\[' treated as plain `['
awk: /tmp/parse_yarn_logs.awk:6: warning: escape sequence `\]' treated as plain `]'
awk: /tmp/parse_yarn_logs.awk:8: warning: escape sequence `\(' treated as plain `('
awk: /tmp/parse_yarn_logs.awk:8: warning: escape sequence `\)' treated as plain `)'
21/03/04 09:56:42 INFO client.RMProxy: Connecting to ResourceManager at ip-xxxxxx.ec2.internal/xxx.xx.xx.x:xxxx
awk: /tmp/parse_yarn_logs.awk:19: (FILENAME=- FNR=1) fatal: Unmatched ( or \(: /.*Unregistering ApplicationMaster with FAILED (diag message: Shutdown hook called before final status was reported.*$/

Looking at the full version of the log, this part seems to be what is causing the problem:

21/03/04 10:12:08 ERROR Client: Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_xxxxxxxx_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/03/04 10:12:08 INFO ShutdownHookManager: Shutdown hook called
21/03/04 10:12:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-xxxxxxxxxx
21/03/04 10:12:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-xxxxxxxxxx

One of the runs took 10 minutes just to start?! Normally it takes only a few seconds. Glue doesn't seem very stable; whether a job fails or not appears to come down to luck.

Does anyone know what causes this problem, and is there anything I can do to make it run more reliably? Thanks.

Tags: amazon-web-services, apache-spark, awk, pyspark, aws-glue

Solution


I am seeing the same thing with an AWS Glue job right now. In my case, it happens whenever I add this new line to the code:

device = DeviceDetector('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.77.34.5 Safari/537.36 QJY/2.0 Philco_PTV24G50SN-VB_DRM HDR DID/C0132bb2240f').parse() 

When I comment out that line, the job runs fine. Since this is a new Python package in our code (I had only just added it), I don't know how it behaved before. Hopefully someone can explain this.
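For what it's worth, any uncaught exception in the driver script (including an `ImportError` for a package that isn't installed on the Glue workers) is enough to make YARN report "User application exited with status 1". Below is a minimal defensive sketch, not the original poster's code: the `parse_user_agent` helper is hypothetical, and it assumes the third-party `device_detector` package may or may not be present on the cluster. It degrades gracefully instead of killing the whole job:

```python
def parse_user_agent(ua: str):
    """Parse a user-agent string, returning None instead of raising.

    Sketch only: guards against the `device_detector` package being
    missing on the Glue workers, and against a malformed UA string.
    """
    try:
        # Import inside the function so a missing package surfaces
        # here, where we can handle it, rather than at module load.
        from device_detector import DeviceDetector
    except ImportError:
        return None  # dependency not installed on this worker
    try:
        return DeviceDetector(ua).parse()
    except Exception:
        return None  # a bad UA string should not abort the whole job


result = parse_user_agent(
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"
)
```

With a guard like this, a missing or misbehaving dependency shows up as `None` values in the output (which you can log and investigate) rather than as the opaque "exited with status 1" failure at the YARN level.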
