首页 > 解决方案 > Apache Spark 驱动程序日志未指定阶段取消的原因

问题描述

我在 YARN 下的 AWS EMR 上运行 Apache Spark。集群有 1 个 master 和 10 个 executor。经过几个小时的处理,我的集群失败了,我去查看日志。

所以,我看到所有工作的执行者都试图一次杀死任务(这是某人执行者的日志):

20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 66.0 in stage 2.0 (TID 466), reason: Stage cancelled
20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 65.0 in stage 2.0 (TID 465), reason: Stage cancelled
20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 67.0 in stage 2.0 (TID 467), reason: Stage cancelled
20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 64.0 in stage 2.0 (TID 464), reason: Stage cancelled
20/03/05 00:02:12 ERROR Utils: Aborting a task

我知道原因是,Stage cancelled但我无法获得任何详细信息。我查看了驱动程序日志,发现它们的最后一条记录要早得多。

所以我有两个问题:

20/03/04 18:39:40 INFO TaskSetManager: Starting task 159.0 in stage 1.0 (TID 359, ip-172-31-6-236.us-west-2.compute.internal, executor 40, partition 159, RACK_LOCAL, 8421 bytes)
20/03/04 18:39:40 INFO ExecutorAllocationManager: New executor 40 has registered (new total is 40)
20/03/04 18:39:41 INFO BlockManagerMasterEndpoint: Registering block manager ip-172-31-6-236.us-west-2.compute.internal:33589 with 2.8 GB RAM, BlockManagerId(40, ip-172-31-6-236.us-west-2.compute.internal, 33589, None)
20/03/04 18:39:42 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ip-172-31-6-236.us-west-2.compute.internal:33589 (size: 44.7 KB, free: 2.8 GB)
20/03/04 18:39:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-6-236.us-west-2.compute.internal:33589 (size: 37.4 KB, free: 2.8 GB)

标签: apache-sparkhadoop-yarnamazon-emr

解决方案


推荐阅读