Vertex failure when joining 2 large tables in Hive

Problem description

I have 2 tables in Hive. Table A has 300M rows and table B has 26M rows. I am joining tables A and B on 3 columns: col1, col2, col3.

Here is the query I am using:

CREATE TEMPORARY TABLE AB_TEMP AS select A.col1,A.col2,A.col3,A.col4,A.col5 from A join B on A.col1=B.col1 and A.col2=B.col2 and A.col3=B.col3;

Every time I run this query, I get a vertex failure error.

What can I do to overcome this?

Below is the error I get:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1617665530644_1398582_10_01, diagnostics=[Task failed, taskId=task_1617665530644_1398582_10_01_000147, diagnostics=[TaskAttempt 0 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_0 Timed out after 300 secs], TaskAttempt 1 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_1 Timed out after 300 secs], TaskAttempt 2 failed, info=[AttemptID:attempt_1617665530644_1398582_10_01_000147_2 Timed out after 300 secs], TaskAttempt 3 failed, info=[Container container_e42_1617665530644_1398582_01_002060 timed out]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:220, Vertex vertex_1617665530644_1398582_10_01 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Tags: join, hive, hiveql

Solution


Don't run this query on Tez. We can do it with MapReduce instead.

set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=4096;

After setting all of the above parameters, you can run the query and it executes fine.
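Putting the settings and the original query together, a full session might look like the sketch below. The table and column names come from the question; the memory values are just the ones suggested above and may need tuning for your cluster:

```sql
-- Fall back from Tez to the classic MapReduce engine
set hive.execution.engine=mr;
-- Disable automatic map-join conversion: at 26M rows, table B is
-- likely too large to broadcast to every mapper
set hive.auto.convert.join=false;
-- Give map and reduce tasks more memory headroom (cluster-dependent)
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=4096;

-- The original 3-column join, now running as a plain MR shuffle join
CREATE TEMPORARY TABLE AB_TEMP AS
SELECT A.col1, A.col2, A.col3, A.col4, A.col5
FROM A
JOIN B
  ON A.col1 = B.col1
 AND A.col2 = B.col2
 AND A.col3 = B.col3;
```

Disabling `hive.auto.convert.join` forces a reduce-side (shuffle) join, which is slower but avoids loading one side of the join into task memory.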

