java - java.lang.OutOfMemoryError: Java heap space error while running Hive table scan
Problem description
We are trying to read a large Hive table stored in RCFile format. The job processes all of the table's partitions one by one. So far the map task appears to have written 30 spill files.
At one point it starts processing a particular file; the log is shown below:
2018-06-15 00:54:28,977 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://xxx:8020/data/csv/7342/2018-06-14/17/1/Network_xxx.dat
2018-06-15 00:54:29,005 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias ntwk for file hdfs://xxx:8020/data/csv/7342/2018-06-14/17/1
2018-06-15 00:55:04,029 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 finished. closing...
2018-06-15 00:55:04,129 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarded 6672342 rows
2018-06-15 00:55:04,266 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 finished. closing...
2018-06-15 00:55:04,266 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarded 0 rows
2018-06-15 00:55:04,513 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 2 finished. closing...
2018-06-15 00:55:04,538 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 2 forwarded 0 rows
2018-06-15 00:55:04,563 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 Close done
2018-06-15 00:55:04,589 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2018-06-15 00:55:04,616 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 1 finished. closing...
2018-06-15 00:55:04,641 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 1 forwarded 6672342 rows
2018-06-15 00:55:04,666 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 finished. closing...
2018-06-15 00:55:04,691 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 0 forwarded 0 rows
2018-06-15 00:55:04,716 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 1 Close done
2018-06-15 00:55:04,741 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 Close done
2018-06-15 00:55:04,792 INFO ExecMapper: ExecMapper: processed 6672342 rows: used memory = 412446808
2018-06-15 00:55:10,316 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2018-06-15 00:55:10,852 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:50)
at org.apache.hadoop.io.compress.BlockDecompressorStream.<init>(BlockDecompressorStream.java:50)
at org.apache.hadoop.io.compress.SnappyCodec.createInputStream(SnappyCodec.java:173)
at org.apache.hadoop.hive.ql.io.RCFile$Reader.nextKeyBuffer(RCFile.java:1447)
at org.apache.hadoop.hive.ql.io.RCFile$Reader.next(RCFile.java:1602)
at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:98)
at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:85)
at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:39)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:329)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:247)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
How did it manage to process the other files, and why does it hit an OutOfMemoryError at this point? What is happening internally?
What is the difference between the heap space and the map buffer in a map task?
Here are some of the configurations:
mapred.map.child.java.opts : -Xmx512M
mapred.job.reduce.memory.mb : -1
mapred.job.map.memory.mb : -1
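For reference, these properties control two different limits: the *.java.opts value is the heap of the per-task JVM (here -Xmx512M, which is the limit a "Java heap space" error refers to), while the *.memory.mb values are physical-memory limits enforced by the framework per task, with -1 conventionally meaning that check is disabled. A minimal annotated sketch of the same settings as they could be passed to a Hive session (assuming an MRv1-style cluster, as the property names in the question suggest):

```sql
-- Per-task JVM heap; java.lang.OutOfMemoryError: Java heap space is thrown
-- when allocations inside the map task exceed this -Xmx value.
SET mapred.map.child.java.opts=-Xmx512M;

-- Framework-enforced physical-memory limits per map/reduce task;
-- -1 means the check is effectively disabled.
SET mapred.job.map.memory.mb=-1;
SET mapred.job.reduce.memory.mb=-1;
```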
The Hive map task also prints statistics: the number of rows it has processed and the memory used. The memory used appears to be far below the configured memory.
Why did it register an OutOfMemoryError 6 seconds after printing those statistics? I can't understand what is going wrong.
2018-06-15 00:55:04,741 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 Close done
**2018-06-15 00:55:04,792 INFO ExecMapper: ExecMapper: processed 6672342 rows: used memory = 412446808**
2018-06-15 00:55:10,316 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
**2018-06-15 00:55:10,852 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space**
at org.apache.hadoop.io.compress.DecompressorStream.<init>(DecompressorStream.java:50)
at org.apache.hadoop.io.compress.BlockDecompressorStream.<init>(BlockDecompressorStream.java:50)
at org.apache.hadoop.io.compress.SnappyCodec.createInputStream(SnappyCodec.java:173)
at org.apache.hadoop.hive.ql.io.RCFile$Reader.nextKeyBuffer(RCFile.java:1447)
Solution
You can try increasing the reducer's memory limit at runtime. For details, see this blog post: https://dataanalyticstrend.blogspot.com/2020/04/what-is-hive-and-how-to-solve-hive-heap.html
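For example, a minimal sketch of how those limits could be raised at runtime for a single Hive session (the property names assume an MRv1-style cluster like the one in the question; on YARN/MRv2 the equivalents are mapreduce.map.memory.mb, mapreduce.map.java.opts and their .reduce. counterparts, and the values below are only illustrative):

```sql
-- Raise the per-task JVM heap; the question shows it at -Xmx512M.
SET mapred.map.child.java.opts=-Xmx1024M;
SET mapred.reduce.child.java.opts=-Xmx1024M;

-- Give each task enough physical memory for the heap plus JVM overhead.
SET mapred.job.map.memory.mb=1536;
SET mapred.job.reduce.memory.mb=1536;
```

Note that in the trace above the error is thrown on the map side, while RCFile$Reader is creating a Snappy DecompressorStream, so the map-side settings are the ones that apply to this particular failure.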