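hadoop - Shuffle fails on an empty file. EOFException: Unexpected end of input stream
Problem description
I am trying to run a copy of a data-processing pipeline on my local machine, with Hadoop and HBase in standalone mode; the same pipeline runs correctly on a cluster. The pipeline consists of several MapReduce jobs started one after another. In one of these jobs the mapper writes nothing to its output (this depends on the input, but in my test it writes nothing), yet the job still has a reducer. While this job runs I get the following exception: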
16:42:19,322 [INFO] [localfetcher#13] o.a.h.i.c.CodecPool: Got brand-new decompressor [.gz]
16:42:19,322 [INFO] [localfetcher#13] o.a.h.m.t.r.LocalFetcher: localfetcher#13 about to shuffle output of map attempt_local509755465_0013_m_000000_0 decomp: 2 len: 6 to MEMORY
16:42:19,326 [WARN] [Thread-4749] o.a.h.m.LocalJobRunner: job_local509755465_0013 java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#13
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.5.1.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.5.1.jar:?]
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#13
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.5.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) ~[?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
Caused by: java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:157) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
I inspected the files produced by the mapper. I expected them to be empty, since the mapper wrote nothing to its output, but they contain strange bytes:
File: /tmp/hadoop-egorkiruhin/mapred/local/localRunner/egorkiruhin/jobcache/job_local509755465_0013/attempt_local509755465_0013_m_000000_0/output/file.out
ÿÿÿÿ^@^@
File: /tmp/hadoop-egorkiruhin/mapred/local/localRunner/egorkiruhin/jobcache/job_local509755465_0013/attempt_local509755465_0013_m_000000_0/output/file.out.index
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^@^F^@ ^@^@^@dTG<93>
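The two dumps above can be checked by hand. The following is a minimal sketch using only the Java standard library, and it rests on assumptions that are not confirmed by the post: that `ÿÿÿÿ^@^@` stands for the six bytes FF FF FF FF 00 00, and that the first 24 bytes of `file.out.index` are three big-endian longs. The class name `MapOutputInspector` and its helper methods are hypothetical. It tests whether the data starts with the gzip magic bytes, compares the last four bytes of `file.out` with the CRC32 of the first two, and decodes the three longs.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class MapOutputInspector {

    /** Returns true if the data starts with the gzip magic bytes 0x1F 0x8B. */
    public static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0x1F
                && (data[1] & 0xFF) == 0x8B;
    }

    /** Computes the CRC32 of the first len bytes of data. */
    public static long crc32(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return crc.getValue();
    }

    /** Reads the last four bytes of data as an unsigned big-endian integer. */
    public static long trailingChecksum(byte[] data) {
        return ByteBuffer.wrap(data, data.length - 4, 4).getInt() & 0xFFFFFFFFL;
    }

    /** Decodes the first 24 bytes as three big-endian longs. */
    public static long[] decodeIndexRecord(byte[] index) {
        ByteBuffer buf = ByteBuffer.wrap(index);
        return new long[] { buf.getLong(), buf.getLong(), buf.getLong() };
    }

    public static void main(String[] args) {
        // Assumed content of file.out: FF FF FF FF 00 00
        byte[] fileOut = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0, 0 };

        // Assumed first 24 bytes of file.out.index: the longs 0, 2 and 6
        byte[] indexRecord = new byte[24];
        indexRecord[15] = 2;
        indexRecord[23] = 6;

        System.out.println("gzip magic present: " + looksLikeGzip(fileOut));
        System.out.println("CRC32 of first two bytes: "
                + Long.toHexString(crc32(fileOut, 2)));
        System.out.println("trailing four bytes:      "
                + Long.toHexString(trailingChecksum(fileOut)));

        long[] rec = decodeIndexRecord(indexRecord);
        System.out.println("index longs: " + rec[0] + ", " + rec[1] + ", " + rec[2]);
    }
}
```

Under these assumptions the data does not start with the gzip magic bytes, the trailing four bytes equal the CRC32 of the first two, and the index longs 2 and 6 match the `decomp: 2 len: 6` values in the log. That would be consistent with `file.out` holding two uncompressed marker bytes followed by a checksum rather than gzip data, which may be why the `.gz` decompressor reports an unexpected end of input. This is only a reading of the bytes, not a confirmed diagnosis.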
Solution
I could not find an explanation for this problem, but I worked around it by turning off compression of the mapper output:
config.set("mapreduce.map.output.compress", "false");