hadoop - MapReduce program outputs partial files
Problem description
I am running a map-only MapReduce job (no reducers) that writes JSON files to HDFS, processing roughly 100 GB of input. The job ran fine for the most part, but I found a few output files that were only partially written. The job threw no exceptions.
Expected output:
{"id":1,"first_name":"Stephanie","last_name":"Hayesman","email":"shayesman0@behance.net","gender":"Polygender","ip_address":"132.234.151.37"}
{"id":2,"first_name":"Tricia","last_name":"Klaus","email":"tklaus1@acquirethisname.com","gender":"Genderfluid","ip_address":"10.213.69.232"}
{"id":3,"first_name":"Marta","last_name":"Castanares","email":"mcastanares2@dot.gov","gender":"Genderqueer","ip_address":"168.1.204.80"}
{"id":4,"first_name":"Stormie","last_name":"MacCleod","email":"smaccleod3@nsw.gov.au","gender":"Bigender","ip_address":"64.11.123.125"}
{"id":5,"first_name":"Ilyse","last_name":"Gudahy","email":"igudahy4@canalblog.com","gender":"Female","ip_address":"22.146.172.113"}
Current output:
{"id":1,"first_name":"Stephanie","last_name":"Hayesman","email":"shayesman0@behance.net","gender":"Polygender","ip_address":"132.234.151.37"}
{"id":2,"first_name":"Tricia","last_name":"Klaus","email":"tklaus1@acquirethisname.com","gender":"Genderfluid","ip_address":"10.213.69.232"}
{"id":3,"first_name":"Marta","last_name":"Castanar',
This kind of truncated output file shows up in very few cases (6 of 250 files) for a 100 GB run, and different files are affected each time I rerun the job with the same input.
Any input on why this might occur would be appreciated.
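To narrow down which files are affected on a given run, a small checker along these lines could help. This is only a sketch for locating truncated records, not a diagnosis of the root cause. It assumes the output files have been copied to the local filesystem (for example with `hdfs dfs -get`) and that each line is meant to be one complete JSON document; the function name `find_invalid_json_lines` is made up for illustration.

```python
import json
import sys
from typing import Iterable, List, Tuple


def find_invalid_json_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for each non-empty line that is not valid JSON."""
    bad = []
    for line_no, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        try:
            json.loads(text)
        except json.JSONDecodeError:
            # A record cut off mid-string or mid-object fails to parse.
            bad.append((line_no, text))
    return bad


if __name__ == "__main__":
    # Assumption: the paths given on the command line are local copies of the job output.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_no, text in find_invalid_json_lines(handle):
                print(f"{path}:{line_no}: {text[:80]}")
```

Running it over all output files from two runs with the same input would show whether the truncation always falls on the last line of a file and how the set of affected files changes between runs.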
Solution