MapReduce program outputs partial files

Problem description

I am running a map-only MapReduce job (a Mapper phase with no Reducers) that processes approximately 100 GB of input files and writes JSON files to HDFS. The job mostly runs fine, but I found that a few of the output files are only partially written. The job threw no exceptions.

Expected output:
{"id":1,"first_name":"Stephanie","last_name":"Hayesman","email":"shayesman0@behance.net","gender":"Polygender","ip_address":"132.234.151.37"}
{"id":2,"first_name":"Tricia","last_name":"Klaus","email":"tklaus1@acquirethisname.com","gender":"Genderfluid","ip_address":"10.213.69.232"}
{"id":3,"first_name":"Marta","last_name":"Castanares","email":"mcastanares2@dot.gov","gender":"Genderqueer","ip_address":"168.1.204.80"}
{"id":4,"first_name":"Stormie","last_name":"MacCleod","email":"smaccleod3@nsw.gov.au","gender":"Bigender","ip_address":"64.11.123.125"}
{"id":5,"first_name":"Ilyse","last_name":"Gudahy","email":"igudahy4@canalblog.com","gender":"Female","ip_address":"22.146.172.113"}
Actual output:
{"id":1,"first_name":"Stephanie","last_name":"Hayesman","email":"shayesman0@behance.net","gender":"Polygender","ip_address":"132.234.151.37"}
{"id":2,"first_name":"Tricia","last_name":"Klaus","email":"tklaus1@acquirethisname.com","gender":"Genderfluid","ip_address":"10.213.69.232"}
{"id":3,"first_name":"Marta","last_name":"Castanar',

This kind of truncated output file appears in very few cases (6/250) for a 100 GB run, and when I rerun the job with the same input, it affects different files at random.
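To make it easier to see which files are affected on a given run, here is a minimal sketch of a checker that flags any output file containing a line that does not parse as JSON. It only locates the damage and does not explain the cause. It assumes the output directory has first been copied to the local filesystem (for example with `hdfs dfs -get`) and that the files follow the default `part-*` naming. The function name `find_truncated_files` and the directory path are hypothetical.

```python
import json
from pathlib import Path


def find_truncated_files(output_dir):
    """Return the names of part files that contain a non-JSON line.

    Assumption: one JSON object per line, files named part-* (the
    default MapReduce naming); adjust the glob if the job renames them.
    """
    bad = []
    for path in sorted(Path(output_dir).glob("part-*")):
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    # A chopped record such as {"id":3,...,"last_name":"Castanar
                    # fails to parse, so the file is reported once.
                    bad.append(path.name)
                    break
    return bad
```

Calling `find_truncated_files("./job_output")` on a local copy (the path is a placeholder) returns a list of file names, which can then be compared across reruns to confirm that the affected files change each time.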

Any input as to why this might occur would be appreciated.

Tags: hadoop, mapreduce, hdfs, cloudera, hdp

Solution

