Hadoop YARN: a roundup of errors hit in practice

prisoner 2015-09-04 20:24

1. Running wordcount on Hadoop YARN: the job runs to the end but returns an error

The error message is as follows:

15/09/05 03:48:02 INFO mapreduce.Job: Job job_1441395011668_0001 failed with state FAILED due to: Application application_1441395011668_0001 failed 2 times due to AM Container for appattempt_1441395011668_0001_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://macmaster.hadoop:8088/proxy/application_1441395011668_0001/AThen, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1441395011668_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
15/09/05 03:48:02 INFO mapreduce.Job: Counters: 0

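One possible cause is that mapreduce.jobhistory.address is not configured, since the job history information is read to find out whether the job succeeded. You can modify yarn-site.xml as follows (note that mapreduce.* properties are conventionally kept in mapred-site.xml):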

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>macmaster.hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>macmaster.hadoop:19888</value>
    </property>
</configuration>
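
Exit code 1 is generic, so if the error persists after this change, the container's own stderr and syslog usually name the real cause. The following is a minimal sketch, assuming log aggregation is still at its default (disabled) on the cluster: setting the standard `yarn.log-aggregation-enable` property in yarn-site.xml and restarting YARN should let you fetch the logs of applications run afterwards with `yarn logs -applicationId <application id>`, for example `application_1441395011668_0001` from the output above.

```xml
<configuration>
    <!-- Assumption: log aggregation is currently off (the default). -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>
```

Without aggregation, the same logs stay on the NodeManager that ran the container and can be reached through the tracking page mentioned in the error message.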

 

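2. 60000 millis timeout while waiting for channel to be ready for read. ch

This is probably a read/write wait timeout. I ran into it while running randomtextwriter and randomwriter: the machine's CPU and memory were weak and the data volume was large, so HDFS access became slow enough to time out. You can add the following to hdfs-site.xml: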

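    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <!-- raised to 10 minutes; the Hadoop default is 480000 (8 minutes) -->
        <value>600000</value>
    </property>
    <property>
        <name>dfs.socket.timeout</name>
        <!-- raised to 10 minutes; the default is 60000 -->
        <value>600000</value>
    </property>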
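
The 60000 ms in the error message matches the default read timeout. In Hadoop 2.x, `dfs.socket.timeout` is a deprecated name for `dfs.client.socket-timeout`; the old key is still accepted. Below is a sketch of the same change using the current key, wrapped in the `<configuration>` element that hdfs-site.xml needs. The value 600000 is simply carried over from the settings above and is an assumption, not a tuned recommendation.

```xml
<configuration>
    <!-- Read timeout in milliseconds; current name for dfs.socket.timeout. -->
    <property>
        <name>dfs.client.socket-timeout</name>
        <value>600000</value>
    </property>
    <!-- Write timeout in milliseconds for sockets to DataNodes. -->
    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>600000</value>
    </property>
</configuration>
```

The DataNodes most likely need a restart to pick up the new values, and the client that submits the job should see the same file.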

 
