YARN application ACCEPTED but not RUNNING despite resource allocation (Cloudera)

Problem description

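I am running a Spark application in yarn-client mode on the Cloudera QuickStart VM 5.13.0.0. I have allocated 10 GB of memory and 3 cores to the VM. When I submit the application, it is ACCEPTED but never moves to RUNNING. When I try to find the logs with `yarn logs -applicationId`, I see nothing; the output is completely blank.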

I have already looked for this problem in the following places:

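I have adjusted every configuration that those links identify as a possible cause. I still have no answer to my problem, which on the surface looks like the ones in the links above. Here are the configuration parameters of my Cloudera cluster: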

mapreduce.map.memory.mb 128M
mapreduce.reduce.memory.mb 128M
mapreduce.job.heap.memory-mb.ratio 0.8
yarn.nodemanager.resource.memory-mb 1900M
yarn.nodemanager.resource.percentage-physical-cpu-limit 100
yarn.nodemanager.resource.cpu-vcores 1
yarn.scheduler.minimum-allocation-mb 1M
yarn.scheduler.increment-allocation-mb 100M
yarn.scheduler.maximum-allocation-mb 1600M
yarn.scheduler.minimum-allocation-vcores 1
yarn.scheduler.increment-allocation-vcores 1
yarn.scheduler.maximum-allocation-vcores 2
yarn.scheduler.fair.continuous-scheduling-enabled unchecked
mapreduce.am.max-attempts 1
yarn.resourcemanager.am.max-retries, yarn.resourcemanager.am.max-attempts 1
yarn.app.mapreduce.am.resource.mb 1G
yarn.app.mapreduce.am.resource.cpu-vcores 1
ApplicationMaster Java Maximum Heap Size 512M
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.user-as-default-queue unchecked
yarn.scheduler.fair.preemption unchecked
yarn.scheduler.fair.preemption.cluster-utilization-threshold 0.8
yarn.scheduler.fair.sizebasedweight unchecked
Fair Scheduler Allocations (deployed) {"defaultFairSharePreemptionThreshold":null,"defaultFairSharePreemptionTimeout":null,"defaultMinSharePreemptionTimeout":null,"defaultQueueSchedulingPolicy":"drf","queueMaxAMShareDefault":-1.0,"queueMaxAppsDefault":null,"queuePlacementRules":[{"create":true,"name":"specified","queue":null,"rules":null},{"create":null,"name":"nestedUserQueue","queue":null,"rules":[{"create":true,"name":"default","queue":"users","rules":null}]},{"create":null,"name":"default","queue":null,"rules":null}],"queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"root","queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"default","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null},{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"users","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":"parent"}],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":null,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null}],"userMaxAppsDefault":1,"users":[]}
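As a rough check of these settings, the sketch below sizes Spark's yarn-client containers. It uses Spark's default memory overhead (10% of the heap, at least 384 MB) and a simplified model of how the scheduler rounds requests up to `yarn.scheduler.increment-allocation-mb`. The object and method names are hypothetical. The Spark-side values (`spark.yarn.am.memory` = 512m, `spark.executor.memory` = 1g, one vcore each) are assumed defaults, not values taken from the question.

```scala
// Hypothetical helper, not part of Spark or YARN: a simplified model of how
// Spark's yarn-client containers are sized and then normalized by the scheduler.
object YarnSizingCheck {
  // Spark's default memory overhead: 10% of the heap, but at least 384 MB.
  def withOverhead(heapMb: Long): Long =
    heapMb + math.max((heapMb * 0.10).toLong, 384L)

  // Simplified normalization (assumption): reject requests above the maximum,
  // raise the rest to the minimum, then round up to a multiple of the increment.
  def normalize(requestMb: Long, minMb: Long, incMb: Long, maxMb: Long): Option[Long] =
    if (requestMb > maxMb) None
    else {
      val base = math.max(requestMb, minMb)
      Some(math.min((base + incMb - 1) / incMb * incMb, maxMb))
    }

  // True if all containers (memory MB, vcores) fit on one NodeManager together.
  def fitsOnNode(containers: Seq[(Long, Int)], nodeMb: Long, nodeVcores: Int): Boolean =
    containers.map(_._1).sum <= nodeMb && containers.map(_._2).sum <= nodeVcores

  def main(args: Array[String]): Unit = {
    // Values from the configuration listed above.
    val (minMb, incMb, maxMb) = (1L, 100L, 1600L)
    val (nodeMb, nodeVcores) = (1900L, 1)

    // Assumed Spark defaults: spark.yarn.am.memory=512m, spark.executor.memory=1g,
    // and one vcore for each container.
    val am = normalize(withOverhead(512), minMb, incMb, maxMb).getOrElse(-1L)
    val executor = normalize(withOverhead(1024), minMb, incMb, maxMb).getOrElse(-1L)
    println(s"AM container: $am MB, executor container: $executor MB")
    println(s"AM alone fits: ${fitsOnNode(Seq((am, 1)), nodeMb, nodeVcores)}")
    println(s"AM + one executor fit: ${fitsOnNode(Seq((am, 1), (executor, 1)), nodeMb, nodeVcores)}")
  }
}
```

With these numbers, the model yields a 900 MB AM container and a 1500 MB executor container. The AM alone fits on the single NodeManager (1900 MB, 1 vcore). The AM plus even one default executor does not fit, on either memory (2400 MB) or vcores (2). This does not by itself explain the AM's exit code, but under this model the node, as configured, has no room for executors alongside the AM, even though the VM itself has 10 GB and 3 cores.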

Here is the queue description while the application is still in the ACCEPTED state: [screenshot]

Likewise, here is the entry from the YARN RM UI (note that resources (memory/CPU) have been allocated and Running Containers shows 1 container running): [screenshot]

Here is the application summary:

[screenshot]

Here are the application logs (empty): [screenshot]

Finally, this is what the driver sees:

```
19/12/26 00:16:42 INFO Client: 
 client token: N/A
 diagnostics: Application application_1577297544619_0002 failed 1 times due to AM Container for appattempt_1577297544619_0002_000001 exited with  exitCode: 10
 For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1577297544619_0002/Then, click on links to logs of each attempt.
 Diagnostics: Exception from container-launch.
 Container id: container_1577297544619_0002_01_000001
 Exit code: 10
 Stack trace: ExitCodeException exitCode=10: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
 ApplicationMaster host: N/A
 ApplicationMaster RPC port: -1
 queue: root.default
 start time: 1577299469533
 final status: FAILED
 tracking URL: http://quickstart.cloudera:8088/cluster/app/application_1577297544619_0002
 user: shepanch
19/12/26 00:16:42 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:165)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:512)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2511)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at cloudera.jobs.ClouderaSampleJob$.delayedEndpoint$cloudera$jobs$ClouderaSampleJob$1(ClouderaSampleJob.scala:17)
at cloudera.jobs.ClouderaSampleJob$delayedInit$body.apply(ClouderaSampleJob.scala:6)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at cloudera.jobs.ClouderaSampleJob$.main(ClouderaSampleJob.scala:6)
at cloudera.jobs.ClouderaSampleJob.main(ClouderaSampleJob.scala)
```
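The driver log shows that the AM container exited with exit code 10. In the Spark 2.x source, `ApplicationMaster` defines its exit codes as named constants, and 10 is `EXIT_UNCAUGHT_EXCEPTION`. That means the AM (in yarn-client mode, the `ExecutorLauncher`) hit an exception it did not handle, for example failing to connect back to the driver. The helper below is a hypothetical sketch that extracts the code from diagnostics text like the log above and names it. The table reflects the Spark 2.x source and may differ in other Spark versions.

```scala
// Hypothetical helper: extract the AM exit code from YARN diagnostics text and
// name it using the constants in Spark's ApplicationMaster (Spark 2.x source).
object AmExitCode {
  private val sparkAmExitCodes: Map[Int, String] = Map(
    0  -> "EXIT_SUCCESS",
    10 -> "EXIT_UNCAUGHT_EXCEPTION",
    11 -> "EXIT_MAX_EXECUTOR_FAILURES",
    12 -> "EXIT_REPORTER_FAILURE",
    13 -> "EXIT_SC_NOT_INITED",
    14 -> "EXIT_SECURITY",
    15 -> "EXIT_EXCEPTION_USER_CLASS"
  )

  // Matches both "exitCode: 10" and "exitCode=10" as they appear in the log.
  private val ExitCodePattern = """exitCode[:=]\s*(\d+)""".r

  def extract(diagnostics: String): Option[Int] =
    ExitCodePattern.findFirstMatchIn(diagnostics).map(_.group(1).toInt)

  def describe(code: Int): String =
    sparkAmExitCodes.getOrElse(code, s"code $code (not a Spark AM exit code)")

  def main(args: Array[String]): Unit = {
    val diag = "AM Container for appattempt_1577297544619_0002_000001 exited with  exitCode: 10"
    extract(diag).foreach(code => println(s"$code -> ${describe(code)}"))
  }
}
```

Because the AM failed on an uncaught exception rather than on a resource limit, the AM container's own stderr (from the tracking URL or NodeManager log directory) is the place to look for the underlying exception.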

Is there any way to solve this problem?

Tags: hadoop-yarn, cloudera, cloudera-cdh, cloudera-manager, cloudera-quickstart-vm

Solution


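After all my research, I found that, besides the causes described in the links mentioned in my question, this can happen for several other reasons: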

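  1. The client (driver) and the cluster have different versions of Spark. Once I made sure both bundled the same Spark version, the application ran fine.
  2. You may need to set the `spark.driver.host` property. Make sure the IP address passed there can be pinged from the guest VM.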
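The second point can be checked from the guest VM. A ping only shows that the address is routable. In yarn-client mode the AM must also open a TCP connection to the driver at `spark.driver.host` and its RPC port (`spark.driver.port`, random by default), and a firewall can block that even when ping succeeds. The sketch below tests that TCP connection. The object name and the 2-second default timeout are hypothetical choices. The host and port are whatever the driver is configured with, for example via `--conf spark.driver.host=...` on `spark-submit` or `SparkSession.builder().config("spark.driver.host", ...)`.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper, not part of Spark: tests whether a TCP connection to
// host:port can be opened within a timeout. Run it on the guest VM with the
// driver's spark.driver.host and port to mimic the AM's connection back to
// the driver in yarn-client mode.
object DriverReachability {
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false // refused, timed out, or unknown host
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = args match {
    case Array(host, port) =>
      println(s"$host:$port reachable: ${canConnect(host, port.toInt)}")
    case _ =>
      println("usage: DriverReachability <driver-host> <driver-port>")
  }
}
```

Fixing `spark.driver.port` to a known value while debugging makes this check (and any firewall rule) easier to set up, since the default port changes on every run.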
