Check your cluster UI to ensure that workers are registered and have sufficient resources

Problem description

I wrote a program that selects text data from Cassandra. Here is my code; it simply selects all the data and shows it in the console.

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

import local_settings  # project settings: SPARK_MASTER, CASSANDRA_MASTER, CHECKPOINT_DIRECTORY


def get_spark_context(app_name, max_cores=120):
    # Build the configuration: master URL, core cap, the Cassandra connector
    # package and the Cassandra contact point.
    conf = SparkConf().setMaster(local_settings.SPARK_MASTER).setAppName(app_name) \
        .set("spark.cores.max", max_cores) \
        .set("spark.jars.packages", "datastax:spark-cassandra-connector:2.0.0-s_2.11") \
        .set("spark.cassandra.connection.host", local_settings.CASSANDRA_MASTER)

    # Set up the Spark context and its checkpoint directory.
    sc = SparkContext.getOrCreate(conf=conf)
    sc.setCheckpointDir(local_settings.CHECKPOINT_DIRECTORY)
    return sc

def get_sql_context(sc):
    sqlc = SQLContext.getOrCreate(sc)
    return sqlc

def run():
    sc = get_spark_context("Select data")
    sql_context = get_sql_context(sc)

    # Read the whole data.text table through the Cassandra connector and
    # print it to the console.
    sql_context.read.format("org.apache.spark.sql.cassandra") \
        .options(table="text", keyspace="data") \
        .load().show()

But the console just shows the following and gets stuck on this warning:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

It never finishes.

19/02/21 09:09:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/21 09:09:23 WARN Utils: Your hostname, osboxes resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
19/02/21 09:09:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/02/21 09:09:44 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/02/21 09:09:59 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

So I checked my spark-worker log. The error log is as follows:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/02/21 08:58:18 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 15264@mm_h01
19/02/21 08:58:18 INFO SignalUtils: Registered signal handler for TERM
19/02/21 08:58:18 INFO SignalUtils: Registered signal handler for HUP
19/02/21 08:58:18 INFO SignalUtils: Registered signal handler for INT
19/02/21 08:58:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/21 08:58:19 INFO SecurityManager: Changing view acls to: hadoop,osboxes
19/02/21 08:58:19 INFO SecurityManager: Changing modify acls to: hadoop,osboxes
19/02/21 08:58:19 INFO SecurityManager: Changing view acls groups to: 
19/02/21 08:58:19 INFO SecurityManager: Changing modify acls groups to: 
19/02/21 08:58:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop, osboxes); groups with view permissions: Set(); users  with modify permissions: Set(hadoop, osboxes); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:202)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    ... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    ... 11 more
19/02/21 09:00:19 ERROR RpcOutboxMessage: Ask timeout before connecting successfully

What does this mean? Is there no communication between the master and the workers? Thanks a lot.

Tags: apache-spark, pyspark

Solution


This means the job was submitted to YARN, but it cannot start because YARN currently cannot provide the resources the job requested (insufficient resources).
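
For reference, the code in the question asks the scheduler for up to 120 cores (spark.cores.max=120). If the cluster cannot offer that many, the job sits in exactly this waiting state. Below is a minimal sketch of a smaller request; the numbers are assumed examples and should be replaced with values that actually fit your cluster.

from pyspark import SparkConf, SparkContext

import local_settings

# Sketch only: ask for fewer total cores and an explicit, modest executor
# size so the cluster manager can actually satisfy the request. All numbers
# here are assumptions, not recommended values.
conf = SparkConf().setMaster(local_settings.SPARK_MASTER).setAppName("Select data") \
    .set("spark.cores.max", 4) \
    .set("spark.executor.cores", 2) \
    .set("spark.executor.memory", "1g") \
    .set("spark.jars.packages", "datastax:spark-cassandra-connector:2.0.0-s_2.11") \
    .set("spark.cassandra.connection.host", local_settings.CASSANDRA_MASTER)

sc = SparkContext.getOrCreate(conf=conf)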

Go to the Ambari/Cloudera UI and see whether any other jobs are running. Check YARN's container sizes. Check whether the resources configured for the job exceed the total resources available to YARN/Mesos.
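
If you would rather check the cluster's free resources from a script than from the UI, the same numbers are exposed by the YARN ResourceManager REST API. A rough sketch, assuming the ResourceManager address below (a placeholder you must replace; the default web port is 8088):

import requests

# Hypothetical ResourceManager address -- replace with your own.
RM = "http://resourcemanager-host:8088"

# The cluster metrics endpoint reports total and currently available memory
# and vcores, plus how many applications are running.
metrics = requests.get(RM + "/ws/v1/cluster/metrics").json()["clusterMetrics"]

print("running apps:    ", metrics["appsRunning"])
print("vcores free:     ", metrics["availableVirtualCores"], "of", metrics["totalVirtualCores"])
print("memory free (MB):", metrics["availableMB"], "of", metrics["totalMB"])

Compare these numbers with what the job requests (spark.cores.max, spark.executor.memory); if the request is larger than what is free, the job will keep waiting with the same warning.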

