apache-spark - Spark2-submit 失败,pyspark
问题描述
我正在将应用程序从 spark 1.6 升级到 Spark 2,但使用 pyspark 的 Spark2-submit 在 Cloudera 环境中失败。
为此,我刚刚从 spark-submit 更新了 spark2-submit,但无法创建 Spark Context 并给出以下错误。看起来 Spark 2 配置缺少一些属性,该属性不允许它识别存储 python 文件的暂存位置。
sc = SparkContext(conf=conf)
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 182, in _do_init
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 249, in _initialize_context
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
File "/apps/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
at org.apache.hadoop.fs.Path.<init>(Path.java:135)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:368)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:481)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:629)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:627)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:627)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:874)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:171)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
解决方案
如日志中所述..
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
您的 spark-submit 二进制文件无法找到 spark 主页。
- 要么您必须 su 到特定用户,例如:在 emr 中,您必须是 hadoop,它将自动为您设置
- 您可以
SPARK_HOME
在每次调用 spark-submit 之前设置。
export SPARK_HOME=/usr/lib/spark/
推荐阅读
- node.js - 我应该将用户信息放入 JWT 令牌中,还是只输入用户的 ID,然后从数据库中选择信息?
- c# - 我应该在 Visual Studio 内的 wwwroot 下创建用于文件上传的文件夹吗?
- python - Python中的递归对象属性创建
- vector - C++向量理解
- spring-integration - Spring云数据流Httpclient
- javascript - 带有 REST Post 请求的 415(不支持的媒体类型)
- python - Python中的矩阵可以表示架子上的项目
- powershell - Out-File 只打印我一半的结果
- spring-kafka - 处理存储库和 Kafka 事务
- node.js - 节点 - 无效的 select() 参数。必须是字符串或对象