scala - NullPointerException in Scala when creating a DataFrame
Problem description
I am trying to read files from a location and load them into a Spark DataFrame. The following code works fine:
val tempDF: DataFrame = spark.read.orc(targetDirectory)
When I try to supply a schema for the same read, the code fails with the following exception:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, brdn6136.target.com, executor 25): java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getDouble(OrcColumnVector.java:152)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here is the code I am using:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types._

val schema = StructType(
  List(
    StructField("Col1", DoubleType, true),
    StructField("Col2", StringType, true),
    StructField("Col3", DoubleType, true),
    StructField("Col4", DoubleType, true),
    StructField("Col5", DoubleType, true),
    StructField("Col6", StringType, true),
    StructField("Col7", StringType, true),
    StructField("Col8", StringType, true),
    StructField("Col9", StringType, true),
    StructField("Col10", StringType, true),
    StructField("Col11", StringType, true),
    StructField("Col12", StringType, true)
  )
)

val df: DataFrame = spark.read.format("orc")
  .schema(schema)
  .load(targetReadDirectory)
Can anyone help me resolve this issue?
Solution
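The stack trace points at `OrcColumnVector.getDouble`, which runs only inside Spark's vectorized ORC reader and only for columns declared `DoubleType`. That suggests the supplied schema disagrees with the file's actual column types (for example, a column declared `DoubleType` that the ORC file stores differently). A minimal diagnostic sketch, reusing `spark`, `schema`, and `targetReadDirectory` from the question: first print the schema Spark infers from the file itself and compare it against the hand-written one, then, as a workaround, turn off the vectorized ORC reader via the `spark.sql.orc.enableVectorizedReader` setting so the `OrcColumnVector` code path is skipped entirely.

```scala
import org.apache.spark.sql.DataFrame

// 1. Let Spark infer the schema from the ORC file and inspect it.
//    Any column whose inferred type differs from the hand-written
//    schema (especially the DoubleType ones) is the likely culprit.
val inferredSchema = spark.read.orc(targetReadDirectory).schema
inferredSchema.printTreeString()

// 2. Workaround: disable the vectorized ORC reader (at some
//    performance cost) and retry the read with the explicit schema.
spark.conf.set("spark.sql.orc.enableVectorizedReader", "false")

val df: DataFrame = spark.read.format("orc")
  .schema(schema)
  .load(targetReadDirectory)
```

If the inferred schema shows different types, the cleanest fix is to make the explicit schema match the file and cast columns afterwards with `df.withColumn(...)` rather than forcing the type at read time.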