Dataset.map() throws ClassCastException

Problem description

I am trying to iterate over a Dataset with the map function, returning each element unchanged into a new variable, and then calling collect. I get a java.lang.ClassCastException. What am I missing?

import org.apache.spark.sql.{Dataset, Encoders}

// assumes a SparkSession is in scope and spark.implicits._ is imported for toDF()
def fun(): Unit = {
   val df = Seq(Person("Max", 33),
                Person("Adam", 32),
                Person("Muller", 62)).toDF()

   val encoderPerson = Encoders.product[Person]

   val personDS: Dataset[Person] = df.as[Person](encoderPerson)

   // identity map: each element is returned unchanged
   val newPersonDS = personDS.map { person => person }

   newPersonDS.collect()
}


case class Person(name: String, age: Int)

java.lang.ClassCastException: com.query.Person cannot be cast to com.query.Person
    at com.query.MyClass$$anonfun$1.apply(MyClass.scala:42)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.deserializetoobject_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.mapelements_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.agg_doAggregateWithKeysOutput_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Tags: java, apache-spark-sql, classcastexception

Solution
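
A ClassCastException of the form "com.query.Person cannot be cast to com.query.Person", where the source and target class names are identical, almost always means the JVM has loaded Person twice through two different classloaders: Spark's generated deserializer (the deserializetoobject frame in the trace) builds the object with one copy of the class, while the map lambda expects the other. Common triggers are the class appearing on the classpath twice (for example, in the application fat jar and again via --jars or spark.driver.extraClassPath), running with spark.driver.userClassPathFirst=true, or defining the case class nested inside another class instead of at the top level of a file.

A minimal sketch of the usually recommended layout, with Person at the top level of the package and a single SparkSession; the object name MyClass follows the class shown in the trace, and the builder settings are illustrative, not from the original post:

    // Person.scala -- top level of the package, not nested in MyClass
    package com.query

    case class Person(name: String, age: Int)

    // MyClass.scala
    package com.query

    import org.apache.spark.sql.{Dataset, SparkSession}

    object MyClass {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("person-map")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._ // provides Encoder[Person]

        val personDS: Dataset[Person] =
          Seq(Person("Max", 33), Person("Adam", 32), Person("Muller", 62)).toDS()

        // Identity map; with a single copy of Person on the classpath this
        // round-trips through the encoder without a ClassCastException.
        val newPersonDS = personDS.map(identity)
        newPersonDS.collect().foreach(println)

        spark.stop()
      }
    }

If the layout is already correct, a quick way to confirm duplicate classloading is to print where Person comes from on the driver and inside a task (a diagnostic sketch, not part of the original question):

    // Driver side: which loader and which jar provided Person?
    println(classOf[Person].getClassLoader)
    println(classOf[Person].getProtectionDomain.getCodeSource.getLocation)

    // Executor side: run the same check inside a task
    personDS.foreachPartition { (_: Iterator[Person]) =>
      println(classOf[Person].getClassLoader)
    }

If the printouts differ, or the code source points at an unexpected jar, remove the duplicate jar from --jars/extraClassPath (or drop userClassPathFirst) and submit the application as a single assembly.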

