Deserialization problem with a Scala class under spark-submit

Problem description

I am working on a project that combines Scala and Java. I have a Scala class whose abbreviated structure is as follows:

 // One entry: a field name and its length.
 case class Dl(name: String, length: Int) extends Serializable

 // Immutable list of Dl entries; every append returns a new DlStruct.
 class DlStruct private (xs: List[Dl]) extends Serializable {
     def this() = this(Nil)

     private def +=(dl: Dl): DlStruct =
       new DlStruct(xs :+ dl)

     def appendDl(fieldName: String, fieldLength: Int): DlStruct =
       this += Dl(fieldName, fieldLength)

 }

The class above is called from a Java object to populate a DlStruct. Once that is done, I write the object out to a file in serialized form.
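The question does not show the write side. As a rough sketch only, assuming plain Java serialization through `ObjectOutputStream`, it might look like the following; the class name `SerializedFileWriter` and the method `writeToFile` are hypothetical and not taken from the original code.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper (not from the question): writes one serializable
// object to a file using standard Java serialization.
public class SerializedFileWriter {

    public static void writeToFile(Serializable value, String path) throws IOException {
        // try-with-resources closes the stream (and the underlying file) on exit
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(value);
        }
    }
}
```

Any object graph written this way, including one holding a Scala `List`, has to be read back with the same classes visible to the reading `ObjectInputStream`.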

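When I deserialize the file again and convert it back into an object, it works perfectly when I run it from IntelliJ, but running the same code through spark-submit throws the following error: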

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field somepackage.DlStruct.xs of type scala.collection.immutable.List in instance of somepackage.DlStruct.xs
at java.base/java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2205)
at java.base/java.io.ObjectStreamClass$FieldReflector.checkObjectFieldValueTypes(ObjectStreamClass.java:2168)
at java.base/java.io.ObjectStreamClass.checkObjFieldValueTypes(ObjectStreamClass.java:1422)
at java.base/java.io.ObjectInputStream.defaultCheckFieldValues(ObjectInputStream.java:2450)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2357)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2166)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2434)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2166)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:482)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:440)

Deserializing plain Java objects works without any problem.

Deserialization code snippet:

 File file = new File(serializedFilePath);
 FileInputStream fin = new FileInputStream(file);
 ObjectInputStream in = new ObjectInputStream(fin);

 infoHolder = (ObjectCarrier) in.readObject(); // <- this line gives error if it has scala object, else runs smoothly

 in.close();
 fin.close();

Spark version: 2.4.4, Scala version: 2.12.8, Java: 1.8

Tags: java, scala, spark-submit

Solution


I had to convert the Scala class to Java before it finally started working under spark-submit. I hope someone can find a better answer.
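A `ClassCastException` involving `List$SerializationProxy` is commonly attributed to a class loader mismatch: under spark-submit, the loader that `ObjectInputStream` picks by default may not be the one that loaded the application's Scala classes. Whether that is the cause here is not confirmed by the question. As a sketch under that assumption, the stream can be told which loader to use by overriding `resolveClass`; the class name `ClassLoaderObjectInputStream` and the `roundTrip` helper below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

// Hypothetical helper: an ObjectInputStream that resolves classes through
// an explicitly supplied ClassLoader instead of the default one.
public class ClassLoaderObjectInputStream extends ObjectInputStream {

    private final ClassLoader loader;

    public ClassLoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look the class up in the supplied loader without initializing it.
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default lookup (this also covers primitive types).
            return super.resolveClass(desc);
        }
    }

    // Demo only: serialize a value in memory and read it back through the
    // thread's context class loader.
    public static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
        try (ObjectInputStream in = new ClassLoaderObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()), contextLoader)) {
            return in.readObject();
        }
    }
}
```

In the snippet from the question, this would mean constructing the stream as `new ClassLoaderObjectInputStream(fin, Thread.currentThread().getContextClassLoader())` in place of `new ObjectInputStream(fin)`. This is untested against the setup described above and may not resolve the error if the cause lies elsewhere.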

