No Encoder found error for a nested Java class

Problem

I created a Scala case class as follows:

import java.sql.Timestamp
case class MyObjectWithEventTime(value: MyObject, eventTime: Timestamp)

MyObject is a Java class.

I am trying to use it in my Spark Structured Streaming job like this:

implicit val myObjectEncoder: Encoder[MyObject] = Encoders.bean(classOf[MyObject])

val withEventTime = mystream
 .select(from_json(col("value").cast("string"), schema).alias("value"))
 .withColumn("eventTime", to_timestamp(col("value.timeArrived")))
 .as[MyObjectWithEventTime]
 .groupByKey(row => {... some code here
 })
 .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())(updateAcrossEvents)
 .filter(col("id").isNotNull)
 .toJSON
 .writeStream
 .format("kafka")
 .option("kafka.bootstrap.servers", "localhost:9092")
 .option("topic", conf.KafkaProperties.outputTopic)
 .option("checkpointLocation", "/tmp/checkpointLocation")
 .outputMode("update")
 .start()
 .awaitTermination()

But I keep getting this error:

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for com.xxx.MyObject
- field (class: "com.xxx.MyObject", name: "value")
- root class: "com.xxx.MyObjectWithEventTime"
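The trace points at the nested field: .as[MyObjectWithEventTime] derives the case-class encoder by reflection, and the implicit Encoder[MyObject] declared above is not consulted for the value field. One possible workaround, not part of the original answer, is to skip the wrapper case class and decode into a tuple built with Encoders.tuple, so the bean encoder handles the Java-style object and the data stays columnar. This is a minimal batch sketch under assumptions: MyObject is a hypothetical Scala stand-in with bean accessors (id, timeArrived), the object name TupleEncoderSketch and the sample values are made up, and Spark SQL must be on the classpath.

```scala
import java.sql.Timestamp
import scala.beans.BeanProperty
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.functions.{lit, struct, to_timestamp}

// Hypothetical stand-in for the Java class com.xxx.MyObject: a bean with a
// public no-arg constructor plus getters/setters, as Encoders.bean expects.
class MyObject {
  @BeanProperty var id: String = null
  @BeanProperty var timeArrived: String = null
}

object TupleEncoderSketch {
  // The bean encoder handles the nested object; Encoders.TIMESTAMP the event time.
  val pairEncoder: Encoder[(MyObject, Timestamp)] =
    Encoders.tuple(Encoders.bean(classOf[MyObject]), Encoders.TIMESTAMP)

  // Batch stand-in for the parsed stream: a struct column "value" plus "eventTime".
  def sample(spark: SparkSession): Dataset[(MyObject, Timestamp)] =
    spark.range(1)
      .select(
        struct(lit("a1").as("id"), lit("2020-01-01 00:00:00").as("timeArrived")).as("value"),
        to_timestamp(lit("2020-01-01 00:00:00")).as("eventTime"))
      // For tuple types, as[...] maps columns by position: value -> _1, eventTime -> _2.
      .as[(MyObject, Timestamp)](pairEncoder)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("tuple-encoder-sketch").getOrCreate()
    try {
      val (obj, ts) = sample(spark).collect().head
      println(s"${obj.getId} $ts")
    } finally spark.stop()
  }
}
```

Inside the struct, the bean's properties are matched by name, so the from_json schema would need field names that match MyObject's getters. The following groupByKey would then operate on (MyObject, Timestamp) pairs instead of MyObjectWithEventTime.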

Tags: java, scala, apache-spark, apache-spark-sql, spark-structured-streaming

Solution


Try defining encoders for MyObjectWithEventTime (and MyObject) using the Encoders.javaSerialization[T] method:

implicit val myObjectEncoder: Encoder[MyObject] = Encoders.javaSerialization[MyObject]
implicit val myObjectWithEventEncoder: Encoder[MyObjectWithEventTime] = Encoders.javaSerialization[MyObjectWithEventTime]

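Keep in mind that your Java class MyObject must implement Serializable for this to work. Java serialization itself does not need getters and setters; public getters and setters for all fields are what Encoders.bean (used in the question) requires.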
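A minimal batch sketch of this answer, assuming Spark SQL is on the classpath. MyObject here is a hypothetical Scala stand-in for the Java class (it only has to be Serializable), and JavaSerializationSketch and the sample values are made up for illustration. It also shows a side effect worth knowing: a javaSerialization encoder stores each object as one opaque binary column named value, so Spark SQL cannot see the object's individual fields.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical stand-in for the Java class com.xxx.MyObject. Java serialization
// only requires Serializable; no bean getters/setters are involved.
class MyObject(val id: String) extends Serializable

case class MyObjectWithEventTime(value: MyObject, eventTime: Timestamp)

object JavaSerializationSketch {
  // Encoder from the answer: each MyObjectWithEventTime is written with
  // java.io serialization into a single binary column.
  implicit val myObjectWithEventEncoder: Encoder[MyObjectWithEventTime] =
    Encoders.javaSerialization[MyObjectWithEventTime]

  // Builds a one-row Dataset; createDataset picks up the implicit encoder above.
  def sample(spark: SparkSession): Dataset[MyObjectWithEventTime] =
    spark.createDataset(Seq(
      MyObjectWithEventTime(new MyObject("a1"), Timestamp.valueOf("2020-01-01 00:00:00"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("java-serialization-sketch").getOrCreate()
    try {
      val ds = sample(spark)
      ds.printSchema() // a single binary column named "value"
      println(ds.collect().head.value.id)
    } finally spark.stop()
  }
}
```

Because this encoder expects that binary column, it fits Datasets built from JVM objects (for example via createDataset or map) more naturally than an .as[...] over the struct columns produced by from_json; that step may be worth checking against your Spark version.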

