spark-streaming - JsonDecoder parse failure in Spark Streaming
Problem description
I am trying to decode messages that arrive as Avro-encoded JSON in my Spark 2.2 streaming job. I have defined a schema for this JSON, and whenever an incoming JSON message does not conform to that schema, my JsonDecoder fails with the following error:
Caused by: org.apache.avro.AvroTypeException: Expected field name not found: "some_field"
at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:477)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:219)
at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:315)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
I know Jackson deserialization has a way to ignore extra and missing fields. Is there a method with the same behavior in org.apache.avro.io.JsonDecoder?
Solution
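As far as I know, org.apache.avro.io.JsonDecoder has no equivalent of Jackson's FAIL_ON_UNKNOWN_PROPERTIES toggle: it validates the JSON text strictly against the schema, and a missing declared field raises exactly the AvroTypeException shown above. A common workaround is to normalize each JSON message before handing it to the decoder, dropping fields the schema does not declare and filling missing fields with defaults. The sketch below (in Python for brevity; the `normalize` helper and the `schema_fields` map are illustrative, not part of Avro) shows the normalization idea, which would translate directly to a Jackson pre-processing step in the Spark job:

```python
import json

def normalize(raw_json, schema_fields):
    """Reshape a JSON record so a strict decoder will accept it.

    schema_fields maps each field name the schema declares to the
    default value to use when the field is absent from the message.
    """
    record = json.loads(raw_json)
    # Drop fields the schema does not declare
    # (the analogue of Jackson's FAIL_ON_UNKNOWN_PROPERTIES = false).
    cleaned = {k: v for k, v in record.items() if k in schema_fields}
    # Fill in any missing declared fields with their defaults,
    # so the strict decoder no longer reports "Expected field name not found".
    for name, default in schema_fields.items():
        cleaned.setdefault(name, default)
    return json.dumps(cleaned, sort_keys=True)

schema_fields = {"some_field": "", "count": 0}
print(normalize('{"count": 5, "extra": true}', schema_fields))
```

In the actual streaming job this normalization would run on the raw message string (e.g. with Jackson's ObjectMapper) before constructing the JsonDecoder, so malformed messages are repaired or filtered instead of failing the whole task.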