Python Avro avro.io.AvroTypeException: The datum is not an example of the schema

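Problem description

I searched a lot before asking this, but I am stuck, so I am asking here. I know this kind of error occurs when the schema and the object do not match: a field may be missing, or a field may hold a value of a different type.

However, I believe my case is different. My application is simple: it only serializes and deserializes one object to Avro.

My data classes: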


from time import time
from faker import Faker
from dataclasses import dataclass, field

from dataclasses_avroschema import AvroModel

Faker.seed(0)
fake = Faker()


@dataclass
class Head(AvroModel):
    msgId: str = field()
    msgCode: str = field()

    @staticmethod
    def fakeMe():
        return Head(fake.md5(),
                fake.pystr(min_chars=5, max_chars=5)
            )


@dataclass
class Message(AvroModel):
    head: Head = field()
    status: bool = field()

    class Meta:
        namespace = "me.com.Message.v1"

    def fakeMe(self):
        self.head = Head.fakeMe()
        self.status = fake.pybool()


Now the script that runs the serialization:

import json, io as mainio
from dto.temp_schema import Message
from avro import schema, datafile, io as avroio

obj = Message(None, True)
obj.fakeMe()

schema_obj = schema.parse(json.dumps(Message.avro_schema_to_python()))

buf = mainio.BytesIO()
writer = datafile.DataFileWriter(buf, avroio.DatumWriter(), schema_obj)
writer.append(obj)
writer.flush()
buf.seek(0)
data = buf.read()

print("serialized avro: ", data)

When I run it, I get the following error:


Traceback (most recent call last):
  File "/Users/office/Documents/projects/msg-bench/scrib.py", line 28, in <module>
    writer.append(obj)
  File "/Users/office/opt/anaconda3/envs/benchenv/lib/python3.9/site-packages/avro/datafile.py", line 329, in append
    self.datum_writer.write(datum, self.buffer_encoder)
  File "/Users/office/opt/anaconda3/envs/benchenv/lib/python3.9/site-packages/avro/io.py", line 771, in write
    raise AvroTypeException(self.writer_schema, datum)


avro.io.AvroTypeException: The datum Message(head=Head(msgId='f112d652ecf13dacd9c78c11e1e7f987', msgCode='cYzVR'), status=True) is not an example of the schema {
  "type": "record",
  "name": "Message",
  "namespace": "me.com.Message.v1",
  "fields": [
    {
      "type": {
        "type": "record",
        "name": "Head",
        "namespace": "me.com.Message.v1",
        "fields": [
          {
            "type": "string",
            "name": "msgId"
          },
          {
            "type": "string",
            "name": "msgCode"
          }
        ],
        "doc": "Head(msgId: str, msgCode: str)"
      },
      "name": "head"
    },
    {
      "type": "boolean",
      "name": "status"
    }
  ],
  "doc": "Message(head: dto.temp_schema.Head, status: bool)"
}


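Note that I generate the schema from the dataclass object with the help of the Python library dataclasses-avroschema.

Even though I use that same schema, I still cannot serialize the data to Avro.

At this point I am not sure where I went wrong; I am new to Avro. Why does this not work?

System and library versions: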


Python==3.9.7
avro==1.10.2
avro-python3==1.10.2
dataclasses-avroschema==0.25.1
Faker==9.3.1
fastavro==1.4.5


Tags: python, python-3.x, serialization, avro

Solution


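The problem is that you are passing a Message object to the standard avro library, which does not expect that (it expects a dictionary instead). The library you are using has a section on serialization that you may want to look at: https://marcosschroh.github.io/dataclasses-avroschema/serialization/

So your script only needs to be: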
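To illustrate the point about dictionaries, here is a minimal standard-library sketch. The `looks_like_record` helper is hypothetical and only a simplified stand-in for the kind of check a record writer performs; it is not the avro library's actual code. The `Head` and `Message` classes are plain dataclasses that mirror the ones in the question, without `AvroModel`.

```python
from dataclasses import asdict, dataclass


@dataclass
class Head:
    msgId: str
    msgCode: str


@dataclass
class Message:
    head: Head
    status: bool


def looks_like_record(datum, field_names):
    """Hypothetical, simplified record check: a dict that has every field."""
    return isinstance(datum, dict) and all(name in datum for name in field_names)


msg = Message(Head("f112d652ecf13dacd9c78c11e1e7f987", "cYzVR"), True)

# asdict() converts the dataclass, including the nested Head, into plain dicts.
as_dict = asdict(msg)

print(looks_like_record(msg, ["head", "status"]))      # the dataclass instance is rejected
print(looks_like_record(as_dict, ["head", "status"]))  # the dict form is accepted
print(as_dict)
```

The dataclass instance carries the right field values, but it is not a dictionary, which is why the datum can look correct in the error message and still be rejected.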

from dto.temp_schema import Message

obj = Message(None, True)
obj.fakeMe()
print("serialized avro: ", obj.serialize())
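If you would rather keep the `DataFileWriter` approach from the question, a possible alternative is to convert the object to a dictionary before appending it. The sketch below assumes the `avro` package (as used in the question, where `schema.parse` is available) is installed; the schema is written by hand here and the `write_avro_container` helper is a hypothetical name, neither comes from the original answer.

```python
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Head:
    msgId: str
    msgCode: str


@dataclass
class Message:
    head: Head
    status: bool


# Hand-written schema mirroring the one in the question (namespace and doc omitted).
SCHEMA_JSON = json.dumps({
    "type": "record",
    "name": "Message",
    "fields": [
        {"name": "head", "type": {
            "type": "record",
            "name": "Head",
            "fields": [
                {"name": "msgId", "type": "string"},
                {"name": "msgCode", "type": "string"},
            ],
        }},
        {"name": "status", "type": "boolean"},
    ],
})


def write_avro_container(obj) -> bytes:
    """Hypothetical helper: write one dataclass instance as an Avro container."""
    # Imported here so the rest of the sketch runs without the avro package.
    from avro import schema, datafile, io as avroio

    schema_obj = schema.parse(SCHEMA_JSON)
    buf = io.BytesIO()
    writer = datafile.DataFileWriter(buf, avroio.DatumWriter(), schema_obj)
    writer.append(asdict(obj))  # pass a dict, not the dataclass instance
    writer.flush()
    return buf.getvalue()


msg = Message(Head("f112d652ecf13dacd9c78c11e1e7f987", "cYzVR"), True)
datum = asdict(msg)
print(datum)
```

With avro installed, `write_avro_container(msg)` should return the container bytes. The only change from the question's script that matters is appending `asdict(obj)` instead of `obj`.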
