NiFi: "Could not parse incoming data" in ConvertRecord

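Problem description

I am trying to use the ConvertRecord processor to convert JSON to CSV, but the only error I get is "Could not parse incoming data". Since that is not very descriptive, I don't know how to diagnose the problem.

I know my Avro schema is valid because A) NiFi raises no errors about the schema when I plug it into the schema registry, and B) I tested my schema here and it gave me no problems.

I also know my JSON is valid because I can load it in Python with json.loads() and it gives me no problems.

I'm just not sure where I'm going wrong or how to fix it.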

JSON

{
  "DOC": {
    "DOCID": "1234",
    "Subjects": {
      "Subject_xref": ["2233"]
    },
    "TXT": {
      "COUNTRY": ["United States"],
      "ESTATE": ["Mount Vernon"],
      "PERSON": ["George Washington"]
    },
    "RAW_TXT": "George Washington lived in his family home, Mount Vernon, located in the United States.",
    "RELINFO": [
      {"ID" : "REL-1234-100",
      "RELTYPE" : "PER-PROP",
      "PERID" : "PER-1234-009",
      "PROPID" : "PROP-1234-001",
      "SENTID" : "1234-SENT-001",
      "PROP_NORM" : "Mount Vernon",
      "PROP_MENTION" : "Mount Vernon",
      "PER_NORM" : "George Washington",
      "PER_MENTION" : "George Washington"}
    ],
    "ENTINFO": [
      {"ID": "PER-1234-009", "TYPE": "PERSON", "NORM": "George Washington", "REFID": "PER-1234-009", "MENTION": "George Washington"},
      {"ID": "CTRY-1234-003", "TYPE": "COUNTRY", "NORM": "United States", "REFID": "CTRY-1234-003", "MENTION": "United States."},
      {"ID": "PROP-1234-001", "TYPE": "ESTATE", "NORM": "Mount Vernon", "REFID": "PROP-1234-001", "MENTION": "Mount Vernon"}
    ]
  }
}

Avro

{
  "type": "record",
  "namespace": "name.space",
  "name": "nlp_output",
  "fields": [
    {"name": "DOC", "type": {
      "name": "DOCDocument", "type": "record", "namespace": "doc.name.space", "fields": [
        {"name": "DOCID", "type": ["long","null"], "default": null},
        {"name": "Subjects", "type": {
          "name": "Subjects", "type": "record", "namespace": "subjects.name.space", "fields": [
            {"name": "SubjectIdentificationID", "aliases": ["Subject_xref"], "type": ["long","null"], "default": null}
          ]
        }},
        {"name": "TXT", "type": {
          "name": "TXT", "type": "record", "namespace": "text.name.space", "fields": [
            {"name": "COUNTRY", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""},
            {"name": "ESTATE", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""},
            {"name": "PERSON", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""}
          ]
        }},
        {"name": "RAW_TXT", "type": ["string","null"], "default": null},
        {"name": "RELINFO", "type": {
          "name": "RelatedEntities", "type": "record", "namespace": "relent.name.space", "fields": [
            {"name": "ID", "type": ["string", "null"], "default": null},
            {"name": "RELTYPE", "type": ["string", "null"], "default": null},
            {"name": "PERID", "type": ["string", "null"], "default": null},
            {"name": "PROPID", "type": ["string", "null"], "default": null},
            {"name": "SENTID", "type": ["string", "null"], "default": null},
            {"name": "PROP_NORM", "type": ["string", "null"], "default": null},
            {"name": "PROP_MENTION", "type": ["string", "null"], "default": null},
            {"name": "PER_NORM", "type": ["string", "null"], "default": null},
            {"name": "PER_MENTION", "type": ["string", "null"], "default": null}
          ]
        }},
        {"name": "ENTINFO", "doc": "Sentences stripped of tags for ease of reading", "type": {
          "name": "Entities", "type": "record", "namespace": "entities.name.space", "fields": [
            {"name": "ID", "type": ["string", "null"], "default": null},
            {"name": "TYPE", "type": ["string", "null"], "default": null},
            {"name": "NORM", "type": ["string", "null"], "default": null},
            {"name": "REFID", "type": ["string", "null"], "default": null},
            {"name": "MENTION", "type": ["string", "null"], "default": null}
          ]
        }}
      ]
    }}
  ]
}

Tags: json, apache-nifi, avro

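Solution

Your schema does not match your JSON. You have SubjectIdentificationID defined as long or null, but in the JSON, Subject_xref is an array. Declaring the field as an array of long or null removes that mismatch: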

{
  "type": "record",
  "namespace": "name.space",
  "name": "nlp_output",
  "fields": [
    {"name": "DOC", "type": {
      "name": "DOCDocument", "type": "record", "namespace": "doc.name.space", "fields": [
        {"name": "DOCID", "type": ["long","null"], "default": null},
        {"name": "Subjects", "type": {
          "name": "Subjects", "type": "record", "namespace": "subjects.name.space", "fields": [
            {"name": "SubjectIdentificationID", "aliases": ["Subject_xref"], "type": {"type": "array", "items": ["long", "null"]}, "default": null}
          ]
        }},
        {"name": "TXT", "type": {
          "name": "TXT", "type": "record", "namespace": "text.name.space", "fields": [
            {"name": "COUNTRY", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""},
            {"name": "ESTATE", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""},
            {"name": "PERSON", "type": {"type": "array", "items": ["string", "null"]}, "default": null, "doc": ""}
          ]
        }},
        {"name": "RAW_TXT", "type": ["string","null"], "default": null},
        {"name": "RELINFO", "type": {
          "name": "RelatedEntities", "type": "record", "namespace": "relent.name.space", "fields": [
            {"name": "ID", "type": ["string", "null"], "default": null},
            {"name": "RELTYPE", "type": ["string", "null"], "default": null},
            {"name": "PERID", "type": ["string", "null"], "default": null},
            {"name": "PROPID", "type": ["string", "null"], "default": null},
            {"name": "SENTID", "type": ["string", "null"], "default": null},
            {"name": "PROP_NORM", "type": ["string", "null"], "default": null},
            {"name": "PROP_MENTION", "type": ["string", "null"], "default": null},
            {"name": "PER_NORM", "type": ["string", "null"], "default": null},
            {"name": "PER_MENTION", "type": ["string", "null"], "default": null}
          ]
        }},
        {"name": "ENTINFO", "doc": "Sentences stripped of tags for ease of reading", "type": {
          "name": "Entities", "type": "record", "namespace": "entities.name.space", "fields": [
            {"name": "ID", "type": ["string", "null"], "default": null},
            {"name": "TYPE", "type": ["string", "null"], "default": null},
            {"name": "NORM", "type": ["string", "null"], "default": null},
            {"name": "REFID", "type": ["string", "null"], "default": null},
            {"name": "MENTION", "type": ["string", "null"], "default": null}
          ]
        }}
      ]
    }}
  ]
}
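
Since the processor's error message does not say which field failed, a small script that compares the shape of the JSON against the schema can help narrow things down. The following is a minimal sketch, not part of NiFi or the Avro library: the function names (json_kind, branch_kind, find_mismatches) are hypothetical, the schema and document are trimmed excerpts of the ones in the question, and it only compares structure (record, array, scalar, null). It treats a field's aliases as acceptable JSON keys, which is an assumption, and it does not check scalar types, so a string such as "1234" in a long field is not reported.

```python
import json


def json_kind(value):
    """Classify a parsed JSON value by its structural kind."""
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "record"
    if isinstance(value, list):
        return "array"
    return "scalar"


def branches(avro_type):
    """Return the branches of a union, or a one-element list for a plain type."""
    return avro_type if isinstance(avro_type, list) else [avro_type]


def branch_kind(branch):
    """Classify one Avro type. Maps and named-type references are not handled
    and fall through to 'scalar' (a known limitation of this sketch)."""
    if isinstance(branch, dict):
        t = branch["type"]
        return t if t in ("record", "array") else "scalar"
    return "null" if branch == "null" else "scalar"


def find_mismatches(avro_type, value, path="$"):
    """Yield (path, kinds the schema allows, kind found in the JSON)."""
    kind = json_kind(value)
    matching = [b for b in branches(avro_type) if branch_kind(b) == kind]
    if not matching:
        expected = sorted({branch_kind(b) for b in branches(avro_type)})
        yield path, expected, kind
        return
    branch = matching[0]
    if kind == "record":
        for field in branch["fields"]:
            # Assumption: a JSON key may match the field name or any alias.
            names = [field["name"], *field.get("aliases", [])]
            present = [n for n in names if n in value]
            if present:  # absent fields are skipped; they would use defaults
                key = present[0]
                yield from find_mismatches(field["type"], value[key], f"{path}.{key}")
    elif kind == "array":
        for i, item in enumerate(value):
            yield from find_mismatches(branch["items"], item, f"{path}[{i}]")


# Trimmed excerpt of the original (unfixed) schema from the question.
SCHEMA = json.loads("""
{
  "type": "record", "name": "nlp_output", "fields": [
    {"name": "DOC", "type": {"type": "record", "name": "DOCDocument", "fields": [
      {"name": "DOCID", "type": ["long", "null"], "default": null},
      {"name": "Subjects", "type": {"type": "record", "name": "Subjects", "fields": [
        {"name": "SubjectIdentificationID", "aliases": ["Subject_xref"],
         "type": ["long", "null"], "default": null}
      ]}},
      {"name": "RELINFO", "type": {"type": "record", "name": "RelatedEntities", "fields": [
        {"name": "ID", "type": ["string", "null"], "default": null}
      ]}}
    ]}}
  ]
}
""")

# Trimmed excerpt of the JSON document from the question.
DOCUMENT = json.loads("""
{"DOC": {"DOCID": "1234",
         "Subjects": {"Subject_xref": ["2233"]},
         "RELINFO": [{"ID": "REL-1234-100"}]}}
""")

mismatches = list(find_mismatches(SCHEMA, DOCUMENT))
for path, expected, actual in mismatches:
    print(f"{path}: schema allows {expected}, JSON has {actual}")
```

On this excerpt the script reports $.DOC.Subjects.Subject_xref, the mismatch described above, and also $.DOC.RELINFO, which the JSON supplies as an array of objects while the schema declares a single record. ENTINFO has the same shape in the full document. How NiFi treats those two fields is not verified here; if their contents are missing from the CSV output, declaring them as arrays of records in the same way would be the first thing to try.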
