apache-kafka - Kafka SQL (KSQL) stream does not work for JSON data with nested fields
Problem description
I am trying to create a stream in KSQL on top of a Kafka topic. The topic holds JSON records like the following.
{
"venue": {
"venue_name": "HATCH",
"lon": -71.18291,
"lat": 42.36667,
"venue_id": 22491322
},
"visibility": "public",
"response": "yes",
"guests": 0,
"member": {
"member_id": 237655942,
"member_name": "Nts"
},
"rsvp_id": 1724941595,
"mtime": 1524620970613,
"event": {
"event_name": "Intro to Soldering",
"event_id": "250106100",
"time": 1526853600000,
"event_url": "https:\/\/www.meetup.com\/Makers-of-HATCH-Makerspace\/events\/250106100\/"
},
"group": {
"group_topics": [
{
"urlkey": "quilting",
"topic_name": "Quilting"
},
{
"urlkey": "robotics",
"topic_name": "Robotics"
},
{
"urlkey": "sewing",
"topic_name": "Sewing"
},
{
"urlkey": "edtech",
"topic_name": "Education & Technology"
},
{
"urlkey": "craftswap",
"topic_name": "Crafts"
},
{
"urlkey": "diy",
"topic_name": "DIY (Do It Yourself)"
},
{
"urlkey": "hacking",
"topic_name": "Hacking"
},
{
"urlkey": "3d-modeling",
"topic_name": "3D Modeling"
},
{
"urlkey": "tools",
"topic_name": "Tools"
},
{
"urlkey": "arduino",
"topic_name": "Arduino"
},
{
"urlkey": "makers",
"topic_name": "Makers"
},
{
"urlkey": "makerspaces",
"topic_name": "Makerspaces"
},
{
"urlkey": "3d-printing",
"topic_name": "3D Printing"
},
{
"urlkey": "laser-cutting",
"topic_name": "Laser Cutting"
},
{
"urlkey": "scrapbook-die-cutting-machines",
"topic_name": "Scrapbook die cutting machines."
}
],
"group_city": "Watertown",
"group_country": "us",
"group_id": 18457932,
"group_name": "Makers of HATCH Makerspace",
"group_lon": -71.18,
"group_urlname": "Makers-of-HATCH-Makerspace",
"group_state": "MA",
"group_lat": 42.37
}
}
This data has already been loaded into the Kafka topic.
I created a stream in KSQL as follows.
CREATE STREAM meetup_rsvp_raw
( Venue varchar,
Visibility varchar,
Response varchar,
Guests integer,
Member varchar,
rsvp_id bigint,
mtime bigint,
event varchar,
group_info varchar
) WITH (KAFKA_TOPIC='meetup-rsvp', VALUE_FORMAT='JSON');
I see null in the group_info field (the last field in the stream). Note: KSQL did not let me create a field named "group" because it is a keyword, so I named the field group_info instead.
ksql> select * from meetup_rsvp_raw limit 2;
1524624181126 | null | {"venue_name":"Houghton's Pond - Blue Hills","lon":-71.09453,"lat":42.208187,"venue_id":1506300} | public | yes | 0 | {"member_id":159617162,"photo":"https://secure.meetupstatic.com/photos/member/7/2/b/c/thumb_215729372.jpeg","member_name":"Tena Kerns"} | 1724949934 | 1524623875376 | {"event_name":"Blue Hills Buck Hill - Easy Pace / Moderate hike","event_id":"250084062","time":1525010400000,"event_url":"https://www.meetup.com/HikeBikeSocialClub/events/250084062/"} | null
1524624181126 | null | {"venue_name":"Community Wholeness Centre CWC","lon":-79.69191,"lat":44.38976,"venue_id":19966962} | public | no | 0 | {"member_id":222279178,"photo":"https://secure.meetupstatic.com/photos/member/d/3/f/c/thumb_273714268.jpeg","member_name":"Natalie Roy"} | 1724949935 | 1524623875430 | {"event_name":"Karate Class - Ken Shin Budo Kai","event_id":"kbsjtmyxgbnc","time":1525129200000,"event_url":"https://www.meetup.com/CWCBarrie/events/250120204/"} | null
I am not sure what I am doing wrong; any suggestions are welcome.
Solution
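That's right, 'GROUP' is a keyword in KSQL. Renaming the field in the CREATE STREAM statement does not work as a workaround, because KSQL then has no way of knowing that your group_info column refers to the group field in the JSON, so the column always comes back as null.
You can put quotes around the column name to import the topic (at present the identifier inside the quotes must be in uppercase, but that is a bug), e.g.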
CREATE STREAM meetup_rsvp_raw
( venue varchar,
visibility varchar,
response varchar,
guests integer,
member varchar,
rsvp_id bigint,
mtime bigint,
event varchar,
"GROUP" varchar
) WITH (KAFKA_TOPIC='meetup-rsvp', VALUE_FORMAT='JSON');
Note that you also need to quote this field when selecting it:
SELECT `GROUP` from meetup_rsvp_raw limit 5;
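Because the nested objects are declared as varchar, each of these columns holds the raw JSON text of its object. As a minimal sketch, assuming your KSQL version provides the EXTRACTJSONFIELD function and accepts a quoted identifier as its argument, individual nested values could be pulled out as shown below. The aliases venue_name, event_name and group_city are illustrative only, and newer ksqlDB releases may additionally require EMIT CHANGES before LIMIT.

```sql
-- Sketch only: extract nested values from the JSON text held in each varchar column.
-- The second argument is a JSON path into that column's object.
SELECT rsvp_id,
       EXTRACTJSONFIELD(venue, '$.venue_name') AS venue_name,
       EXTRACTJSONFIELD(event, '$.event_name') AS event_name,
       EXTRACTJSONFIELD(`GROUP`, '$.group_city') AS group_city
FROM meetup_rsvp_raw
LIMIT 5;
```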
I have created a GitHub issue to track the lack of documentation in this area.
Let us know how you get on.
Thanks,
Andy