Spark Structured Streaming: reading nested JSON from Kafka and flattening it

Problem description

A sample JSON record:

{
    "id": "34cx34fs987",
    "time_series": [
        {
            "time": "20200903 00:00:00",
            "value": 342342.12
        },
        {
            "time": "20200903 00:00:05",
            "value": 342421.88
        },
        {
            "time": "20200903 00:00:10",
            "value": 351232.92
        }
    ]
}
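
For reference, this nested structure can be described with a PySpark schema roughly like the sketch below; the field types are only guessed from the sample values:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType

# Schema for the sample record: a string id plus an array of {time, value} structs
json_schema = StructType([
    StructField("id", StringType()),
    StructField("time_series", ArrayType(StructType([
        StructField("time", StringType()),
        StructField("value", DoubleType()),
    ]))),
])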

I read this JSON from Kafka:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local').appName('test').getOrCreate()
df = spark.readStream.format("kafka")...

How should I manipulate df to get a DataFrame like this:

     id             time          value
34cx34fs987  20200903 00:00:00  342342.12
34cx34fs987  20200903 00:00:05  342421.88
34cx34fs987  20200903 00:00:10  351232.92

Tags: apache-spark, apache-kafka

Solution


Sample code in PySpark:

from pyspark.sql import functions as f
# explode() produces one row per element of time_series; the struct fields are then selected out
df2 = df.select("id", f.explode("time_series").alias("col"))
df2.select("id", "col.time", "col.value").show()
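
When df comes straight from the Kafka source, its value column holds the raw message bytes, so the JSON has to be parsed with from_json before it can be exploded. Below is a fuller end-to-end sketch; the broker address localhost:9092 and topic name my_topic are hypothetical placeholders, and the console sink is used because show() cannot be called directly on a streaming DataFrame:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType

spark = SparkSession.builder.master("local").appName("test").getOrCreate()

# Hypothetical Kafka settings; replace with your broker and topic
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "my_topic")
       .load())

schema = StructType([
    StructField("id", StringType()),
    StructField("time_series", ArrayType(StructType([
        StructField("time", StringType()),
        StructField("value", DoubleType()),
    ]))),
])

# Kafka delivers the payload as bytes in the value column: cast to string, then parse the JSON
parsed = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("data"))
          .select("data.*"))

# One output row per element of time_series, with the struct fields flattened
flat = (parsed
        .select("id", F.explode("time_series").alias("ts"))
        .select("id", "ts.time", "ts.value"))

# Stream the flattened rows to the console sink
query = (flat.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()

Note that explode drops records whose time_series array is empty or null; use explode_outer if those rows should be kept.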
