How to extract a value from a double-nested map in SparkSQL?

Problem description

I am trying to access a column in SparkSQL (called auxdata) that looks like this:

{"data_type":"2", "additional_data": ""{\"session_id\": \"102s\", \"from_user_id\": kkk0000, \"object_id\": \"aaaa68764\"}"" }

I want to extract object_id from inside "additional_data".

In Presto, I was able to do this:

select json_extract_scalar(json_parse(cast(json_parse(auxdata['additional_data']) as varchar)), '$.object_id') as obj_id from table

Is there any way to do this in SparkSQL?

I tried:

select get_json_object(element_at(auxdata, 'additional_data'), '$.object_id') as obj_id from table

But it returns null.

Thanks in advance for any suggestions!

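Tags: apache-spark-sql, nested

Answer

I think the JSON you provided is not valid: the additional_data value is wrapped in doubled quotes (""{...}"") and kkk0000 is not quoted. I corrected both in the example below. You can parse the nested JSON by nesting get_json_object calls: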

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{get_json_object, lit}

    // In spark-shell a `spark` session already exists; getOrCreate() returns it.
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._ // provides the $"column" syntax

    // Corrected sample: additional_data is a JSON object serialized as a string,
    // so its inner quotes are escaped (\") and every value, including kkk0000, is quoted.
    val data =
      """
        |{"data_type":"2", "additional_data": "{\"session_id\": \"102s\", \"from_user_id\": \"kkk0000\",\"object_id\": \"aaaa68764\"}"}
      """.stripMargin.trim // trim drops the surrounding newlines and indentation
    val df = spark.range(1).withColumn("auxdata", lit(data))
    df.show(false)
    df.printSchema()

    /**
      * +---+------------------------------------------------------------------------------------------------------------------------------+
      * |id |auxdata                                                                                                                       |
      * +---+------------------------------------------------------------------------------------------------------------------------------+
      * |0  |{"data_type":"2", "additional_data": "{\"session_id\": \"102s\", \"from_user_id\": \"kkk0000\",\"object_id\": \"aaaa68764\"}"}|
      * +---+------------------------------------------------------------------------------------------------------------------------------+
      *
      * root
      * |-- id: long (nullable = false)
      * |-- auxdata: string (nullable = false)
      */

    // The inner get_json_object returns additional_data as a plain JSON string;
    // the outer call parses that string and reads object_id from it.
    df.withColumn("obj_id", get_json_object(get_json_object($"auxdata", "$.additional_data"), "$.object_id"))
      .show(false)

    /**
      * +---+------------------------------------------------------------------------------------------------------------------------------+---------+
      * |id |auxdata                                                                                                                       |obj_id   |
      * +---+------------------------------------------------------------------------------------------------------------------------------+---------+
      * |0  |{"data_type":"2", "additional_data": "{\"session_id\": \"102s\", \"from_user_id\": \"kkk0000\",\"object_id\": \"aaaa68764\"}"}|aaaa68764|
      * +---+------------------------------------------------------------------------------------------------------------------------------+---------+
      */
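Since the question asks about SparkSQL specifically, the same nested get_json_object call can also be written as a plain SQL query. This is a minimal sketch: like the code above, it assumes auxdata is a JSON string column (not a map), runs on a local SparkSession, and registers the corrected sample under a hypothetical temporary view name t, which you would replace with your real table name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Local session for illustration; in spark-shell getOrCreate() reuses the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("nested-json-sql").getOrCreate()

// Corrected sample: additional_data holds a JSON object serialized as an escaped string.
val data =
  """{"data_type":"2", "additional_data": "{\"session_id\": \"102s\", \"from_user_id\": \"kkk0000\",\"object_id\": \"aaaa68764\"}"}"""

// "t" is a hypothetical view name standing in for the real table.
spark.range(1).withColumn("auxdata", lit(data)).createOrReplaceTempView("t")

// Inner get_json_object: extract additional_data as a plain JSON string.
// Outer get_json_object: parse that string and read object_id from it.
val objIds: Seq[String] = spark.sql(
  """SELECT get_json_object(get_json_object(auxdata, '$.additional_data'), '$.object_id') AS obj_id
    |FROM t""".stripMargin
).collect().map(_.getString(0)).toSeq

objIds.foreach(println) // prints aaaa68764
```

The nesting works because get_json_object returns a string-valued field without its surrounding quotes or escapes, so the result of the inner call is itself valid JSON that the outer call can parse.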
