json - 替换 Spark 数据框中嵌套 json 字符串列中的值
问题描述
我有一个嵌套的 JSON 字符串作为列的一部分。我需要用地图中的一些东西替换更深的 JSON 中的键和值。
我之前使用过 typedLit 和一个列,但是如何在这里使用它。
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
val df = Seq(
("1", """{"k": "foo", "v": 1.0}""", "some_other_field_1"),
("2", """{"p": "bar", "q": 3.0}""", "some_other_field_2"),
("3",
"""{"nstKey":[ {"fn":"name1","v1":false,"v2":true},
|{"fn":"name2","v1":"100","v2":"200"}
|]}""".stripMargin, "some_other_field_3")
).toDF("id", "json", "other")
df.show(truncate = false)
val kMap=Map("fn"->"rfn","v1"->"rv1","v2"->"rv2")
val vmap=Map("name1"->"rename1","name2"->"rename2")
val df1= df.withColumn("id1",col("id"))
.withColumn("other1",col("other"))
.withColumn("k",get_json_object(col("json"),"$.k"))
.withColumn("v",get_json_object(col("json"),"$.v"))
.withColumn("p",get_json_object(col("json"),"$.p"))
.withColumn("q",get_json_object(col("json"),"$.q"))
.withColumn("nestedKey",get_json_object(col("json"),"$.nstKey"))
.select("id1","other1","k","v","p","q","nestedKey")
df1.show(truncate = false)
结果 :
+---+------------------+----+----+----+----+--------------------------------------------------------------------------+
|id1|other1 |k |v |p |q |nestedKey |
+---+------------------+----+----+----+----+--------------------------------------------------------------------------+
|1 |some_other_field_1|foo |1.0 |null|null|null |
|2 |some_other_field_2|null|null|bar |3.0 |null |
|3 |some_other_field_3|null|null|null|null|[{"fn":"name1","v1":500,"v2":true},{"fn":"name2","v1":"abc","v2":"A-1"}]|
+---+------------------+----+----+----+----+--------------------------------------------------------------------------+
期待:
+---+------------------+----+----+----+----+--------------------------------------------------------------------------+
|id1|other1 |k |v |p |q |nestedKey |
+---+------------------+----+----+----+----+--------------------------------------------------------------------------+
|1 |some_other_field_1|foo |1.0 |null|null|null |
|2 |some_other_field_2|null|null|bar |3.0 |null |
|3 |some_other_field_3|null|null|null|null|[{"rfn":"rename1","rv1":500,"rv2":true},{"rfn":"rename2","rv1":"abc","rv2":"A-1"}]|
+---+------------------+----+----+----+----+--------------------------------------------------------------------------+
解决方案
您可以使用regexp_replace
函数来替换 JSON 字符串值。
val keyChangedDF = kMap.keySet.foldLeft(df1)((df,c)=>
df.withColumn("nestedKey", regexp_replace('nestedkey, "\""+c+"\":", "\""+kMap(c)+"\":")))
val valueChangedDF = vMap.keySet.foldLeft(keyChangedDF)((df,c)=>
df.withColumn("nestedKey", regexp_replace('nestedkey, "\""+c+"\",", "\""+vMap(c)+"\",")))
推荐阅读
- apache - Apache 默认站点配置
- javascript - Javascript不会在第一次按键时更新输入值
- javascript - \E 无法替换 from= '\E\\E\10.1.2.154\E\bcs\E\30877_P9999_Adult{2}_02_05_2019_0329p.pdf'
- c# - 使用正则表达式测试特定字符串
- c# - 如何计算使用 LINQ 设置的 JObject 中有多少字段?
- javascript - 错误:WebGL 警告:texImage2D:所需上传的数据比可用数据多:(加载带有三角形网格数据和法线的纹理时)
- android - 从内容 uri 读取文件时出现权限错误
- makefile - 如果函数中的条件 - Makefile
- c# - 如何在我的 c# android xamarin 项目中使用 google firebase
- c# - 缩写列表中给出的字符串