How to add an element inside a struct that is itself nested in a struct, in a Spark DataFrame (Scala)

Problem description

I need to add an element to a struct that is itself nested inside another struct.

File:

{"teamName":{"Redbull"},"info":{"drivers":{"driver":{"Max Verstappen","Alex Albon"},"carNumbers":{"33","23"},"carName":"RB7"}}}

Base DataFrame:

val jsonDF=spark.read.json("path")
jsonDF.printSchema

root
 |-- info: struct (nullable = true)
 |    |-- drivers: struct (nullable = true)
 |    |    |-- carName: string (nullable = true)
 |    |    |-- carNumbers: string (nullable = true)
 |    |    |-- driver: string (nullable = true)
 |-- teamName: string (nullable = true)

I need to add an age field inside info -> drivers.

When I do this:

jsonDF.withColumn("info",struct(col("info.drivers").alias("drivers"), lit("24").alias("age"))).printSchema

root
 |-- info: struct (nullable = false)
 |    |-- drivers: struct (nullable = true)
 |    |    |-- carName: string (nullable = true)
 |    |    |-- carNumbers: string (nullable = true)
 |    |    |-- driver: string (nullable = true)
 |    |-- age: string (nullable = false)
 |-- teamName: string (nullable = true)


The age field ends up directly under info, but I need it inside drivers. How can I do that?

Tags: scala, apache-spark, struct

Solution


I would use a library called spark-hats: https://github.com/AbsaOSS/spark-hats

Then it is simply:

import org.apache.spark.sql.functions.lit
import za.co.absa.spark.hats.Extensions._

val jsonDFwithAge = jsonDF.nestedWithColumn("info.drivers.age", lit("24"))

jsonDFwithAge.printSchema
root
 |-- info: struct (nullable = false)
 |    |-- drivers: struct (nullable = false)
 |    |    |-- carName: string (nullable = true)
 |    |    |-- carNumbers: string (nullable = true)
 |    |    |-- driver: string (nullable = true)
 |    |    |-- age: string (nullable = false)
 |-- teamName: string (nullable = true)
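
If adding a dependency is not an option, the same result can probably be reached with plain Spark by rebuilding the nested struct by hand. The following is a minimal sketch, not part of the original answer. It assumes a simplified, valid one-line JSON record with string fields (chosen to match the printed schema), and the names AddNestedField and addAge are hypothetical, introduced only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}

object AddNestedField {

  // Hypothetical helper: rebuilds info.drivers with one extra field, age.
  // "info.drivers.*" expands to the existing fields of the inner struct.
  def addAge(df: DataFrame): DataFrame =
    df.withColumn(
      "info",
      struct(
        struct(col("info.drivers.*"), lit("24").as("age")).as("drivers")
      )
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-age")
      .getOrCreate()
    import spark.implicits._

    // Assumption: simplified sample record, not the exact file from the question.
    val sample =
      """{"teamName":"Redbull","info":{"drivers":{"driver":"Max Verstappen","carNumbers":"33","carName":"RB7"}}}"""
    val jsonDF = spark.read.json(Seq(sample).toDS())

    addAge(jsonDF).printSchema()

    // On Spark 3.1 or later, Column.withField is an alternative:
    // jsonDF.withColumn("info", col("info").withField("drivers.age", lit("24")))

    spark.stop()
  }
}
```

Note that this version lists every sibling of drivers explicitly: if info held other fields besides drivers, they would have to be added to the outer struct(...) call as well, which is the bookkeeping that spark-hats takes care of.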
