How to extract values from a key-value map in a Spark DataFrame

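Problem description

I have a column containing a map whose keys and values vary from row to row. I am trying to extract the values into a new column. Input: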

+---------------+
|symbols        |
+---------------+
|[3pea -> 3PEA] |
|[barello -> BA]|
|[]             |
|[]             |
+---------------+

Expected output:

+---------------+
|symbols        |
+---------------+
|3PEA           |
|BA             |
|               |
|               |
+---------------+

This is what I have tried so far, using a UDF:

def map_value=udf((inputMap:Map[String,String])=> {inputMap.map(x=>x._2) 
      }) 

But this gives me:

java.lang.UnsupportedOperationException: Schema for type scala.collection.immutable.Iterable[String] is not supported

Tags: apache-spark-sql

Solution


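// Run in spark-shell, where `spark` is already defined.
import org.apache.spark.sql.functions._
import spark.implicits._

// Simulated data: each row is a one-element array holding a "key -> value" string.
// This is a guess at how your column is stored; it is not a real MapType column.
val m = Seq(Array("A -> abc"), Array("B -> 0.11856755943424617"), Array("C -> kqcams"))

val df = m.toDF("map_data")
df.show

// Join the array into one string, split on "-> ", and keep the part after the arrow.
val df2 = df.select(split(concat_ws("", $"map_data"), "-> ").getItem(1).as("map_val"))
df2.show(false)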

The result is:

+--------------------+
|            map_data|
+--------------------+
|          [A -> abc]|
|[B -> 0.118567559...|
|       [C -> kqcams]|
+--------------------+

+-------------------+
|map_val            |
+-------------------+
|abc                |
|0.11856755943424617|
|kqcams             |
+-------------------+
