Changing selected rows into columns

Problem description

I have a dataframe with the following structure:

+------+-------------+--------+
|region|          key|     val|
+------+-------------+--------+
|Sample|row1         |       6|
|Sample|row1_category|   Cat 1|
|Sample|row1_Unit    |      Kg|
|Sample|row2         |       4|
|Sample|row2_category|   Cat 2|
|Sample|row2_Unit    |     ltr|
+------+-------------+--------+

I tried adding a column and pushing the values from the rows into columns, but could not handle the category and Unit columns.

I want to convert it into the structure below:

+------+-------------+--------+--------+--------+
|region|          key|     val|Category|   Unit |
+------+-------------+--------+--------+--------+
|Sample|row1         |       6|   Cat 1|      Kg|
|Sample|row2         |       4|   Cat 2|     ltr|
+------+-------------+--------+--------+--------+

I need to do this for multiple keys; there will be row2, row3, and so on.

Tags: regex, scala, dataframe, apache-spark, pivot

Solution
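
For reference, the sample dataframe can be recreated in spark-shell roughly as follows (a minimal sketch mirroring the df.show output below; the functions import is also needed for split, col, when, concat_ws and collect_set used in the later steps):

import org.apache.spark.sql.functions._
// spark-shell already imports spark.implicits._, which provides toDF and the $ syntax

val df = Seq(
  ("Sample", "row1",          "6"),
  ("Sample", "row1_category", "Cat1"),
  ("Sample", "row1_Unit",     "Kg"),
  ("Sample", "row2",          "4"),
  ("Sample", "row2_category", "Cat2"),
  ("Sample", "row2_Unit",     "ltr")
).toDF("region", "key", "val")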


scala> df.show
+------+-------------+----+
|region|          key| val|
+------+-------------+----+
|Sample|         row1|   6|
|Sample|row1_category|Cat1|
|Sample|    row1_Unit|  Kg|
|Sample|         row2|   4|
|Sample|row2_category|Cat2|
|Sample|    row2_Unit| ltr|
+------+-------------+----+
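
The idea is to split key on "_" so that the prefix becomes the key and the suffix (category / Unit) becomes a marker column: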


scala> val df1 = df.withColumn("_temp", split($"key", "_")).select(col("region"), $"_temp".getItem(0) as "key", $"_temp".getItem(1) as "colType", col("val"))


scala> df1.show(false)
+------+----+--------+----+
|region|key |colType |val |
+------+----+--------+----+
|Sample|row1|null    |6   |
|Sample|row1|category|Cat1|
|Sample|row1|Unit    |Kg  |
|Sample|row2|null    |4   |
|Sample|row2|category|Cat2|
|Sample|row2|Unit    |ltr |
+------+----+--------+----+
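
Next, spread val into separate columns based on colType: category values move to Category, Unit values move to Unit, and only the rows whose colType is null keep their original val: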


scala> val df2 = df1.withColumn("Category", when(col("colType") === "category", col("val"))).withColumn("Unit", when(col("colType") === "Unit", col("val"))).withColumn("val", when(col("colType").isNull, col("val")))


scala> df2.show(false)
+------+----+--------+----+--------+----+
|region|key |colType |val |Category|Unit|
+------+----+--------+----+--------+----+
|Sample|row1|null    |6   |null    |null|
|Sample|row1|category|null|Cat1    |null|
|Sample|row1|Unit    |null|null    |Kg  |
|Sample|row2|null    |4   |null    |null|
|Sample|row2|category|null|Cat2    |null|
|Sample|row2|Unit    |null|null    |ltr |
+------+----+--------+----+--------+----+
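
Finally, group by region and key to collapse the three rows per key into one. collect_set ignores nulls, so each set holds exactly one value, and concat_ws("") unwraps it back into a plain string: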


scala> val df3 = df2.groupBy("region", "key").agg(concat_ws("",collect_set(when($"val".isNotNull, $"val"))).as("val"),concat_ws("",collect_set(when($"Category".isNotNull, $"Category"))).as("Category"), concat_ws("",collect_set(when($"Unit".isNotNull, $"Unit"))).as("Unit"))


scala> df3.show()
+------+----+---+--------+----+
|region| key|val|Category|Unit|
+------+----+---+--------+----+
|Sample|row1|  6|    Cat1|  Kg|
|Sample|row2|  4|    Cat2| ltr|
+------+----+---+--------+----+
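
Since the question is tagged pivot, the same result can also be reached more compactly with a pivot on the derived colType column. This is an untested sketch building on df1 from the step above; first simply picks the single value present in each pivot cell:

// Alternative: pivot on colType (assumes df1 as defined above)
val pivoted = df1
  .withColumn("colType", coalesce($"colType", lit("val")))  // base rows (null colType) map to the "val" column
  .groupBy("region", "key")
  .pivot("colType", Seq("val", "category", "Unit"))         // explicit value list fixes the output columns
  .agg(first("val"))
  .withColumnRenamed("category", "Category")

pivoted.show()

Listing the pivot values explicitly also saves Spark an extra pass to discover the distinct colType values.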
