regex - 将选定的行更改为列
问题描述
我有一个具有以下结构的数据框
+------+-------------+--------+
|region| key| val|
+--------------------+--------+
|Sample|row1 | 6|
|Sample|row1_category| Cat 1|
|Sample|row1_Unit | Kg|
|Sample|row2 | 4|
|Sample|row2_category| Cat 2|
|Sample|row2_Unit | ltr|
+------+-------------+--------+
我尝试添加一列并将值从行推送到列,但是类别和单位列
我想把它转换成下面的结构
+------+-------------+--------+--------+--------+
|region| key| val|Category| Unit |
+--------------------+--------+--------+--------+
|Sample|row1 | 6| Cat 1| Kg|
|Sample|row2 | 4| Cat 2| ltr|
+------+-------------+--------+--------+--------+
这我需要为多个键做,我将有第 2 行,第 3 行等
解决方案
scala> df.show
+------+-------------+----+
|region| key| val|
+------+-------------+----+
|Sample| row1| 6|
|Sample|row1_category|Cat1|
|Sample| row1_Unit| Kg|
|Sample| row2| 4|
|Sample|row2_category|Cat2|
|Sample| row2_Unit| ltr|
+------+-------------+----+
scala> val df1 = df.withColumn("_temp", split( $"key" , "_")).select(col("region"), $"_temp".getItem(0) as "key",$"_temp".getItem(1) as "colType",col("val"))
scala> df1.show(false)
+------+----+--------+----+
|region|key |colType |val |
+------+----+--------+----+
|Sample|row1|null |6 |
|Sample|row1|category|Cat1|
|Sample|row1|Unit |Kg |
|Sample|row2|null |4 |
|Sample|row2|category|Cat2|
|Sample|row2|Unit |ltr |
+------+----+--------+----+
scala> val df2 = df1.withColumn("Category", when(col("colType") === "category", col("val"))).withColumn("Unit", when(col("colType") === "Unit", col("val"))).withColumn("val", when(col("colType").isNull, col("val")))
scala> df2.show(false)
+------+----+--------+----+--------+----+
|region|key |colType |val |Category|Unit|
+------+----+--------+----+--------+----+
|Sample|row1|null |6 |null |null|
|Sample|row1|category|null|Cat1 |null|
|Sample|row1|Unit |null|null |Kg |
|Sample|row2|null |4 |null |null|
|Sample|row2|category|null|Cat2 |null|
|Sample|row2|Unit |null|null |ltr |
+------+----+--------+----+--------+----+
scala> val df3 = df2.groupBy("region", "key").agg(concat_ws("",collect_set(when($"val".isNotNull, $"val"))).as("val"),concat_ws("",collect_set(when($"Category".isNotNull, $"Category"))).as("Category"), concat_ws("",collect_set(when($"Unit".isNotNull, $"Unit"))).as("Unit"))
scala> df3.show()
+------+----+---+--------+----+
|region| key|val|Category|Unit|
+------+----+---+--------+----+
|Sample|row1| 6| Cat1| Kg|
|Sample|row2| 4| Cat2| ltr|
+------+----+---+--------+----+
推荐阅读
- botframework - 如何将现有的已登录 Web 应用程序用户传递给 Azure Web 聊天机器人,以便只有经过身份验证和授权的用户才能使用 Web 聊天机器人?
- c# - 如何在运行时创建表时查询数据库
- javascript - 通过匹配键/值对重新组合对象数组
- java - 一旦写入这两个值,如何修复此 java 代码以呈现结果
- java - 如何从 Java 中的参数将项目添加到 Array Generic Object
- python - 在运行时定义 Redis 数据库
- sql - 尝试将列数据类型 NVARCHAR 转换为 INT 时出现错误
- django - 重定向vs反向django
- dart - 为什么我的 Dart Aqueduct 服务器中忽略了 main.dart 配置
- python - 如何将自定义python函数链接到odoo中表单视图的创建按钮