scala - agg 函数将多行转换为具有不同类型的多列
问题描述
我想将具有相同 id 的多行的值转换为 columns ,但每列都是不同的类型。
输入数据 :
val dataInput = List(
Row( "meta001","duration", 2 , null, null),
Row("meta001","price", 300 , null , null),
Row("meta001","name", null , null , "name"),
Row("meta001","exist", null , true , null),
Row("meta002","price", 400 , null, null),
Row("meta002","duration", 3 , null, null)
)
val schemaInput = new StructType()
.add("id",StringType,true)
.add("code",StringType,true)
.add("integer value",IntegerType,true)
.add("boolean value",BooleanType,true)
.add("string value",StringType,true)
var dfInput = spark.createDataFrame(
spark.sparkContext.parallelize(dataInput),
schemaInput
)
+-------+--------+-------------+-------------+------------+
| id| code|integer value|boolean value|string value|
+-------+--------+-------------+-------------+------------+
|meta001|duration| 2| null| null|
|meta001| price| 300| null| null|
|meta001| name| null| null| name|
|meta001| exist| null| true| null|
|meta002| price| 400| null| null|
|meta002|duration| 3| null| null|
+-------+--------+-------------+-------------+------------+
预期输出:
+-------+--------+-------------+-------------+------------+
| id|duration|price |name |exist |
+-------+--------+-------------+-------------+------------+
|meta001| 2| 300| name| true|
|meta002| 3| 400| null| null|
+-------+--------+-------------+-------------+------------+
我认为我应该使用 groupBy 和 pivot 功能,但是当我应该 agg 结果时我有点迷失:
dfInput.groupby("id").pivot("code",Seq("duration","price","name","exist").agg(???)
解决方案
您不需要在这里枢轴,只需结合first
a when
:
dfInput
.groupBy($"id")
.agg(
first(when($"code" === "duration", $"integer value"), ignoreNulls = true).as("duration"),
first(when($"code" === "price", $"integer value"), ignoreNulls = true).as("price"),
first(when($"code" === "name", $"string value"), ignoreNulls = true).as("name"),
first(when($"code" === "exist", $"boolean value"), ignoreNulls = true).as("exist")
)
.show()
给
+-------+--------+-----+----+-----+
| id|duration|price|name|exist|
+-------+--------+-----+----+-----+
|meta001| 2| 300|name| true|
|meta002| 3| 400|null| null|
+-------+--------+-----+----+-----+
推荐阅读
- oauth-2.0 - Maintaining oath callback URLs
- opencv - 为什么 MEAN SHIFT + ORB 没有检测到(所有)多个对象?
- version-control - RTC SCM 查询工作区退出代码
- python - 导入熊猫似乎正在使用已弃用的 python 模块
- ibm-cloud - Watson 自定义情绪
- r - Rmarkdown 不生成 pdf
- python - Python 请求 Microsoft Graph API 身份验证
- html - 更改 Weebly 灯箱的背景颜色
- vb.net - Visual Studio - 如何在 vb.net 代码中查找字符串等于评估
- javascript - How to handle live data as sequential promises