首页 > 解决方案 > agg 函数将多行转换为具有不同类型的多列

问题描述

我想将具有相同 id 的多行的值转换为 columns ,但每列都是不同的类型。

输入数据 :

val dataInput = List(
  Row( "meta001","duration", 2 , null, null),
  Row("meta001","price", 300 , null , null),
  Row("meta001","name", null , null , "name"),
  Row("meta001","exist", null , true , null),
  Row("meta002","price", 400 , null,  null),
  Row("meta002","duration", 3 , null, null)
)

val schemaInput = new StructType()
  .add("id",StringType,true)
  .add("code",StringType,true)
  .add("integer value",IntegerType,true)
  .add("boolean value",BooleanType,true)
  .add("string value",StringType,true)

var dfInput = spark.createDataFrame(
  spark.sparkContext.parallelize(dataInput),
  schemaInput
)

    +-------+--------+-------------+-------------+------------+
    |     id|    code|integer value|boolean value|string value|
    +-------+--------+-------------+-------------+------------+
    |meta001|duration|            2|         null|        null|
    |meta001|   price|          300|         null|        null|
    |meta001|    name|         null|         null|        name|
    |meta001|   exist|         null|         true|        null|
    |meta002|   price|          400|         null|        null|
    |meta002|duration|            3|         null|        null|
    +-------+--------+-------------+-------------+------------+

预期输出:

    +-------+--------+-------------+-------------+------------+
    |     id|duration|price         |name        |exist       |
    +-------+--------+-------------+-------------+------------+
    |meta001|       2|          300|         name|        true|
    |meta002|       3|          400|         null|        null|
    +-------+--------+-------------+-------------+------------+

我认为我应该使用 groupBy 和 pivot 功能,但是当我应该 agg 结果时我有点迷失:

dfInput.groupby("id").pivot("code",Seq("duration","price","name","exist").agg(???)

标签: scalaapache-spark

解决方案


您不需要在这里枢轴,只需结合firsta when

dfInput
  .groupBy($"id")
  .agg(
    first(when($"code" === "duration", $"integer value"), ignoreNulls = true).as("duration"),
    first(when($"code" === "price", $"integer value"), ignoreNulls = true).as("price"),
    first(when($"code" === "name", $"string value"), ignoreNulls = true).as("name"),
    first(when($"code" === "exist", $"boolean value"), ignoreNulls = true).as("exist")
  )
  .show()

+-------+--------+-----+----+-----+
|     id|duration|price|name|exist|
+-------+--------+-----+----+-----+
|meta001|       2|  300|name| true|
|meta002|       3|  400|null| null|
+-------+--------+-----+----+-----+

推荐阅读