Transforming a field inside an array column

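Problem description

I need to transform an array column in my DataFrame. The column is called "cities" and has type Array(City), and I want to uppercase each city's name.

Schema: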

val cities: StructField = StructField("cities", ArrayType(CityType), nullable = true)

def CityType: StructType =
  StructType(
    Seq(
      StructField(code, StringType, nullable = true),
      StructField(name, StringType, nullable = true)
    )
  )

The code I tried:

   .withColumn(
      newColumn,
      forall(
        col(cities),
        (col: Column) =>
          struct(
            Array(
              col(code),
              upper(col(name))
            ): _*
          )
      )
    )

The error says:

cannot resolve 'forall(...

Tags: scala, apache-spark, apache-spark-sql

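Solution

forall is not the right function here. In Spark 3.0+ it is a predicate that checks whether a condition holds for every element of the array and returns a Boolean, so it cannot build new structs. Use transform instead, which applies a function to each element: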

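// sample data
val df = spark.sql("select array(struct('1' as code, 'abc' as name), struct('2' as code, 'def' as name)) cities")

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, struct, transform, upper}

// transform applies the lambda to every element of the array
// (the Scala API for transform is available since Spark 3.0)
val df2 = df.withColumn(
    "newcol",
    transform(
        col("cities"),
        // rebuild each struct: keep code, uppercase name
        (c: Column) => struct(c("code"), upper(c("name")))
    )
)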

df2.show
+--------------------+--------------------+
|              cities|              newcol|
+--------------------+--------------------+
|[[1, abc], [2, def]]|[[1, ABC], [2, DEF]]|
+--------------------+--------------------+
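
One detail worth checking in the snippet above: upper(c("name")) carries no alias, so the second field of the rebuilt struct may end up with a generated name rather than name. The following is a minimal sketch of a variant that aliases both fields explicitly. It assumes Spark 3.0 or later and a local SparkSession; the object UppercaseCityNames and the helper withUppercasedNames are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, transform, upper}

object UppercaseCityNames {

  // Hypothetical helper: adds a column `target` holding the array from
  // `source` with every element's name uppercased. Both struct fields are
  // aliased so the element schema keeps the names code and name.
  def withUppercasedNames(df: DataFrame, source: String, target: String): DataFrame =
    df.withColumn(
      target,
      transform(
        col(source),
        (c: Column) => struct(c("code").as("code"), upper(c("name")).as("name"))
      )
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for this demonstration
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("uppercase-city-names")
      .getOrCreate()

    // Same sample data as in the answer
    val df = spark.sql(
      "select array(struct('1' as code, 'abc' as name), struct('2' as code, 'def' as name)) as cities"
    )

    val df2 = withUppercasedNames(df, "cities", "newcol")
    df2.printSchema() // inspect the field names of newcol's element struct
    df2.show(false)

    spark.stop()
  }
}
```

With the aliases in place, downstream code can keep addressing the fields by name, for example selectExpr("newcol.name"), just as it would on the original cities column.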
