首页 > 解决方案 > 否则 - 子句没有按预期工作,这里有什么问题?

问题描述

我正在使用 spark-sql-2.4.1v 如何根据列的值进行各种连接,我需要为给定的值列获取 map_val 列的多个查找值,如下所示。

样本数据:

val data = List(
  ("20", "score", "school", "2018-03-31", 14 , 12),
  ("21", "score", "school", "2018-03-31", 13 , 13),
  ("22", "rate", "school", "2018-03-31", 11 , 14),
  ("21", "rate", "school", "2018-03-31", 13 , 12)
 )
val df = data.toDF("id", "code", "entity", "date", "value1", "value2")

df.show

+---+-----+------+----------+------+------+
| id| code|entity|      date|value1|value2|
+---+-----+------+----------+------+------+
| 20|score|school|2018-03-31|    14|    12|
| 21|score|school|2018-03-31|    13|    13|
| 22| rate|school|2018-03-31|    11|    14|
| 21| rate|school|2018-03-31|    13|    12|
+---+-----+------+----------+------+------+




 val resultDs = df
                 .withColumn("value1",
                        when(col("code").isin("rate") , functions.callUDF("udfFunc",col("value1")))
                         .otherwise(col("value1").cast(DoubleType))
                      )

udfFunc 映射如下

11->a
12->b
13->c
14->d

预期产出

+---+-----+------+----------+------+------+
| id| code|entity|      date|value1|value2|
+---+-----+------+----------+------+------+
| 20|score|school|2018-03-31|    14|    12|
| 21|score|school|2018-03-31|    13|    13|
| 22| rate|school|2018-03-31|    a |    14|
| 21| rate|school|2018-03-31|    c |    12|
+---+-----+------+----------+------+------+

但它给出的输出为

+---+-----+------+----------+------+------+
| id| code|entity|      date|value1|value2|
+---+-----+------+----------+------+------+
| 20|score|school|2018-03-31|  null|    12|
| 21|score|school|2018-03-31|  null|    13|
| 22| rate|school|2018-03-31|    a |    14|
| 21| rate|school|2018-03-31|    c |    12|
+---+-----+------+----------+------+------+

为什么“否则”条件没有按预期工作。知道这里有什么问题吗?

标签: apache-sparkapache-spark-sql

解决方案


列应包含相同的数据类型。

注意-DoubleType无法存储StringTyp数据,因此您需要转换DoubleTypeStringType.

val resultDs = df
.withColumn("value1",
        when(col("code") === lit("rate") ,functions.callUDF("udfFunc",col("value1")))
        .otherwise(col("value1").cast(StringType)) // Should be StringType
    )

或者

val resultDs = df
                 .withColumn("value1",
                        when(col("code").isin("rate") , functions.callUDF("udfFunc",col("value1")))
                         .otherwise(col("value1").cast(StringType)) // Modified to StringType
                      )

推荐阅读