首页 > 解决方案 > 如何根据条件获取火花数据框中的值

问题描述

我有一个如下所示的 DataFrame。列值的格式为 - 1_3_del_8_3,它基本上是由“ _del_ ”分隔的两个值。在这里,我们将得到两个部分 - 1_38_3。我们可以在这里省略第二部分。我正在寻找最佳方法来获取第一部分不是0_0的最后一列的值。

此外,我在 DF 中有 50 多列。

示例数据框

+-----------+-----------+-----------+
|         c1|         c2|         c3|
+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|
+-----------+-----------+-----------+

预期结果

+-----------+-----------+-----------+-----------+
|         c1|         c2|         c3|C4Out      |
+-----------+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|   2_3     |
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|   2_5     |
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|   2_8     |
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|   7_3     |
+-----------+-----------+-----------+-----------+

我的试训——

val df1=Seq(
("1_3_del_8_3","2_3_del_6_3","0_0_del_8_3"),
("2_9_del_4_3","0_0_del_4_3","2_5_del_4_3"),
("2_8_del_4_3","0_0_del_4_3","0_0_del_4_3"),
("5_3_del_4_3","2_3_del_4_3","7_3_del_4_3")
)toDF("c1","c2","c3")

然后我尝试根据分隔符拆分列,然后我陷入了如何继续。我尝试了很多搜索,但在这里没有得到一个类似的问题。

标签: scalaapache-spark

解决方案


您可以使用 when() 有条件地检查您的用例:

when(Condition,value_if_condition_true).otherwise(value_if_condition_false)

根据您的使用情况,我们可以从右侧开始检查。如果 c3 满足条件,则选择 c3 中的值,否则将检查 c2,然后检查 c1,否则为 null。

scala> df.show()
+-----------+-----------+-----------+
|         c1|         c2|         c3|
+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|
+-----------+-----------+-----------+

scala> df.withColumn("C4Out",
when(substring($"c3",1,3)!=="0_0",substring($"c3",9,11))
.when(substring($"c2",1,3)!=="0_0",substring($"c2",9,11))
.when(substring($"c1",1,3)!=="0_0",substring($"c1",9,11))
.otherwise(null)).show()
+-----------+-----------+-----------+-----------+
|         c1|         c2|         c3|C4Out      |
+-----------+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|   2_3     |
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|   2_5     |
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|   2_8     |
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|   7_3     |
+-----------+-----------+-----------+-----------+

推荐阅读