scala - 如何根据条件获取火花数据框中的值
问题描述
我有一个如下所示的 DataFrame。列值的格式为 - 1_3_del_8_3,它基本上是由“ _del_ ”分隔的两个值。在这里,我们将得到两个部分 - 1_3和8_3。我们可以在这里省略第二部分。我正在寻找最佳方法来获取第一部分不是0_0的最后一列的值。
此外,我在 DF 中有 50 多列。
示例数据框
+-----------+-----------+-----------+
| c1| c2| c3|
+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|
+-----------+-----------+-----------+
预期结果
+-----------+-----------+-----------+-----------+
| c1| c2| c3|C4Out |
+-----------+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3| 2_3 |
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3| 2_5 |
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3| 2_8 |
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3| 7_3 |
+-----------+-----------+-----------+-----------+
我的试训——
val df1=Seq(
("1_3_del_8_3","2_3_del_6_3","0_0_del_8_3"),
("2_9_del_4_3","0_0_del_4_3","2_5_del_4_3"),
("2_8_del_4_3","0_0_del_4_3","0_0_del_4_3"),
("5_3_del_4_3","2_3_del_4_3","7_3_del_4_3")
)toDF("c1","c2","c3")
然后我尝试根据分隔符拆分列,然后我陷入了如何继续。我尝试了很多搜索,但在这里没有得到一个类似的问题。
解决方案
您可以使用 when() 有条件地检查您的用例:
when(Condition,value_if_condition_true).otherwise(value_if_condition_false)
根据您的使用情况,我们可以从右侧开始检查。如果 c3 满足条件,则选择 c3 中的值,否则将检查 c2,然后检查 c1,否则为 null。
scala> df.show()
+-----------+-----------+-----------+
| c1| c2| c3|
+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3|
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3|
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3|
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3|
+-----------+-----------+-----------+
scala> df.withColumn("C4Out",
when(substring($"c3",1,3)!=="0_0",substring($"c3",9,11))
.when(substring($"c2",1,3)!=="0_0",substring($"c2",9,11))
.when(substring($"c1",1,3)!=="0_0",substring($"c1",9,11))
.otherwise(null)).show()
+-----------+-----------+-----------+-----------+
| c1| c2| c3|C4Out |
+-----------+-----------+-----------+-----------+
|1_3_del_8_3|2_3_del_6_3|0_0_del_8_3| 2_3 |
|2_9_del_4_3|0_0_del_4_3|2_5_del_4_3| 2_5 |
|2_8_del_4_3|0_0_del_4_3|0_0_del_4_3| 2_8 |
|5_3_del_4_3|2_3_del_4_3|7_3_del_4_3| 7_3 |
+-----------+-----------+-----------+-----------+
推荐阅读
- haskell - 我的 recurvise haskell 代码有什么问题?
- javascript - 合并不同的数组并计算平均价格 Javascript
- javascript - 有没有办法从当前页面上的文件输入中使用文件
- javascript - 角色计数无法正常工作 discord.js
- html - 铬 - 带有水印的印刷问题
- python - 如何按月订购表格,然后在绘图图中显示?Python
- javascript - MDX 将样式应用于错误的元素(next-mdx-remote 中的 MDX)
- html - 将 svg 图标与文本对齐
- angularjs - 如何在以角度形式动态添加新输入字段时禁用以前的输入字段?
- reactjs - 创建动态组件