How to randomly update column values of specific rows using column names

Problem description

import org.apache.spark.sql.Row

def getSequence(row: Row): Seq[String] = {
  ??? // some code (body omitted in the question)
}

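Basically, I want to iterate over the DataFrame row by row and, for each row, set to 1 the columns whose names are in the sequence returned by getSequence.

Input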

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  0 |  0  |
|  2|  0 |  0  |
|  3|  0 |  0  |
+---+----+-----+

getSequence returns Seq("dept") for row 1, Seq("color") for row 2, and Seq("dept", "color") for row 3.

The output should look like this:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  1 |  0  |
|  2|  0 |  1  |
|  3|  1 |  1  |
+---+----+-----+

Tags: scala, apache-spark

Solution


def lit(literal: Any): org.apache.spark.sql.Column

def monotonically_increasing_id(): org.apache.spark.sql.Column

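Use the lit function to update column values.

The following code updates specific columns. It compares the generated id against 0 and 1, so it assumes that monotonically_increasing_id numbers the three rows 0, 1, 2; the function guarantees unique, increasing IDs, not consecutive ones.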

scala> val df = Seq((1,0,0),(2,0,0),(3,0,0)).toDF("sno","dept","color").withColumn("id",monotonically_increasing_id)
df: org.apache.spark.sql.DataFrame = [sno: int, dept: int ... 2 more fields]

scala> df.withColumn("dept",when($"id" =!= 1,lit(1)).otherwise(lit(0))).withColumn("color",when($"id" =!= 0,lit(1)).otherwise(lit(0))).drop("id").show(false)
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|1  |1   |0    |
|2  |0   |1    |
|3  |1   |1    |
+---+----+-----+
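
The answer above hard-codes which rows get which columns. A minimal sketch of a more general approach, assuming the per-row logic really is available as a function like getSequence and that the target columns are integers, is to map over the rows and rebuild the DataFrame with the original schema. The object name UpdateByColumnNames, the helper setNamedColumnsToOne, and the body of getSequence below are hypothetical stand-ins, not part of the original question or answer.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object UpdateByColumnNames {

  // Hypothetical stand-in for the question's getSequence: it picks the
  // columns to update from the value of "sno", matching the example above.
  def getSequence(row: Row): Seq[String] = row.getAs[Int]("sno") match {
    case 1 => Seq("dept")
    case 2 => Seq("color")
    case 3 => Seq("dept", "color")
    case _ => Seq.empty
  }

  // For every row, write 1 into each column named by getSequence(row)
  // and keep all other values unchanged.
  def setNamedColumnsToOne(spark: SparkSession, df: DataFrame): DataFrame = {
    val schema = df.schema
    val fieldNames = schema.fieldNames.toSeq
    val updatedRows = df.rdd.map { row =>
      val targets = getSequence(row).toSet
      val values = fieldNames.indices.map { i =>
        if (targets.contains(fieldNames(i))) 1 else row.get(i)
      }
      Row.fromSeq(values)
    }
    // Reuse the original schema so column names and types are preserved.
    spark.createDataFrame(updatedRows, schema)
  }
}
```

Calling UpdateByColumnNames.setNamedColumnsToOne(spark, df) on the input table from the question should give the desired output without depending on generated ids. Because this goes through the RDD API, it may be slower than a pure column expression on large data.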

