scala - Adding many columns to a DataFrame in Scala
Problem description
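import org.apache.spark.sql.DataFrame

// Builds every pair (first, second) with first < second, compared case-insensitively.
private def example(channelRollups: Seq[String]): Seq[(String, String)] = {
  channelRollups.flatMap(first => channelRollups.
    filter(second => first.toLowerCase() < second.toLowerCase()).
    map((first, _)))
}
// Adds one column named "<a>_<b>" holding the product of columns a and b.
def addProductCol(df: DataFrame, cols: (String, String)): DataFrame = {
  df.withColumn(s"${cols._1}_${cols._2}", df(cols._1) * df(cols._2))
}
def addAllProductCols(df: DataFrame): DataFrame = {
  // exampleSeq is not defined in the question; presumably the column names to pair up.
  val colPairs = example(exampleSeq)
  colPairs.foldLeft(df)(addProductCol)
}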
Output
colPairs = Seq(("left", "right"), ("middle", "right"))
+----+------+-----+
|left|middle|right|
+----+------+-----+
|   2|     3|    5|
|   7|    11|   13|
|  17|    19|   23|
+----+------+-----+
+----+------+-----+----------+------------+
|left|middle|right|left_right|middle_right|
+----+------+-----+----------+------------+
|   2|     3|    5|        10|          15|
|   7|    11|   13|        91|         143|
|  17|    19|   23|       391|         437|
+----+------+-----+----------+------------+
Instead of adding the columns to the DataFrame directly, can this be changed to return a DataFrame => DataFrame function, in other words a transformation?
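import org.apache.spark.sql.functions.{col, when}

// Column and ORDER_DATETIME_KEY are not defined in the question; presumably String constants.
def example2: DataFrame => DataFrame = {
  dataFrame => {
    dataFrame.
      withColumn(Column, when(col(ORDER_DATETIME_KEY).isNotNull, 1).otherwise(0))
  }
}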
So basically I want to write the first example the way example2 is written: the return type should be DataFrame => DataFrame instead of DataFrame.
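One possible way to get that shape, offered as a sketch rather than a definitive answer: keep addProductCol as it is and let addAllProductCols take the column pairs and return a function, so the fold only runs once a DataFrame is supplied. The enclosing object name ProductCols and the colPairs parameter are assumptions introduced here for illustration; they do not appear in the question.

```scala
import org.apache.spark.sql.DataFrame

// ProductCols is a hypothetical wrapper object, used only to keep the sketch self-contained.
object ProductCols {
  // Adds one column named "<a>_<b>" holding the product of columns a and b.
  def addProductCol(df: DataFrame, cols: (String, String)): DataFrame =
    df.withColumn(s"${cols._1}_${cols._2}", df(cols._1) * df(cols._2))

  // Returns a DataFrame => DataFrame function instead of a DataFrame.
  // Nothing is added until the returned function is applied to a DataFrame.
  def addAllProductCols(colPairs: Seq[(String, String)]): DataFrame => DataFrame =
    df => colPairs.foldLeft(df)(addProductCol)
}
```

A function of this type can be passed to Dataset.transform, for example df.transform(ProductCols.addAllProductCols(Seq(("left", "right"), ("middle", "right")))), which also makes it easy to chain with other DataFrame => DataFrame steps such as example2.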