Scala: adding many columns to a DataFrame

Problem description

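import org.apache.spark.sql.DataFrame

// Builds each pair of names once, ordered case-insensitively.
private def example(seqOfElement: Seq[String]): Seq[(String, String)] = {
  seqOfElement.flatMap(first =>
    seqOfElement
      .filter(second => first.toLowerCase() < second.toLowerCase())
      .map((first, _)))
}

// Adds one column holding the product of the two named columns.
def addProductCol(df: DataFrame, cols: (String, String)): DataFrame = {
  df.withColumn(s"${cols._1}_${cols._2}", df(cols._1) * df(cols._2))
}

// Folds addProductCol over every pair; exampleSeq (the column names) is defined elsewhere.
def addAllProductCols(df: DataFrame): DataFrame = {
  val colPairs = example(exampleSeq)
  colPairs.foldLeft(df)(addProductCol)
}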

Output

colPairs = Seq(("left", "right"), ("middle", "right"))
+----+------+-----+
|left|middle|right|
+----+------+-----+
|   2|     3|    5|
|   7|    11|   13|
|  17|    19|   23|
+----+------+-----+

+----+------+-----+----------+------------+
|left|middle|right|left_right|middle_right|
+----+------+-----+----------+------------+
|   2|     3|    5|        10|          15|
|   7|    11|   13|        91|         143|
|  17|    19|   23|       391|         437|
+----+------+-----+----------+------------+

Instead of adding the columns to the DataFrame directly, can this be rewritten as a DataFrame => DataFrame function, in other words a transformation?

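import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}

// Column and ORDER_DATETIME_KEY are column-name constants defined elsewhere.
def example2: DataFrame => DataFrame = {
  dataFrame =>
    dataFrame.withColumn(Column, when(col(ORDER_DATETIME_KEY).isNotNull, 1).otherwise(0))
}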

So basically I want to write the first example the way example2 is written: the return type should be DataFrame => DataFrame instead of DataFrame.

Tags: scala, apache-spark

Solution
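One possible approach, offered as a sketch rather than a definitive answer: move the DataFrame out of the parameter list and return a function that takes it. The names `productCol` and `allProductCols` below are hypothetical, chosen only for illustration. The column pairs are passed in explicitly here instead of being read from `exampleSeq`.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: given a column pair, returns a transformation
// that appends the product of the two columns.
def productCol(cols: (String, String)): DataFrame => DataFrame =
  df => df.withColumn(s"${cols._1}_${cols._2}", df(cols._1) * df(cols._2))

// Hypothetical helper: combines the per-pair transformations into a single
// DataFrame => DataFrame by folding them over the input.
def allProductCols(colPairs: Seq[(String, String)]): DataFrame => DataFrame =
  df => colPairs.foldLeft(df)((acc, pair) => productCol(pair)(acc))
```

Assuming `df` is the three-column DataFrame from the question, the result can be applied with `Dataset.transform`, for example `df.transform(allProductCols(Seq(("left", "right"), ("middle", "right"))))`, which should yield the same `left_right` and `middle_right` columns shown in the output above. Because the return type is a plain function, it can also be chained with other transformations such as `example2`.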

