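scala - Merge two rows of a DataFrame into one in Scala Spark
Problem description
I have a DataFrame whose records are identical except for the amount field. I want to merge them into a single row in which the amount field holds the sum of the two amounts. How would I do that in Scala?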
I get the dataframe from a database:
import org.apache.spark.sql.functions.col

val my_table = spark.read.table("table.myTable")
// Each chained .filter narrows the result further (the conditions are ANDed)
val df = my_table
  .filter(col("ID") === "10")
  .filter(col("CENT") === "20")
  .filter(col("PROD") === "122")
  .filter(col("CONTR").isin("0004", "0005", "0006"))
  .select("ID", "CENT", "PROD", "CONTR", "COD", "DATE", "AMOUNT")
  .distinct()
df.show()
---------+--------+--------------+------------+-------------+-----------+--------+
ID | CENT | PROD |CONTR |COD | DATE | Amount |
---------+--------+--------------+------------+-------------+-----------+--------+
10 |20 |122 |0004 |COD1 |2006-11-04 | 150.0 |
10 |20 |122 |0004 |COD1 |2006-11-04 | 300.0 |
10 |20 |122 |0005 |COD2 |2012-10-17 | 100.0 |
10 |20 |122 |0006 |COD3 |2015-12-05 | 500.0 |
---------+--------+--------------+------------+-------------+-----------+--------+
Expected:
---------+--------+--------------+------------+-------------+-----------+--------+
ID | CENT | PROD |CONTR |COD | DATE | Amount |
---------+--------+--------------+------------+-------------+-----------+--------+
10 |20 |122 |0004 |COD1 |2006-11-04 | 450.0 |
10 |20 |122 |0005 |COD2 |2012-10-17 | 100.0 |
10 |20 |122 |0006 |COD3 |2015-12-05 | 500.0 |
---------+--------+--------------+------------+-------------+-----------+--------+
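To make the intended rule concrete: rows that agree on every field except the amount should collapse into one row whose amount is the sum. The sketch below expresses that rule with plain Scala collections, without Spark, purely as an illustration. The case class `Rec`, the object `MergeSketch` and its `merge` method are hypothetical names, and all non-amount fields are modeled as strings for simplicity.

```scala
// Hypothetical record mirroring the table's columns (illustration only)
final case class Rec(id: String, cent: String, prod: String, contr: String,
                     cod: String, date: String, amount: Double)

object MergeSketch {
  // Group on every field except amount, then sum the amounts within each group
  def merge(rows: Seq[Rec]): Seq[Rec] =
    rows
      .groupBy(r => (r.id, r.cent, r.prod, r.contr, r.cod, r.date))
      .map { case (_, group) => group.head.copy(amount = group.map(_.amount).sum) }
      .toSeq
      .sortBy(_.cod) // groupBy yields an unordered Map, so sort for stable output

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Rec("10", "20", "122", "0004", "COD1", "2006-11-04", 150.0),
      Rec("10", "20", "122", "0004", "COD1", "2006-11-04", 300.0),
      Rec("10", "20", "122", "0005", "COD2", "2012-10-17", 100.0),
      Rec("10", "20", "122", "0006", "COD3", "2015-12-05", 500.0)
    )
    merge(rows).foreach(println)
  }
}
```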
Solution
The code below groups by every column except amount, runs a sum aggregation on the amount column, and sorts the result by cod.
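import org.apache.spark.sql.SparkSession

object GroupBy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Sample data matching the question's table
    val df = List(
      Bean3(10, 20, 122, "0004", "COD1", "2006-11-04", 150.0),
      Bean3(10, 20, 122, "0004", "COD1", "2006-11-04", 300.0),
      Bean3(10, 20, 122, "0005", "COD2", "2012-10-17", 100.0),
      Bean3(10, 20, 122, "0006", "COD3", "2015-12-05", 500.0)
    ).toDF()

    // Every column except "id" (passed separately below) and "amount" is a grouping key
    val groupByCol = df.columns.diff(Array("id", "amount"))
    df.groupBy("id", groupByCol: _*)
      .sum("amount")                              // yields a column named "sum(amount)"
      .withColumnRenamed("sum(amount)", "amount")
      .orderBy("cod")
      .show()

    spark.stop()
  }
}

// Defined at top level so spark.implicits can derive an encoder for toDF
case class Bean3(id: Int, cent: Int, prod: Int, contr: String, cod: String, date: String, amount: Double)

Output: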
+---+----+----+-----+----+----------+------+
| id|cent|prod|contr| cod| date|amount|
+---+----+----+-----+----+----------+------+
| 10| 20| 122| 0004|COD1|2006-11-04| 450.0|
| 10| 20| 122| 0005|COD2|2012-10-17| 100.0|
| 10| 20| 122| 0006|COD3|2015-12-05| 500.0|
+---+----+----+-----+----+----------+------+
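The same steps can also be written with `agg` and an alias, which names the summed column directly instead of renaming `sum(amount)` afterwards, and which can be applied to the question's upper-case schema. This is a sketch under stated assumptions, not a drop-in replacement: `MergeRows` and `mergeDuplicates` are hypothetical names, and the amount column's name must be passed exactly as it appears in `df.columns`, because the exclusion is a plain string comparison.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object MergeRows {
  // Group on every column except amountCol and sum amountCol within each group.
  // The aggregate is aliased back to the original name, so no rename is needed.
  def mergeDuplicates(df: DataFrame, amountCol: String): DataFrame = {
    val keyCols = df.columns.filterNot(_ == amountCol).map(c => col(c))
    df.groupBy(keyCols: _*).agg(sum(col(amountCol)).as(amountCol))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Sample data using the question's upper-case column names
    val df = Seq(
      ("10", "20", "122", "0004", "COD1", "2006-11-04", 150.0),
      ("10", "20", "122", "0004", "COD1", "2006-11-04", 300.0),
      ("10", "20", "122", "0005", "COD2", "2012-10-17", 100.0),
      ("10", "20", "122", "0006", "COD3", "2015-12-05", 500.0)
    ).toDF("ID", "CENT", "PROD", "CONTR", "COD", "DATE", "AMOUNT")

    mergeDuplicates(df, "AMOUNT").orderBy("COD").show()
    spark.stop()
  }
}
```

One caveat for the question's query: `.distinct()` runs before this aggregation, so two rows that also share the same amount (for example two 150.0 entries) would be reduced to one before summing. Drop the `.distinct()` if such rows should both be counted.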