Implementing dynamic string interpolation in Scala Spark?

Problem description

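I have a string containing the aggregate functions that need to go into the .agg call on my DataFrame to produce the result I expect. My DataFrame looks like this: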

val client = Seq((1,"A","D",10),(2,"A","D",5),(3,"B","C",56),(5,"B","D",67)).toDF("ID","Categ","subCat","Amnt")
+---+-----+------+----+
| ID|Categ|subCat|Amnt|
+---+-----+------+----+
|  1|    A|     D|  10|
|  2|    A|     D|   5|
|  3|    B|     C|  56|
|  5|    B|     D|  67|
+---+-----+------+----+

So I am trying to interpolate this string:

val str= "s"$count(ID) as Total,$sum(Amnt) as amt""

I want to get this as the output:

client.groupBy("Categ","subCat").agg(sum("Amnt") as "amt",count("ID") as "Total").show()
+-----+------+---+-----+
|Categ|subCat|amt|Total|
+-----+------+---+-----+
|    B|     C| 56|    1|
|    A|     D| 15|    2|
|    B|     D| 67|    1|
+-----+------+---+-----+

I tried this:

 client.groupBy("Categ","subCat").agg(s"$str").show()

and got this error:

> error: overloaded method value agg with alternatives:
> (expr: org.apache.spark.sql.Column,exprs: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
> (exprs: java.util.Map[String,String])org.apache.spark.sql.DataFrame
> (exprs: scala.collection.immutable.Map[String,String])org.apache.spark.sql.DataFrame
> (aggExpr: (String, String),aggExprs: (String, String)*)org.apache.spark.sql.DataFrame
> cannot be applied to (String)

I also tried expr:

    val str="sum(Amnt) as amt"
    client.groupBy("Categ","subCat").agg(expr(str)).show()
This returns the desired outcome:
    +-----+------+---+
    |Categ|subCat|amt|
    +-----+------+---+
    |    B|     C| 56|
    |    A|     D| 15|
    |    B|     D| 67|
    +-----+------+---+

But when I tried again with two expressions in the string:

    val str="sum(Amnt) as amt,count(ID) as ID_tot"
    client.groupBy("Categ","subCat").agg(expr(str)).show()

it failed with:

    org.apache.spark.sql.catalyst.parser.ParseException:
    mismatched input ',' expecting <EOF>(line 1, pos 16)

Tags: scala, apache-spark, apache-spark-sql, string-interpolation

Solution


A somewhat crude solution: split the string on commas and call expr on each piece:

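import org.apache.spark.sql.functions.expr

val str = "sum(Amnt) as amt,count(ID) as ID_tot"
// agg takes (Column, Column*), so pass the first Column and then the rest as varargs
val exprs = str.split(",").map(s => expr(s.trim))
client.groupBy("Categ","subCat").agg(exprs.head, exprs.tail: _*).show()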

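If a comma can be part of an expression (for example, inside a string literal), things get worse: maybe try parsing the string with expr, catch the ParseException, and see where parsing stopped? There really should be a more direct way, but I don't know of one.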
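
For the case where commas can appear inside an expression, one possible workaround is to split only on "top-level" commas, meaning those outside parentheses and outside quoted literals. The sketch below is plain Scala with no Spark dependency. The name splitTopLevel is hypothetical, not a Spark API, and the sketch assumes quotes are never escaped with a backslash.

```scala
// Hypothetical helper (not part of Spark): split on commas that sit outside
// parentheses and outside single- or double-quoted literals.
// Assumption: quotes are not escaped with a backslash.
def splitTopLevel(s: String): List[String] = {
  val parts = scala.collection.mutable.ListBuffer.empty[String]
  val cur = new StringBuilder
  var depth = 0                   // current parenthesis nesting level
  var quote: Option[Char] = None  // the quote character that is currently open, if any

  for (c <- s) {
    quote match {
      case Some(q) =>
        // Inside a literal: copy everything, close on the matching quote.
        if (c == q) quote = None
        cur += c
      case None =>
        c match {
          case '\'' | '"' => quote = Some(c); cur += c
          case '('        => depth += 1; cur += c
          case ')'        => depth -= 1; cur += c
          case ',' if depth == 0 =>
            // A top-level comma ends the current expression.
            parts += cur.toString.trim
            cur.clear()
          case _ => cur += c
        }
    }
  }
  parts += cur.toString.trim
  parts.toList
}

// The comma inside the quoted literal and the commas inside the call are kept.
val pieces = splitTopLevel("concat_ws(',', Categ, subCat) as k, sum(Amnt) as amt")
// pieces == List("concat_ws(',', Categ, subCat) as k", "sum(Amnt) as amt")
```

Each element of the result could then be passed through expr in the same way as the output of str.split(",") in the snippet above. This is still a heuristic rather than a real SQL parser, so unusual input (backtick-quoted identifiers containing commas, for instance) may be split incorrectly.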

