How to get the alias of a Spark Column as a string?

Question

If I declare a Column val like this:

import org.apache.spark.sql.functions._
val col: org.apache.spark.sql.Column = count("*").as("col_name")

col has type org.apache.spark.sql.Column. Is there a way to access its name ("col_name")? Something like:

col.getName() // returns "col_name"

In this case, col.toString returns "count(1) AS col_name".

Tags: scala, apache-spark

Solution

Try the following code:

scala> val cl = count("*").as("col_name")
cl: org.apache.spark.sql.Column = count(1) AS `col_name`
scala> cl.expr.argString
res14: String = col_name
scala> cl.expr.productElement(1).asInstanceOf[String]
res24: String = col_name
scala> val cl = count("*").cast("string").as("column_name")
cl: org.apache.spark.sql.Column = CAST(count(1) AS STRING) AS `column_name`

scala> cl.expr.argString
res113: String = column_name

As the code above shows, expr.argString only returns the alias while the Alias node sits where argString expects it; if you swap the order of .as and .cast, it gives you the wrong result.
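Because argString is sensitive to where the Alias node ends up in the expression tree, a more direct route is to pattern-match on Catalyst's Alias expression itself. A minimal sketch (note that Alias lives in Spark's internal org.apache.spark.sql.catalyst.expressions package, so this relies on an API that is not part of the stable public surface):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.functions._

// Return the alias name if the column's top-level expression is an Alias.
def aliasName(c: Column): Option[String] = c.expr match {
  case Alias(_, name) => Some(name)
  case _              => None
}

aliasName(count("*").as("col_name"))                // Some("col_name")
aliasName(count("*").cast("string").as("col_name")) // Some("col_name")
aliasName(count("*").as("col_name").cast("string")) // None - Cast is on top
```

This only inspects the top of the tree, which is why the last call returns None: casting after aliasing wraps the Alias in a Cast, just as in the argString example above.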

You can also use json4s to extract the name from expr.toJSON:

scala> import org.json4s._
import org.json4s._

scala> import org.json4s.jackson.JsonMethods._
import org.json4s.jackson.JsonMethods._

scala> implicit val formats = DefaultFormats
formats: org.json4s.DefaultFormats.type = org.json4s.DefaultFormats$@16cccda5

scala> val cl = count("*").as("column_name").cast("string") // Used cast last.
cl: org.apache.spark.sql.Column = CAST(count(1) AS `column_name` AS STRING)

scala> (parse(cl.expr.toJSON) \\ "name").extract[String]
res104: String = column_name
