Error "not found: value" in Spark Scala

Problem description

Schema:

root
 |-- col_a: struct (nullable = true)
 |    |-- $numberLong: string (nullable = true)
 |-- col_b: string (nullable = true)
 |-- col_c: struct (nullable = true)
 |    |-- $numberLong: string (nullable = true)

Code to flatten the (col_a) struct:

df = df.select($"col_a.*",$"col_b",$"col_c")
df.printSchema()

Output:

|-- $numberLong: string (nullable = true)
|-- col_b: string (nullable = true)
|-- col_c: struct (nullable = true)
|    |-- $numberLong: string (nullable = true)

Now, when I try to select only the first column ("$numberLong") and rename it:

df = df.select($"$numberLong".as("test"))

I get the following error:

error: not found: value numberLong
df = df.select($"$numberLong")
                  ^

I can't understand why I get this error when the column clearly exists.

Tags: scala, apache-spark

Solution


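If a column name has a leading $, you cannot reference that column with $"colName", even if you enclose colName in backticks. The $"..." syntax is a Scala string interpolator, so $numberLong inside it is read as a reference to a Scala value named numberLong, which is exactly what the compiler reports as not found. Instead, use col("colName"), as shown below: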

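// spark-shell already has these in scope; a standalone application needs them
// (assumes a SparkSession named `spark`).
import org.apache.spark.sql.functions.col
import spark.implicits._

case class A(`$numberLong`: String)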

val df = Seq(
  (A("x1"), "d1", A("y1")),
  (A("x2"), "d2", A("y2")),
  (A("x3"), "d3", A("y3"))
).toDF("col_a", "col_b", "col_c")

val df2 = df.select($"col_a.*", $"col_b", $"col_c")

df2.printSchema
// root
//  |-- $numberLong: string (nullable = true)
//  |-- col_b: string (nullable = true)
//  |-- col_c: struct (nullable = true)
//  |    |-- $numberLong: string (nullable = true)

df2.select(col("$numberLong").as("test")).printSchema
// root
//  |-- test: string (nullable = true)
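
The compiler error comes from how Scala treats `$` inside interpolated strings, and this can be reproduced without Spark. The following is a minimal sketch using the standard `s` interpolator. The object name `DollarEscapeDemo` and the value `numberLong` are made up for illustration and are not part of the original question.

```scala
object DollarEscapeDemo {
  // Hypothetical Scala value that happens to share the column's name.
  val numberLong: String = "some-scala-value"

  // Inside an interpolated string, `$numberLong` is replaced by the value above.
  // Without that value in scope, this line fails with "not found: value numberLong".
  val interpolated: String = s"$numberLong"

  // `$$` is the escape for a literal dollar sign in an interpolated string,
  // so this is the text "$numberLong".
  val escaped: String = s"$$numberLong"

  // An ordinary (non-interpolated) string literal keeps `$` unchanged,
  // which is why col("$numberLong") receives the real column name.
  val plain: String = "$numberLong"

  def main(args: Array[String]): Unit = {
    println(interpolated)     // the Scala value, not the column name
    println(escaped == plain) // both spell the column name
  }
}
```

Since Spark's `$"..."` is also a string interpolator, the same `$$` escape, written as `$"$$numberLong"`, should in principle produce the column name as well. That variant was not tested here, so col("$numberLong") remains the safer choice.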
