How to pass a list of paths to spark.read.load?

Question

I can load several files at once by passing multiple paths to the load method, e.g.

spark.read
  .format("com.databricks.spark.avro")
  .load(
    "/data/src/entity1/2018-01-01",
    "/data/src/entity1/2018-01-12",
    "/data/src/entity1/2018-01-14")

I'd like to build the list of paths first and then pass it to load, but I get the following compilation error:

val paths = Seq(
  "/data/src/entity1/2018-01-01",
  "/data/src/entity1/2018-01-12",
  "/data/src/entity1/2018-01-14")
spark.read.format("com.databricks.spark.avro").load(paths)

<console>:29: error: overloaded method value load with alternatives:
  (paths: String*)org.apache.spark.sql.DataFrame <and>
  (path: String)org.apache.spark.sql.DataFrame
 cannot be applied to (List[String])
       spark.read.format("com.databricks.spark.avro").load(paths)

Why? How can I pass a list of paths to the load method?

Tags: scala, apache-spark, apache-spark-sql

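Answer

As the error message shows, load is overloaded to take either a single String or a varargs parameter (paths: String*). A Seq[String] matches neither, because Scala does not expand a collection into varargs automatically. You just need the splat operator (_*) on the paths list: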

spark.read.format("com.databricks.spark.avro").load(paths: _*)
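
The same rule can be seen without Spark. Below is a minimal sketch using a hypothetical varargs method, describe, which is made up for illustration and stands in for any method declared with a String* parameter. The object name VarargsDemo is likewise an assumption.

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs method; not part of Spark.
  def describe(paths: String*): String = {
    val joined = paths.mkString(", ")
    s"${paths.length} path(s): $joined"
  }

  def main(args: Array[String]): Unit = {
    val paths = Seq(
      "/data/src/entity1/2018-01-01",
      "/data/src/entity1/2018-01-12")

    // describe(paths) would not compile: a Seq[String] is not a String.
    // The `: _*` type ascription expands the sequence into individual arguments.
    println(describe(paths: _*))
  }
}
```

Inside the method, the varargs parameter is seen as a Seq[String], so passing paths: _* hands the existing sequence over as the argument list rather than as one argument.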
