File path on a Mac for local Parquet files in a Java Spark program

Problem description

I have a small Spark program in Java that reads Parquet files from a local directory on my Mac. I have been trying to do this in several ways, but nothing seems to work.

Dataset<Row> dsuomcategoryconvfactor = spark.read().parquet(path + "file:///usr/local⁩/ParquetData/data1.parquet");

I think I am passing a path that Spark cannot recognize, and it throws the following error.

20/01/06 10:58:29 INFO SharedState: Warehouse path is 'file:/usr/local/Cellar/apache-spark/2.4.4/libexec/work/driver-20200106105812-0006/spark-warehouse'.
20/01/06 10:58:29 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: file:/usr/local⁩/ParquetData/data1.parquet;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:644)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:628)

This works fine when run from the IDE, but when I submit the job from the shell with spark-submit, this error is thrown.

Any help would be appreciated.

Thanks!

Tags: java, apache-spark, parquet

Solution

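Two things are worth checking here. First, the path string itself: the snippet prepends a path variable to a full file:/// URI, and the failing path in the log (file:/usr/local⁩/ParquetData/data1.parquet) appears to contain a stray invisible Unicode character after "local", which would make the path unresolvable even if the file exists. Below is a minimal sketch that reads the file with a clean, hard-coded file:// URI; the class name and the local[*] master are illustrative assumptions, not taken from the question.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadLocalParquet {
    public static void main(String[] args) {
        // Run the driver on this machine so file:// paths resolve against the local disk.
        // (When submitting with spark-submit, set the master on the command line instead
        // of hard-coding it here.)
        SparkSession spark = SparkSession.builder()
                .appName("ReadLocalParquet")
                .master("local[*]")
                .getOrCreate();

        // Pass the full file:// URI directly; do not concatenate it onto another path fragment.
        Dataset<Row> ds = spark.read().parquet("file:///usr/local/ParquetData/data1.parquet");
        ds.show();

        spark.stop();
    }
}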

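Second, the IDE-versus-spark-submit difference: the DriverWrapper frames in the stack trace indicate the job was submitted to a standalone cluster in cluster deploy mode, so the driver runs on a worker and resolves file:// paths against that worker's filesystem, not the Mac where the data lives. Submitting in client deploy mode keeps the driver on the local machine; something like the following, where the master URL, class name, and jar path are placeholders:

spark-submit --master spark://localhost:7077 --deploy-mode client --class ReadLocalParquet target/read-local-parquet.jar

Alternatively, if cluster mode is required, copy the data to the same path on every worker, or put it on a shared store such as HDFS or S3.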