java - Spring Boot | Spark.read fails when given a file path that matches multiple files
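Problem description
On Windows 10, in IntelliJ, the path "C:/hive/Orders_[0-9]*.csv" works fine when the code runs as a standalone Java Spark job, but it fails when the same code runs as a Spring Boot Spark job. It looks as though Spring Boot is not detecting the native file system. I don't know how to fix this.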
Dataset<Row> DF1 = spark
.read().format("csv")
.option("header", "true")
.option("delimiter", "\t")
.load("C:/hive/Orders_[0-9]*.csv");
Error:
Error starting ApplicationContext. To display the auto-configuration report re-run your application with 'debug' enabled.
2019-09-04 21:59:27.701 ERROR [omni-ods-migration,,,] 8216 --- [ main] o.s.boot.SpringApplication : Application startup failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'odsMigrationService': Invocation of init method failed; nested exception is java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:137)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:409)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1620)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:483)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:761)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:867)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:543)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:693)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:360)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:303)
at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:134)
at com.jcpenney.ods.OdsMigration.main(OdsMigration.java:20)
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:645)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1230)
at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1435)
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:493)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:678)
at org.apache.hadoop.fs.Globber.listStatus(Globber.java:77)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:235)
at org.apache.hadoop.fs.Globber.glob(Globber.java:149)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2016)
at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:241)
at org.apache.spark.deploy.SparkHadoopUtil.globPathIfNecessary(SparkHadoopUtil.scala:247)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:383)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:379)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:379)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
at com.jcpenney.ods.service.OdsMigrationService.readHfsFile(OdsMigrationService.java:588)
at com.jcpenney.ods.service.OdsMigrationService.processOrders(OdsMigrationService.java:334)
at com.jcpenney.ods.service.OdsMigrationService.run(OdsMigrationService.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:366)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:311)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:134)
... 16 common frames omitted
The code below also works fine in Spring Boot when the path is an exact file name.
Dataset<Row> DF1 = spark
.read().format("csv")
.option("header", "true")
.option("delimiter", "\t")
.load("C:/hive/Orders_000001.csv");
How can I resolve this?
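Solution
Here is one possible solution:
- Download the Hadoop binaries for Windows from https://github.com/steveloughran/winutils
- Extract the files (for example, to C:\hadoop). Make sure the directory structure looks like this:
C:\hadoop\bin\winutils.exe
- Set the environment variable HADOOP_HOME to C:\hadoop
- Add Hadoop to the Path environment variable: %HADOOP_HOME%\bin
- Copy hadoop.dll to Windows\System32 (this may not be necessary)
- Restart the system
- Specific to Spring Boot: add this to the main method:
System.setProperty("hadoop.home.dir", "C:/hadoop/");
System.load("C:/hadoop/bin/hadoop.dll");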
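The steps above depend on two files being present under the Hadoop home directory, so it can help to verify the layout before Spark is first used. The following is only a minimal sketch under stated assumptions: the class name HadoopHomeCheck and its methods missingFiles and configure are hypothetical helpers invented for illustration, not part of Hadoop, Spark, or Spring Boot, and the default location C:/hadoop is taken from the example in the steps above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks a Hadoop-for-Windows layout before Spark is used. */
final class HadoopHomeCheck {

    private HadoopHomeCheck() {}

    /** Returns the names of the expected files that are missing from hadoopHome/bin. */
    static List<String> missingFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(hadoopHome.resolve("bin").resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    /**
     * Sets the hadoop.home.dir system property only when both files are present.
     * Returns true if the property was set, false otherwise.
     */
    static boolean configure(Path hadoopHome) {
        if (!missingFiles(hadoopHome).isEmpty()) {
            return false;
        }
        System.setProperty("hadoop.home.dir", hadoopHome.toAbsolutePath().toString());
        return true;
    }

    public static void main(String[] args) {
        // Assumed default location, matching the example path in the steps above.
        Path home = Paths.get(args.length > 0 ? args[0] : "C:/hadoop");
        if (configure(home)) {
            System.out.println("hadoop.home.dir = " + System.getProperty("hadoop.home.dir"));
            // On Windows, the System.load call for bin/hadoop.dll from the answer
            // would follow here, before SpringApplication.run(...) is invoked.
        } else {
            System.out.println("Missing under " + home.resolve("bin") + ": " + missingFiles(home));
        }
    }
}
```

Calling something like configure at the very start of the Spring Boot main method, before the application context starts, would surface a missing winutils.exe or hadoop.dll as a readable message instead of an UnsatisfiedLinkError raised during bean initialization.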
References:
- https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems
- https://sparkbyexamples.com/spark/spark-hadoop-exception-in-thread-main-java-lang-unsatisfiedlinkerror-org-apache-hadoop-io-nativeio-nativeiowindows-access0ljava-lang-stringiz/
- https://blog.csdn.net/weixin_30802273/article/details/96528359