apache-spark - I set up my own Spark cluster. When I read a parquet file on S3, I get an error: IllegalAccessError
Problem description
Error:
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)
at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:216)
at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:139)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:174)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:44)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:620)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:604)
at net.appcloudbox.autopilot.eventstats.MiningTask$$anonfun$clean_data$2.apply(MiningTask.scala:141)
at net.appcloudbox.autopilot.eventstats.MiningTask$$anonfun$clean_data$2.apply(MiningTask.scala:140)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at net.appcloudbox.autopilot.eventstats.MiningTask.clean_data(MiningTask.scala:140)
at net.appcloudbox.autopilot.eventstats.MiningTask.run(MiningTask.scala:35)
at net.appcloudbox.autopilot.eventstats.EventStats$.main(EventStats.scala:39)
at net.appcloudbox.autopilot.eventstats.EventStats.main(EventStats.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Some people say this is caused by the Hadoop version. At first I used hadoop-2.7.5.tar.gz with spark-2.3.0-bin-hadoop2.7.tgz, and my job hit the problem above. When I switched to hadoop-2.8.5.tar.gz with spark-2.3.0-bin-hadoop2.7.tgz, my job hit the same problem again. My code is as follows:
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", config.get("aws_access_key_id"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", config.get("aws_secret_access_key"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") // note: the "spark.hadoop." prefix only belongs in Spark conf keys, not in hadoopConfiguration
......
spark.read.parquet("s3a://bucket/...../sample.parquet").rdd
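This IllegalAccessError typically means the hadoop-aws jar (which contains S3AFileSystem and S3AInstrumentation) comes from a different Hadoop release than the hadoop-common jar on the classpath: the visibility of the MutableCounterLong constructor changed between releases, so mixed versions fail at runtime. As an alternative to swapping jars by hand, a hedged sketch (the 2.8.5 version below is an assumption matching this post's setup; use whatever matches your cluster's hadoop-common) is to let spark-submit resolve a matching hadoop-aws:

```shell
# Sketch: pull a hadoop-aws build that matches the cluster's Hadoop version.
# 2.8.5 and the credential keys are illustrative; adjust to your environment.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.8.5 \
  --conf spark.hadoop.fs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
  --conf spark.hadoop.fs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" \
  --class net.appcloudbox.autopilot.eventstats.EventStats \
  my-job.jar
```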
Solution
I solved the problem. As you can see, I was using hadoop-2.8.5.tar.gz with spark-2.3.0-bin-hadoop2.7.tgz; in the jars directory of the Spark installation, the bundled Hadoop jars are for 2.7.x, so you just need to replace them with the 2.8.5 versions.
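The jar swap described above can be sketched roughly as follows. This is a sketch under assumptions: SPARK_HOME and HADOOP_HOME point at the spark-2.3.0-bin-hadoop2.7 and hadoop-2.8.5 installations from this post, and the share/hadoop paths are the standard Hadoop 2.8.x layout.

```shell
# Assumption: $SPARK_HOME is the Spark install, $HADOOP_HOME is hadoop-2.8.5.
cd "$SPARK_HOME/jars"

# Move the bundled Hadoop 2.7.x jars out of the way instead of deleting them.
mkdir -p hadoop-2.7-backup
mv hadoop-*2.7*.jar hadoop-2.7-backup/

# Copy in the matching 2.8.5 jars, including hadoop-aws and the AWS SDK it
# was built against (shipped together under tools/lib in Hadoop 2.8.x).
cp "$HADOOP_HOME"/share/hadoop/common/hadoop-common-2.8.5.jar .
cp "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-aws-2.8.5.jar .
cp "$HADOOP_HOME"/share/hadoop/tools/lib/aws-java-sdk-*.jar .
```

The point is that hadoop-common, hadoop-aws, and the AWS SDK must all come from the same release; replacing only one of them reintroduces the mismatch.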