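scala - Reading a CSV with Scala Spark fails with: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
Problem description
I am using Spark with Scala in IntelliJ, and I import Spark through the POM. Now I want to read a CSV file, as follows: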
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object demo {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark: SparkSession = {
      SparkSession
        .builder()
        .master("local")
        .appName("spark pika")
        .getOrCreate()
    }
    val df = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("/Users/siyuxiao/Downloads/churn_dataset_train.csv")
    df.show()
  }
}
But I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
    at demo$.main(demo.scala:12)
    at demo.main(demo.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
How can I fix this and read my CSV file? Here are my POM dependencies:
<dependencies>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.2</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.2</version>
    <scope>provided</scope>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.2</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
I would really appreciate it if someone could help me solve this.
Solution
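java.lang.NoClassDefFoundError means the JVM could not find the class on the classpath at runtime. A dependency declared with <scope>provided</scope> is available when compiling but is expected to be supplied by the runtime environment (for example a Spark cluster), so Maven leaves it off the runtime classpath. Here spark-sql, which contains SparkSession, is marked as provided. If you run the program locally, you must remove <scope>provided</scope> from pom.xml.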
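To check whether the scope change took effect before running the full job, a small classpath probe can help. The following is only a sketch: the object name ClasspathCheck and the helper isOnClasspath are hypothetical and not part of the original post, and the second class name in the list is just an extra example from spark-core. It uses only the standard Class.forName call, so it compiles without any Spark dependency.

```scala
// Hypothetical helper (not from the original post): reports whether the
// classes the program needs can be loaded from the runtime classpath.
object ClasspathCheck {

  // Returns true if the named class is visible to this object's class loader.
  // initialize = false: only locate the class, do not run its static initializer.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    // "SparkSession$" is the companion object named in the stack trace.
    val required = Seq(
      "org.apache.spark.sql.SparkSession$",
      "org.apache.spark.SparkContext"
    )
    required.foreach { name =>
      val status = if (isOnClasspath(name)) "found" else "MISSING"
      println(s"$status: $name")
    }
  }
}
```

Run it with the same run configuration as the failing program. If org.apache.spark.sql.SparkSession$ is reported as MISSING, the spark-sql jar is still not on the runtime classpath.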
Example pom.xml:
<properties>
  <maven.compiler.source>8</maven.compiler.source>
  <maven.compiler.target>8</maven.compiler.target>
  <spark.version>2.4.8</spark.version>
  <scala.version>2.12.8</scala.version>
  <scala.compat.version>2.12</scala.compat.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.scala-tools</groupId>
      <artifactId>maven-scala-plugin</artifactId>
      <version>${scala.compat.version}</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>