Reading a CSV with Scala Spark fails with: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$

Problem description

I'm using Spark with Scala in IntelliJ, and I pull Spark in through the POM. Now I want to read a CSV file, like this:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object demo {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark: SparkSession = {
      SparkSession
        .builder()
        .master("local")
        .appName("spark pika")
        .getOrCreate()
    }
    val df = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("/Users/siyuxiao/Downloads/churn_dataset_train.csv")
    df.show()
  }
}

But I get the following:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
    at demo$.main(demo.scala:12)
    at demo.main(demo.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

How do I fix this so I can read my CSV file? Here are my POM dependencies:

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.2</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.2</version>
        <scope>provided</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.3.2</version>
        <scope>provided</scope>
    </dependency>

</dependencies>

I'd really appreciate it if someone could help solve this.

Tags: scala, maven, apache-spark, intellij-idea

Solution

java.lang.NoClassDefFoundError means the JVM cannot find a class on the runtime classpath. With <scope>provided</scope>, Maven compiles against the dependency but leaves it off the runtime classpath, because it assumes the runtime environment (for example a cluster you reach via spark-submit) will supply it. If you run the application locally, you must remove <scope>provided</scope> from the pom.xml.
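
You can confirm the diagnosis before touching the pom: if the companion object's class cannot be loaded, spark-sql is simply missing at runtime. A minimal check (a hypothetical helper, not part of the original answer):

// Hypothetical diagnostic; run it from the same module as demo.
// SparkSession$ is the compiled form of the SparkSession companion object,
// the exact class the stack trace above fails to load.
object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.spark.sql.SparkSession$")
    println("spark-sql is on the runtime classpath")
  }
}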

Example pom.xml:

<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <spark.version>2.4.8</spark.version>
    <scala.version>2.12.8</scala.version>
    <scala.compat.version>2.12</scala.compat.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.compat.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.compat.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.compat.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <!-- compiles the Scala sources; the version is a plugin release,
                 not a Scala version -->
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.5.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
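
Note that this example also moves from Scala 2.11 to 2.12: the _2.11 / _2.12 suffix on the Spark artifacts must match the Scala version the project compiles with, or you get the same family of NoClassDefFoundError. A small sketch to print both versions at runtime (again a hypothetical helper, not from the original answer):

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // versionString reports the scala-library actually on the classpath
    println(s"Scala: ${scala.util.Properties.versionString}")
    // SPARK_VERSION is a constant exposed by the Spark core package object
    println(s"Spark: ${org.apache.spark.SPARK_VERSION}")
  }
}

With <scope>provided</scope> removed and matching artifact suffixes, the demo object from the question should run directly from IntelliJ; keep provided only when you submit the jar to a cluster whose own Spark installation supplies these classes.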
