How to fix the ClassNotFoundException error when using a Maven-built jar-with-dependencies?

Problem Description

I am developing a Scala Spark application that reads data from a source database and loads it into BigQuery, and I wrote the code below for it.

package com.somename

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Plain object, not `extends App`: overriding App.main skips DelayedInit and would leave `ec` uninitialized.
object Marker {

  val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(3))  // Thread pool for 3 active stages.

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.network.timeout", "12000s")
      .set("spark.kryoSerializer.buffer.max", "512m")
    conf.registerKryoClasses(Array(classOf[IntoBigquery]))

    val spark = SparkSession.builder().appName("app").master("yarn")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config(conf).getOrCreate()
    spark.conf.set("temporaryGcsBucket", "bucket_location")

    val ib = new IntoBigquery  // IntoBigquery is assumed to be in the same package
    if (ib.read_and_ingest(spark = spark, databasename = args(0), tablename = args(1),
        partitionColumn = args(2), numPartitions = args(3).toInt, upperBound = args(4).toInt))
      println("Ingestion Successful!")
    else println("Ingestion Failed!")
  }
}

My main class is in the object Marker, which lives under the package com.somename.

Here is my project structure:

[screenshot: project structure]

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.somename</groupId>
  <artifactId>DeltaLoader</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.12.12</scala.version>
    <maven.compiler.source>3.1</maven.compiler.source>
    <maven.compiler.target>3.1</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>2.4.7</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>2.4.7</version>
    </dependency>

  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
              <mainClass>com.somename.Marker</mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>default-jar</id>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.2-beta-5</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass>com.somename.Marker</mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

When I build the project, I see a warning like this in the console:

JAR will be empty - no content was marked for inclusion!

Once the build completes, I see two jar files:

  1. The default jar file
  2. A fat jar with all the dependencies

This is how I build the jar: in IntelliJ, I open the Maven tool window in the right-hand vertical pane, click the m icon, and run mvn package.

When I copy the jar to my GCP bucket and submit the jar file there, I see this error message:

21/03/09 11:04:38 WARN org.apache.spark.deploy.SparkSubmit$$anon$2: Failed to load com.somename.Marker.
java.lang.ClassNotFoundException: com.somename.Marker
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)

I also tried submitting the same code from my local SDK (PowerShell) with the following command:

gcloud dataproc jobs submit spark --cluster=clustername --region=region_name --jar=gs://bucketlocation/jars/DeltaLoader-1.0-SNAPSHOT-jar-with-dependencies.jar --jars=gs://mssql-jdbc-9.2.0.jre8.jar,gs://spark-bigquery-latest_2.12.jar -- arg1 arg2 arg3 arg4 arg5

and ran into the same exception as in the Cloud Console:

21/03/09 11:14:05 WARN org.apache.spark.deploy.SparkSubmit$$anon$2: Failed to load com.micron.Marker.
java.lang.ClassNotFoundException: com.micron.Marker

I also tried submitting the job directly from my local machine:

spark-submit --master local[2] --deploy-mode client --driver-memory 1g --executor-memory 1g --executor-cores 2 --jars C:\Users\Downloads\mssql-jdbc-9.2.0.jre8.jar,C:\Users\Downloads\spark-bigquery-latest_2.12.jar --class com.somename.Marker C:\Users\IdeaProjects\DeltaLoader\target\DeltaLoader-1.0-SNAPSHOT-jar-with-dependencies.jar arg1 arg2 arg3 arg4 arg5

Even this failed with the same exception. Is there something wrong with how I create the jar file, or is there an error in my pom.xml? Could anyone point out where I went wrong? Any help is much appreciated.

Tags: scala, maven, apache-spark

Solution
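
The key clue is the build warning "JAR will be empty - no content was marked for inclusion!": nothing is being compiled into target/classes, so both jars are packaged without any .class files, and spark-submit has no com.somename.Marker to load. The POM points sourceDirectory at src/main/scala, but it configures no plugin that compiles Scala; the default maven-compiler-plugin only compiles Java sources. (Note also that maven.compiler.source and maven.compiler.target take Java versions such as 1.8, not 3.1.) A sketch of the missing piece using the scala-maven-plugin (the plugin version here is illustrative):

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.12.12</scala.version>
    <maven.compiler.source>1.8</maven.compiler.source>  <!-- a Java version, not 3.1 -->
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <!-- Compiles src/main/scala into target/classes, so the jars are no longer empty -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.4.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <!-- keep the existing maven-jar-plugin and maven-assembly-plugin entries as they are -->
    </plugins>
  </build>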

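After adding the plugin, rebuild with mvn clean package and confirm that the fat jar now contains com/somename/Marker.class (for example by listing the jar-with-dependencies file's contents with jar tf). Once it does, the manifest's mainClass resolves, and both the Dataproc submissions and the local spark-submit command above should get past the ClassNotFoundException.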
