scala - How do I fix a ClassNotFoundException when using a jar-with-dependencies built with Maven?
Problem Description
I am developing a Scala Spark application that reads data from a source database and loads it into BigQuery. I wrote the code below for it:
package com.somename

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object Marker extends App {
  val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(3)) // Spawning 3 threads for 3 active stages.

  override def main(args: Array[String]): Unit = {
    val conf = new SparkConf().set("spark.network.timeout", "12000s").set("spark.kryoSerializer.buffer.max", "512m")
    conf.registerKryoClasses(Array(classOf[IntoBigquery]))
    val spark = SparkSession.builder().appName("app").master("yarn")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config(conf)
      .getOrCreate()
    spark.conf.set("temporaryGcsBucket", "bucket_location")
    val ib = new IntoBigquery
    if (ib.read_and_ingest(spark = spark, databasename = args(0), tablename = args(1), partitionColumn = args(2), numPartitions = args(3).toInt, upperBound = args(4).toInt)) println("Ingestion Successful!")
    else println("Ingestion Failed!")
  }
}
My main class is in the object Marker, under the package com.somename.
Here is my pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.somename</groupId>
    <artifactId>DeltaLoader</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <scala.version>2.12.12</scala.version>
        <maven.compiler.source>3.1</maven.compiler.source>
        <maven.compiler.target>3.1</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>2.4.7</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>2.4.7</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass>com.somename.Marker</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>default-jar</id>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.2-beta-5</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>com.somename.Marker</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
When I build the project, I see a warning like this in the console:
JAR will be empty - no content was marked for inclusion!
After the build completes, I see two jar files:
- the default jar
- a fat jar containing all the dependencies
I build the jars this way: from the Maven pane on the right side of IntelliJ, I click the m symbol and run mvn package.
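One way to check whether the compiled class actually ends up in the fat jar is to list its contents; the path below assumes Maven's default target directory:
jar tf target/DeltaLoader-1.0-SNAPSHOT-jar-with-dependencies.jar
If com/somename/Marker.class does not appear in the listing, the class cannot be loaded from that jar at submit time.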
When I copy the jar to my GCP bucket and submit the jar file from there, I see the following error message:
21/03/09 11:04:38 WARN org.apache.spark.deploy.SparkSubmit$$anon$2: Failed to load com.somename.Marker.
java.lang.ClassNotFoundException: com.somename.Marker
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
I also tried submitting the same code from my local Cloud SDK (PowerShell) with the following command:
gcloud dataproc jobs submit spark --cluster=clustername --region=region_name --jar=gs://bucketlocation/jars/DeltaLoader-1.0-SNAPSHOT-jar-with-dependencies.jar --jars=gs://mssql-jdbc-9.2.0.jre8.jar,gs://spark-bigquery-latest_2.12.jar -- arg1 arg2 arg3 arg4 arg5
and ran into the same exception as from the Cloud Console:
21/03/09 11:14:05 WARN org.apache.spark.deploy.SparkSubmit$$anon$2: Failed to load com.micron.Marker.
java.lang.ClassNotFoundException: com.micron.Marker
I also tried submitting the job directly from my local machine:
spark-submit --master local[2] --deploy-mode client --driver-memory 1g --executor-memory 1g --executor-cores 2 --jars C:\Users\Downloads\mssql-jdbc-9.2.0.jre8.jar,C:\Users\Downloads\spark-bigquery-latest_2.12.jar --class com.somename.Marker C:\Users\IdeaProjects\DeltaLoader\target\DeltaLoader-1.0-SNAPSHOT-jar-with-dependencies.jar arg1 arg2 arg3 arg4 arg5
Even this failed with the same exception. Is there something wrong with the way I am creating the jar files, or is there a mistake in my pom.xml? Can anyone tell me where I am going wrong? Any help is much appreciated.
Solution
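The "JAR will be empty - no content was marked for inclusion!" warning indicates that nothing was compiled into target/classes, so neither jar actually contains com/somename/Marker.class, which is why spark-submit cannot load the main class. The pom.xml above points sourceDirectory at src/main/scala but does not configure any plugin that compiles Scala, and maven.compiler.source/target are set to 3.1, which looks like a plugin version rather than a Java release (a value such as 1.8 would be typical). A minimal sketch of the kind of plugin block commonly added for this, using scala-maven-plugin (the plugin version below is an assumption, not taken from the question):

<plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>4.4.0</version>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Once the Scala sources are compiled, the jar-with-dependencies should contain com/somename/Marker.class; listing the jar contents as described above is a quick way to confirm before submitting again.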