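java - How to spark-submit with a main class in a jar?
Question
There are plenty of questions about ClassNotFoundException, but I haven't seen any that fit this specific case. I am trying to run the following command: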
spark-submit --master local[*] --class com.stronghold.HelloWorld scala-ts.jar
It throws the following exception:
$ spark_submit --class com.stronghold.HelloWorld scala-ts.jar
2018-05-06 19:52:33 WARN Utils:66 - Your hostname, asusTax resolves to a loopback address: 127.0.1.1; using 192.168.1.184 instead (on interface p1p1)
2018-05-06 19:52:33 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-05-06 19:52:33 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.ClassNotFoundException: com.stronghold.HelloWorld
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:235)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:836)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-05-06 19:52:34 INFO ShutdownHookManager:54 - Shutdown hook called
2018-05-06 19:52:34 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-e8a77988-d30c-4e96-81fe-bcaf5d565c75
However, the jar clearly contains this class:
" zip.vim version v28
" Browsing zipfile /home/[USER]/projects/scala_ts/out/artifacts/TimeSeriesFilter_jar/scala-ts.jar
" Select a file with cursor and press ENTER

META-INF/MANIFEST.MF
com/
com/stronghold/
com/stronghold/HelloWorld$.class
com/stronghold/TimeSeriesFilter$.class
com/stronghold/DataSource.class
com/stronghold/TimeSeriesFilter.class
com/stronghold/HelloWorld.class
com/stronghold/scratch.sc
com/stronghold/HelloWorld$delayedInit$body.class
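The same check can be done from a terminal instead of an editor. The sketch below is an illustration only and not part of the original question: `class_to_entry` and `jar_has_class` are hypothetical helper names, and it assumes `unzip` is installed (`jar tf` from a JDK lists the entries as well).

```shell
#!/bin/sh
# Hypothetical helper: map a fully qualified class name to the entry
# path it should have inside the jar (dots become slashes, plus ".class").
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed only if the jar lists that exact entry.
# Assumes unzip is available; -Z1 prints one entry name per line.
# $1 = path to the jar, $2 = fully qualified class name
jar_has_class() {
    unzip -Z1 "$1" | grep -qxF "$(class_to_entry "$2")"
}

# Usage (from the directory that holds the jar):
#   jar_has_class scala-ts.jar com.stronghold.HelloWorld && echo "found"
```

A match here only shows that the entry exists in the jar on disk; it does not show that spark-submit was handed that same file.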
Usually the hang-up here is the file structure, but I'm pretty sure it is correct in this case:
../
scala_ts/
| .git/
| .idea/
| out/
| | artifacts/
| | | TimeSeriesFilter_jar/
| | | | scala-ts.jar
| src/
| | main/
| | | scala/
| | | | com/
| | | | | stronghold/
| | | | | | DataSource.scala
| | | | | | HelloWorld.scala
| | | | | | TimeSeriesFilter.scala
| | | | | | scratch.sc
| | test/
| | | scala/
| | | | com/
| | | | | stronghold/
| | | | | | AppTest.scala
| | | | | | MySpec.scala
| target/
| README.md
| pom.xml
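I've run other jobs with this same structure at work (so, in a different environment). I'm now trying to get more comfortable with it through a home project, but this seems to be an early hang-up.
In short, am I just missing something obvious?
Appendix
For those who are interested, here is my pom: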
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.stronghold</groupId>
<artifactId>scala-ts</artifactId>
<version>1.0-SNAPSHOT</version>
<inceptionYear>2008</inceptionYear>
<properties>
<scala.version>2.11.8</scala.version>
</properties>
<repositories>
<repository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala-tools.org</id>
<name>Scala-Tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</pluginRepository>
</pluginRepositories>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.8</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.9</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scala-tools.testing</groupId>
<artifactId>specs_2.10</artifactId>
<version>1.6.9</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.3</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<args>
<arg>-target:jvm-1.5</arg>
</args>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<configuration>
<downloadSources>true</downloadSources>
<buildcommands>
<buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
</buildcommands>
<additionalProjectnatures>
<projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
</additionalProjectnatures>
<classpathContainers>
<classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
<classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
</classpathContainers>
</configuration>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
</plugins>
</reporting>
</project>
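Update
Apologies for the lack of clarity. I ran the command from the same directory as the .jar (/home/[USER]/projects/scala_ts/out/artifacts/TimeSeriesFilter_jar/). That said, to be clear, specifying the full path does not change the outcome.
It should also be noted that I can run HelloWorld from within IntelliJ, and it uses the same class reference (com.stronghold.HelloWorld).
Solution
Why not use the path to the jar file so that spark-submit (like any other command-line tool) can find and use it?
Given the path out/artifacts/TimeSeriesFilter_jar/scala-ts.jar, I would use the following: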
spark-submit --class com.stronghold.HelloWorld out/artifacts/TimeSeriesFilter_jar/scala-ts.jar
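Note that you should be in the project's main directory, which seems to be /home/[USER]/projects/scala_ts.
Also note that I've removed --master local[*], since that is the default master URL that spark-submit uses.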
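As a rough sketch of the advice above, a small wrapper can refuse to call spark-submit when the jar path does not resolve from the current directory. The names `resolve_jar` and `submit_jar` are hypothetical and not part of Spark; the spark-submit flags are the ones already used in this answer.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of a jar, or fail with a
# message if the path does not point at a readable regular file.
resolve_jar() {
    if [ ! -f "$1" ] || [ ! -r "$1" ]; then
        echo "jar not found: $1 (cwd: $(pwd))" >&2
        return 1
    fi
    # cd into the jar's directory in a subshell to build an absolute path
    (cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename "$1")")
}

# Hypothetical wrapper: submit only when the jar path resolves.
# $1 = main class, $2 = path to the jar
submit_jar() {
    jar_path=$(resolve_jar "$2") || return 1
    spark-submit --class "$1" "$jar_path"
}

# Usage, from /home/[USER]/projects/scala_ts:
#   submit_jar com.stronghold.HelloWorld out/artifacts/TimeSeriesFilter_jar/scala-ts.jar
```

This only rules out a wrong working directory or a mistyped path; if the jar resolves and the exception persists, the cause lies elsewhere.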