How do I spark-submit with the main class in a jar?

Problem description

There are plenty of questions about ClassNotFoundException, but I haven't seen any yet that fit this particular case. I am trying to run the following command:

spark-submit --master local[*] --class com.stronghold.HelloWorld scala-ts.jar

It throws the following exception:

$ spark_submit --class com.stronghold.HelloWorld scala-ts.jar
2018-05-06 19:52:33 WARN  Utils:66 - Your hostname, asusTax resolves to a loopback address: 127.0.1.1; using 192.168.1.184 instead (on interface p1p1)                               
2018-05-06 19:52:33 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address                                                                                       
2018-05-06 19:52:33 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable                                
java.lang.ClassNotFoundException: com.stronghold.HelloWorld                               
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)                     
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)                          
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)                          
        at java.lang.Class.forName0(Native Method)                                        
        at java.lang.Class.forName(Class.java:348)                                        
        at org.apache.spark.util.Utils$.classForName(Utils.scala:235)                     
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:836)                                                                  
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)        
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)             
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)               
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)                    
2018-05-06 19:52:34 INFO  ShutdownHookManager:54 - Shutdown hook called                   
2018-05-06 19:52:34 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-e8a77988-d30c-4e96-81fe-bcaf5d565c75

However, the jar clearly contains this class:

" zip.vim version v28
" Browsing zipfile /home/[USER]/projects/scala_ts/out/artifacts/TimeSeriesFilter_jar/scala-ts.jar
" Select a file with cursor and press ENTER

META-INF/MANIFEST.MF
com/
com/stronghold/
com/stronghold/HelloWorld$.class
com/stronghold/TimeSeriesFilter$.class
com/stronghold/DataSource.class
com/stronghold/TimeSeriesFilter.class
com/stronghold/HelloWorld.class
com/stronghold/scratch.sc
com/stronghold/HelloWorld$delayedInit$body.class
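The same check can be scripted instead of browsing the archive in vim. A jar is just a zip, so listing it and grepping for the expected entry is enough; on the real artifact you would run `jar tf scala-ts.jar` (or any zip lister) directly. The sketch below builds a throwaway archive that mimics the layout above, purely for illustration (the `demo` directory and empty `.class` file are stand-ins, not the questioner's actual build):

```shell
# Recreate the package layout in a scratch directory (illustrative only).
mkdir -p demo/com/stronghold
: > demo/com/stronghold/HelloWorld.class

# Pack it into a zip/jar and list the entries, as `jar tf` would.
(cd demo && python3 -m zipfile -c ../demo.jar com)
python3 -m zipfile -l demo.jar | grep 'com/stronghold/HelloWorld.class'
```

If the grep finds the entry, the class really is in the archive, and the problem lies elsewhere (classpath, path to the jar, or how the launcher is invoked).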

Usually the hang-up here is the file structure, but I'm fairly sure it is correct in this case:

../
scala_ts/
| .git/
| .idea/
| out/
| | artifacts/
| | | TimeSeriesFilter_jar/
| | | | scala-ts.jar
| src/
| | main/
| | | scala/
| | | | com/
| | | | | stronghold/
| | | | | | DataSource.scala
| | | | | | HelloWorld.scala
| | | | | | TimeSeriesFilter.scala
| | | | | | scratch.sc
| | test/
| | | scala/
| | | | com/
| | | | | stronghold/
| | | | | | AppTest.scala
| | | | | | MySpec.scala                                                                                                                                                                                                                                                                                                                                                  
| target/
| README.md
| pom.xml

I have run other jobs with this same structure at work (hence, in a different environment). I'm now trying to get more comfortable with a home project, but this seems like an early hang-up.

In short, am I just missing something obvious?

Appendix

For those interested, here is my pom:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.stronghold</groupId>
  <artifactId>scala-ts</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <scala.version>2.11.8</scala.version>
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.8</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.9</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scala-tools.testing</groupId>
      <artifactId>specs_2.10</artifactId>
      <version>1.6.9</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-catalyst_2.11</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.3</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>
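One thing worth noting about this pom: it never tells Maven which class is the entry point, so a jar built by `mvn package` would carry no `Main-Class` in its manifest (the IntelliJ artifact is configured separately). That does not matter for `spark-submit`, since `--class` names the main class explicitly, but if you want the Maven-built jar to be self-describing, a standard `maven-jar-plugin` stanza like the following sketch would do it (this is an assumption about the desired setup, not something the question's pom contains):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>com.stronghold.HelloWorld</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
```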

Update

Apologies for the lack of clarity. I run the command from the same directory as the .jar (/home/[USER]/projects/scala_ts/out/artifacts/TimeSeriesFilter_jar/). That is, to be clear, specifying the full path does not change the result.

It should also be noted that I can run HelloWorld from within IntelliJ, which uses the same class reference (com.stronghold.HelloWorld).

Tags: java, scala, apache-spark

Solution


Why don't you use the path to the jar file so spark-submit (like any other command-line tool) can find and use it?

Given out/artifacts/TimeSeriesFilter_jar/scala-ts.jar, I would use the following:

spark-submit --class com.stronghold.HelloWorld out/artifacts/TimeSeriesFilter_jar/scala-ts.jar

Note that you should be in the project's home directory, which appears to be /home/[USER]/projects/scala_ts.
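The point generalizes: spark-submit resolves a bare jar name against the current working directory, like any other command-line tool. A minimal sketch of that behavior using plain file checks, with directory names mirroring the question's layout (no Spark involved; the placeholder jar is empty and illustrative):

```shell
# Mirror the question's project layout with an empty placeholder file.
mkdir -p scala_ts/out/artifacts/TimeSeriesFilter_jar
: > scala_ts/out/artifacts/TimeSeriesFilter_jar/scala-ts.jar

(
  cd scala_ts
  # A bare name only resolves if the file sits in the current directory...
  [ -f scala-ts.jar ] || echo "bare name does not resolve from project root"
  # ...whereas the relative path from the project root does.
  [ -f out/artifacts/TimeSeriesFilter_jar/scala-ts.jar ] && echo "relative path resolves"
)
```

The same reasoning applies in reverse: if you run from the artifact's own directory, the bare name resolves there, so a persistent ClassNotFoundException from that directory points at the jar's contents or the class name rather than the path.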

Also note that I removed --master local[*], since that is the default master URL spark-submit uses.

