scala - ClassNotFoundException: Caused by: java.lang.ClassNotFoundException: csv.DefaultSource
Problem description
I am trying to create an assembly (fat) jar executable, but I get the following error:
Caused by: java.lang.ClassNotFoundException: csv.DefaultSource
The problem occurs when reading the CSV files. The code runs fine in the IDE. Please help.
The Scala code is as follows:
package extendedtable
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import scala.collection.mutable.ListBuffer
object mainObject {
  // var read = new fileRead
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName("generationobj")
      .master("local[*]")
      .config("spark.sql.crossJoin.enabled", value = true)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    import spark.implicits._

    val atomData = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Resources/atom.csv")

    val moleculeData = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Resources/molecule.csv")

    val df = moleculeData.join(atomData, "molecule_id")
    val mid: List[Row] = moleculeData.select("molecule_id").collect.toList
    val listofmoleculeid: List[String] = mid.map(r => r.getString(0))
    // print(listofmoleculeid)

    df.createTempView("table") // was `newDF`, which is never defined; `df` is the joined DataFrame
    df.show()
  }
}
Here is the build file (build.sbt):
name := "ExtendedTable"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"
mainClass := Some("extendedtable.mainObject")
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
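Note that the `assembly` task and `assemblyMergeStrategy` key come from the sbt-assembly plugin, which must be enabled separately from build.sbt. A minimal sketch of the plugin declaration, assuming it lives in project/plugins.sbt; the version shown is an assumption, pick one compatible with your sbt release:

```scala
// project/plugins.sbt -- enables the `assembly` task and MergeStrategy keys.
// The version below is an assumption; use one matching your sbt version.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```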
Solution
Change your assemblyMergeStrategy as shown below, then build the jar file.
You need the org.apache.spark.sql.sources.DataSourceRegister file to be included in your jar file; this file is available inside the spark-sql jar.
The path is: spark-sql_2.11-<version>.jar/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
This file contains the following list of registered data sources:
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider
org.apache.spark.sql.execution.datasources.json.JsonFileFormat
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
org.apache.spark.sql.execution.datasources.text.TextFileFormat
org.apache.spark.sql.execution.streaming.ConsoleSinkProvider
org.apache.spark.sql.execution.streaming.TextSocketSourceProvider
org.apache.spark.sql.execution.streaming.RateSourceProvider
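This service file is what lets Spark resolve the short name "csv" to CSVFileFormat. When the file is discarded from the fat jar, the lookup finds no registered source and Spark falls back to loading a class literally named `<format>.DefaultSource`, which is why the stack trace mentions `csv.DefaultSource` even though no such class exists anywhere. A minimal sketch of that fallback behaviour (this mimics Spark 2.x's lookup logic, it is not Spark's actual code; `LookupSketch` and its parameters are illustrative names):

```scala
// Sketch of how a short-name lookup with a "<name>.DefaultSource" fallback
// produces the error in the question. `registered` stands in for the entries
// Spark reads from META-INF/services/org.apache.spark.sql.sources.DataSourceRegister.
object LookupSketch {
  def lookup(name: String, registered: Map[String, String]): String =
    registered.getOrElse(name, {
      // No service registration found: try to load "<name>.DefaultSource" directly.
      val fallback = s"$name.DefaultSource"
      try Class.forName(fallback).getName
      catch {
        case _: ClassNotFoundException =>
          // This is the ClassNotFoundException: csv.DefaultSource the asker sees.
          throw new ClassNotFoundException(fallback)
      }
    })
}
```

With the service file present, `lookup("csv", …)` returns the registered CSVFileFormat class name; with it discarded, the fallback class load fails.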
assemblyMergeStrategy in assembly := {
case PathList("META-INF","services",xs @ _*) => MergeStrategy.filterDistinctLines // Added this
case PathList("META-INF",xs @ _*) => MergeStrategy.discard
case _ => MergeStrategy.first
}
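The key change is `MergeStrategy.filterDistinctLines` for the META-INF/services path: several jars may each ship a service file with the same name, and instead of discarding them all, their lines are merged with duplicates removed, so every provider stays registered. A rough sketch of that merge behaviour (not sbt-assembly's actual implementation; the names and the third-party entry are illustrative):

```scala
// Sketch: what filterDistinctLines effectively does to same-named
// META-INF/services files coming from several jars on the classpath.
object FilterDistinctLinesSketch {
  def merge(files: Seq[Seq[String]]): Seq[String] =
    files.flatten.map(_.trim).filter(_.nonEmpty).distinct

  def main(args: Array[String]): Unit = {
    val fromSparkSql = Seq(
      "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat",
      "org.apache.spark.sql.execution.datasources.json.JsonFileFormat")
    val fromOtherJar = Seq(
      "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat", // duplicate entry
      "com.example.MyCustomSource") // hypothetical third-party data source
    merge(Seq(fromSparkSql, fromOtherJar)).foreach(println)
  }
}
```

`MergeStrategy.concat` would also keep the file, but filterDistinctLines additionally drops duplicate lines, which is the conventional choice for service registration files.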