scala - Cannot write Spark DF as a parquet file. Error: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods
Problem description
Cannot write a Spark DF to a parquet file.
Error: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.render$default$2(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/Formats;
- Spark version: 2.4.0
- Scala version: 2.11.8
Code:
val deltaTableInput1 = spark.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("inferSchema", "true")
  .load("path") // this is fine

deltaTableInput1.write
  .mode("overwrite")
  .format("parquet")
  .save("path")
Error:
java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.render$default$2(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/Formats;
at org.apache.spark.sql.types.DataType.json(DataType.scala:67)
at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$.setSchema(ParquetWriteSupport.scala:445)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.prepareWrite(ParquetFileFormat.scala:111)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:103)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
... 49 elided
Solution
I suspect this may be caused by the data types.
When reading data from the CSV file, disable inferSchema (set it to false) and instead supply a custom schema with your expected data types via the .schema method. Then write the DF in parquet format.
Please try this once and let me know.
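The suggested approach can be sketched as follows. This is a minimal sketch, assuming a hypothetical two-column layout for the pipe-delimited file (replace the fields with the real column names and types); "path" is a placeholder, as in the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object WriteParquetWithExplicitSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-parquet")
      .getOrCreate()

    // Hypothetical schema -- substitute the actual column names and data types.
    val customSchema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    val df = spark.read
      .format("csv")        // built-in CSV source in Spark 2.x
      .option("header", "true")
      .option("delimiter", "|")
      .schema(customSchema) // explicit schema instead of inferSchema
      .load("path")         // placeholder input path

    // Write the DF out as parquet with the known-good schema.
    df.write.mode("overwrite").parquet("path") // placeholder output path

    spark.stop()
  }
}
```

Supplying the schema up front avoids the inference pass over the CSV and guarantees the column types you expect.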