首页 > 解决方案 > 如何插入覆盖 Hive 表而不会因 org.apache.spark.sql.AnalysisException 失败:只能将数据写入具有单个路径的关系。?

问题描述

我有一个 Hive 表,我想使用 Insert Overwrite 覆盖它,下面的示例查询

spark.sql("INSERT OVERWRITE TABLE my_database.my_table VALUES (221221, 'DUMMY_Record_Pav', 21233, 'SPACE')")
--Show create Table
CREATE TABLE `my_database.my_table`(    
  `player_id` string,   
  `player_type` string,     
  `position_id` string,     
  `position_location` string)   
ROW FORMAT SERDE    
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'     
WITH SERDEPROPERTIES (  
  'path'='hdfs://path/hive/data/my_database.db/my_table')   
STORED AS INPUTFORMAT   
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'   
OUTPUTFORMAT    
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'  
LOCATION    
  'hdfs://path/hive/data/my_database.db/my_table''  
TBLPROPERTIES ( 
 --Redacted
 )  

Spark Sql 查询失败并出现以下错误

org.apache.spark.sql.AnalysisException: Can only write data to relations with a single path.;
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:188)
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:134)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:107)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:106)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsDown(AnalysisHelper.scala:106)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperators(AnalysisHelper.scala:73)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:134)
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:52)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
  ... 49 elided

我可以覆盖到没有 SerdeProperties 的类似表,如下所示:

WITH SERDEPROPERTIES (  
  'path'='hdfs://path/hive/data/my_database.db/my_table')   

有没有办法可以删除表的 SerdeProperties?

我尝试将路径设置为'',如下所示,但 Spark Sql 因空路径错误而失败。

ALTER TABLE my_database.my_table SET SERDEPROPERTIES('path'='');

删除 SerdeProperies 将有助于 Spark SQL 查询运行。

标签: scalaapache-sparkhadoophiveapache-spark-sql

解决方案


您可以创建中间表

Create table mytable2 () with ();

insert into mytable2 select * from mytable;

alter table mytable rename to mytable1;
alter table mytable2 rename to mytable;

推荐阅读