scala - How do I INSERT OVERWRITE a Hive table without failing with org.apache.spark.sql.AnalysisException: Can only write data to relations with a single path?
Problem description
I have a Hive table that I want to overwrite using INSERT OVERWRITE; a sample query is below:
spark.sql("INSERT OVERWRITE TABLE my_database.my_table VALUES (221221, 'DUMMY_Record_Pav', 21233, 'SPACE')")
-- SHOW CREATE TABLE output
CREATE TABLE `my_database.my_table`(
  `player_id` string,
  `player_type` string,
  `position_id` string,
  `position_location` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='hdfs://path/hive/data/my_database.db/my_table')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://path/hive/data/my_database.db/my_table'
TBLPROPERTIES (
  --Redacted
)
The Spark SQL query fails with the following error:
org.apache.spark.sql.AnalysisException: Can only write data to relations with a single path.;
at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:188)
at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:134)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:107)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:106)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsDown(AnalysisHelper.scala:106)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperators(AnalysisHelper.scala:73)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:134)
at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:52)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
... 49 elided
I am able to overwrite a similar table that does not have the following SERDEPROPERTIES:
WITH SERDEPROPERTIES (
  'path'='hdfs://path/hive/data/my_database.db/my_table')
Is there a way to remove the table's SERDEPROPERTIES?
I tried setting the path to '' as shown below, but Spark SQL then fails with an empty-path error:
ALTER TABLE my_database.my_table SET SERDEPROPERTIES('path'='');
Removing the SERDEPROPERTIES would let the Spark SQL query run.
Solution
The error is most likely triggered by the 'path' entry in SERDEPROPERTIES: when Spark converts the Hive table into a data source relation, it appears to pick up that property as an extra path on top of the table's LOCATION, so the relation no longer resolves to a single write path. As a workaround, you can create an intermediate table without that property, copy the data over, and swap the tables by renaming:
CREATE TABLE my_database.my_table2 (
  player_id string, player_type string,
  position_id string, position_location string)
STORED AS PARQUET;
INSERT INTO my_database.my_table2 SELECT * FROM my_database.my_table;
ALTER TABLE my_database.my_table RENAME TO my_database.my_table1;
ALTER TABLE my_database.my_table2 RENAME TO my_database.my_table;
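If you want to drive the whole swap from Spark itself, here is a minimal Scala sketch of the same steps. It assumes an existing Hive-enabled SparkSession named spark (as in the question) and reuses the table and column names from the question's DDL:

// Replays the workaround through spark.sql; `spark` is assumed to be
// a SparkSession built with .enableHiveSupport().
val statements = Seq(
  """CREATE TABLE my_database.my_table2 (
    |  player_id string, player_type string,
    |  position_id string, position_location string)
    |STORED AS PARQUET""".stripMargin,
  "INSERT INTO my_database.my_table2 SELECT * FROM my_database.my_table",
  "ALTER TABLE my_database.my_table RENAME TO my_database.my_table1",
  "ALTER TABLE my_database.my_table2 RENAME TO my_database.my_table"
)
statements.foreach(spark.sql) // spark.sql executes one statement per call

// The promoted table carries no 'path' serde property, so the original
// INSERT OVERWRITE should now see a relation with a single path:
spark.sql("INSERT OVERWRITE TABLE my_database.my_table VALUES (221221, 'DUMMY_Record_Pav', 21233, 'SPACE')")

Once you have verified the data, the renamed original table (my_database.my_table1) can be kept as a backup or dropped.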