Error when querying a Hive table with a map data type in Spark SQL, but it works when executed in HiveQL

Problem description

I have a Hive table with the following structure:

+---------------+--------------+----------------------+
| column_value  | metric_name  |         key          |
+---------------+--------------+----------------------+
| A37B          | Mean         | {0:"202006",1:"1"}  |
| ACCOUNT_ID    | Mean         | {0:"202006",1:"2"}  |
| ANB_200       | Mean         | {0:"202006",1:"3"}  |
| ANB_201       | Mean         | {0:"202006",1:"4"}  |
| AS82_RE       | Mean         | {0:"202006",1:"5"}  |
| ATTR001       | Mean         | {0:"202007",1:"2"}  |
| ATTR001_RE    | Mean         | {0:"202007",1:"3"}  |
| ATTR002       | Mean         | {0:"202007",1:"4"}  |
| ATTR002_RE    | Mean         | {0:"202007",1:"5"}  |
| ATTR003       | Mean         | {0:"202008",1:"3"}  |
| ATTR004       | Mean         | {0:"202008",1:"4"}  |
| ATTR005       | Mean         | {0:"202008",1:"5"}  |
| ATTR006       | Mean         | {0:"202009",1:"4"}  |
| ATTR006       | Mean         | {0:"202009",1:"5"}  |
+---------------+--------------+----------------------+
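
For reference, a minimal DDL sketch that would produce this layout (the table name, column names, and storage format are assumptions based on the sample above); the important point is that key is a map<int,string> column, which is what the filter below operates on.

-- Illustrative DDL only: the real table may use a different format or properties.
CREATE TABLE your_data (
  column_value STRING,
  metric_name  STRING,
  key          MAP<INT, STRING>
)
STORED AS ORC;  -- storage format is an assumption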

I need to write a Spark SQL query that filters on the key column with a NOT IN condition, based on a combination of the two map entries.

The following query runs fine in HiveQL through Beeline:

select * from your_data where key[0] between '202006' and '202009' and key NOT IN ( map(0,"202009",1,"5") );

But when I try the same query in Spark SQL, I get the following error:

cannot resolve due to data type mismatch: map<int,string>
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:115)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:107)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
    ... (the TreeNode$$anonfun$3 / TreeNode$$anonfun$4 / mapChildren / transformUp frames repeat several more times)

Please help!

Tags: apache-spark, hive, apache-spark-sql, hiveql

Solution


I got the answer from a different question I had asked earlier. The issue appears to be that Spark SQL, unlike Hive, does not support equality (and therefore IN / NOT IN) comparisons on map-typed columns, so the condition has to be rewritten against the individual map entries instead of the map as a whole. This query works fine:

select * from your_data where key[0] between '202006' and '202009' and NOT (key[0]="202009" and key[1]="5");
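
As a sanity check, here is a small self-contained sketch that can be pasted into spark-sql or spark.sql(...); it replaces the real table with an inline CTE containing two assumed sample rows, just to confirm the rewritten predicate behaves like the intended NOT IN:

-- Illustrative only: your_data is stood in for by a CTE with assumed rows.
WITH your_data AS (
  SELECT 'ATTR005' AS column_value, 'Mean' AS metric_name, map(0, '202008', 1, '5') AS key
  UNION ALL
  SELECT 'ATTR006', 'Mean', map(0, '202009', 1, '5')
)
SELECT *
FROM your_data
WHERE key[0] BETWEEN '202006' AND '202009'
  AND NOT (key[0] = '202009' AND key[1] = '5');
-- Expected output: only the ATTR005 row, because the ATTR006 row matches the excluded (202009, 5) pair.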

