Spark: cannot create a table with partitions, but without partitions it works

Problem description

My data is structured like this:

/mnt/path/db/table/keya=01/keyb=123
/mnt/path/db/table/keya=01/keyb=124
/mnt/path/db/table/keya=02/keyb=123

Creating the table succeeds:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table/keya=01/keyb=123'

Then:

select count(*) from test_table_a;
-- returns 1876 (correct)

I can query the data just fine, but I want a partitioned table.

I tried this:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
PARTITIONED BY (
  `keya` string,
  `keyb` string)
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table'

and this:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
PARTITIONED BY (
  `keya` string,
  `keyb` string)
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table/keya=*/keyb=*'

But with both of these I got this result:

select count(*) from test_table_a;
-- returns 0
show partitions test_table_a;
-- returns nothing

Tags: sql, apache-spark, hive, apache-spark-sql, databricks

Solution

When you create an external partitioned table on top of a location where the data already exists, the metastore has no partition metadata yet, so run the following command in your Hive shell:

hive> msck repair table <db.name>.<table_name>;

Check whether you can now see the partition information and data in the test_table_a table.
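If you would rather register partitions explicitly instead of scanning the whole location, `ALTER TABLE ... ADD PARTITION` is an alternative; a sketch for the first directory shown in the question (the remaining directories would need their own statements):

```sql
-- Register a single existing directory as a partition of the table
ALTER TABLE test_table_a ADD IF NOT EXISTS
  PARTITION (keya='01', keyb='123')
  LOCATION '/mnt/path/db/table/keya=01/keyb=123';
```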

Create table statement:

CREATE EXTERNAL TABLE `test_table_a`(
..irrelevant schema..
)
PARTITIONED BY (
  `keya` string,
  `keyb` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/mnt/path/db/table';
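This also explains why the `keya=*/keyb=*` wildcard location fails: the table LOCATION must be the root directory, and partition values are recovered from the `key=value` subdirectory names when partitions are repaired or added. A minimal sketch of that path-parsing logic (plain Python, not Hive's actual implementation; `parse_partition` is a hypothetical helper):

```python
from pathlib import PurePosixPath

def parse_partition(path: str, table_root: str) -> dict:
    """Extract Hive-style partition key/value pairs from a data path
    relative to the table root, e.g. .../keya=01/keyb=123."""
    rel = PurePosixPath(path).relative_to(table_root)
    partition = {}
    for segment in rel.parts:
        key, sep, value = segment.partition("=")
        if sep:  # only segments of the form key=value contribute
            partition[key] = value
    return partition

print(parse_partition("/mnt/path/db/table/keya=01/keyb=123",
                      "/mnt/path/db/table"))
# {'keya': '01', 'keyb': '123'}
```

This is the mapping MSCK REPAIR performs for every directory it finds under the table root, registering each discovered pair as a partition in the metastore.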
