Hive CREATE TABLE with multiple escape characters

Problem description

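I'm trying to load a pipe-delimited CSV file into a Hive external table. The data values contain single quotes, double quotes, brackets and so on. I'm using OpenCSV version 2.3.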

Test file (.csv):

id|name|phone
1|Rahul|123
2|Kumar's|456
3|Neetu"s|789
4|Ravi [Roma]|234

The table is created as:

drop table test_schema.hive_test;
CREATE EXTERNAL TABLE test_schema.hive_test (id string, name string, phone string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES
(
'separatorChar' = '|',
'quoteChar' = '[\'\"]',
'escapeChar' = '\\'
)
LOCATION '/staging/test/hive'
tblproperties ("skip.header.line.count"="1");

Output:

+--------------+----------------+-----------------+
| hive_test.id | hive_test.name | hive_test.phone |
+--------------+----------------+-----------------+
| 1            | Rahul          | 123             |
| 2            | Kumar's        | 456             |
| 3            | Neetu"s        | 789             |
| 4            | NULL           | 234             |
+--------------+----------------+-----------------+

Expected output:

+--------------+----------------+-----------------+
| hive_test.id | hive_test.name | hive_test.phone |
+--------------+----------------+-----------------+
| 1            | Rahul          | 123             |
| 2            | Kumar's        | 456             |
| 3            | Neetu"s        | 789             |
| 4            | Ravi [Roma]    | 234             |
+--------------+----------------+-----------------+

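The problem is that the data contains several different quote characters (single quotes, double quotes and square brackets), and all of them need to be kept as part of the values.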

Tags: hadoop, hive, opencsv, hive-serde, hiveddl

Solution


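Use LazySimpleSerDe (STORED AS TEXTFILE) instead of OpenCSVSerde. OpenCSVSerde accepts only a single character for separatorChar, quoteChar and escapeChar (it uses the first character of each property value), so a pattern such as '[\'\"]' cannot declare several quote characters. Here it effectively makes '[' the quote character, which is consistent with only the "Ravi [Roma]" row coming back as NULL. LazySimpleSerDe does no quote processing at all, so single quotes, double quotes and brackets are read as ordinary characters: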

CREATE EXTERNAL TABLE test_schema.hive_test (id string, name string, phone string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
ESCAPED BY '\\' 
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/staging/test/hive_test'
tblproperties ("skip.header.line.count"="1");
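To check the result, a query along these lines should return all four rows with the name values unchanged, as in the expected output. This is a minimal sketch that assumes the table above has been created and the sample file is stored under its LOCATION.

```sql
-- Sanity check (sketch): with LazySimpleSerDe no quote processing happens,
-- so ' " [ ] in the name column should come back as literal characters.
SELECT id, name, phone
FROM test_schema.hive_test
ORDER BY id;
```

Because ESCAPED BY '\\' is set, a value that itself contains the pipe delimiter should be written as \| in the file so that it stays in a single column.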
