首页 > 解决方案 > 使用特殊转义字符导入数据

问题描述

我正在尝试使用复制命令将数据导入雪花。我的文件格式定义如下:

CREATE FILE FORMAT mydb.schema1.myFileFormat
TYPE = CSV
COMPRESSION = 'AUTO' 
FIELD_DELIMITER = ',' 
RECORD_DELIMITER = '\n' 
SKIP_HEADER = 0 
FIELD_OPTIONALLY_ENCLOSED_BY = '\042' 
TRIM_SPACE = FALSE 
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE 
ESCAPE = '\241' 
ESCAPE_UNENCLOSED_FIELD = NONE
DATE_FORMAT = 'AUTO'
TIMESTAMP_FORMAT = 'AUTO' 
NULL_IF = ('\\N') 
COMMENT = '¡ used as escape character';

文件格式没有什么特别之处,只是它使用 ¡ 作为转义字符。

使用此文件格式导入数据时,Snowflake 似乎无法识别转义字符,并抛出错误消息“找到字符 'XYZ' 而不是字段分隔符 ','”。

我尝试使用 1 行创建一个文件,如下所示:

"ABC123","584382","2","01","01/22/2019","02/08/2019","02/08/2019","04/03/2019","04/03/2019","TEST","Unknown","Unknown","01-884400","Unknown","DACRON CONNECTIONS 15¡"1/2 DIA. X 11¡" LONG FOR EXHAUST DAMPER","","0.0","0.0","0.0","0.0","192.0","USD","2.0","2.0","0","0","96.00000","1","","","","","07882-0047","ASDF","ASDF","02/27/2019","04/06/2021","01/01/1970","0"

此文件在第 1 行 char 167 上失败,该字符位于第一个转义字符之后(以下文本中的 1 之前:CONNECTIONS 15¡" 1 /2)

知道为什么会这样吗?

这是我正在运行的复制代码

copy into mydb.schema1.mytable from @mydb.schema1.mystage/file-path/2021-05-26/test.txt
file_format = mydb.schema1.myFileFormat
validation_mode = 'return_all_errors';

标签: snowflake-cloud-data-platform

解决方案


简短答案 看起来像 Snowflake 只允许将单字节字符用作文件格式的转义字符。您用作转义字符的字符使用两个字节,因此文件格式不允许作为转义字符。

但是,您可以将多字节字符用于字段和行分隔符,因此不确定为什么 Snowflake 也不允许它作为转义字符。

更长的答案 您尝试用作转义字符 (¡) 的字符是两个字节长,十六进制值为\xC2\xA1. 这是不允许的,您可以通过以下错误看到:

CREATE OR REPLACE FILE FORMAT myFileFormat
    TYPE = CSV
        COMPRESSION = 'AUTO'
        FIELD_DELIMITER = ','
        RECORD_DELIMITER = '\n'
        SKIP_HEADER = 0
        FIELD_OPTIONALLY_ENCLOSED_BY = '\x22' -- Double quotes (")
        ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
        ESCAPE = '\xC2\xA1' -- Inverted exclamation point (¡)
        DATE_FORMAT = 'AUTO'
        TIMESTAMP_FORMAT = 'AUTO'
        NULL_IF = ('\\N')

参数“ESCAPE”的无效值 ['\xC2\xA1']

另一方面,如果我使用我可能使用的最后一个单字节字符(并且是可见的),波浪号(~),十六进制值\x7E(你认为应该是\xFF,但 utf-8 使用 7 位在它进入 2 个字节之前。长话短说。)然后它工作正常。我用文件和复制命令对此进行了测试,它可以正常工作。

CREATE OR REPLACE FILE FORMAT myFileFormat
    TYPE = CSV
        COMPRESSION = 'AUTO'
        FIELD_DELIMITER = ','
        RECORD_DELIMITER = '\n'
        SKIP_HEADER = 0
        FIELD_OPTIONALLY_ENCLOSED_BY = '\x22' -- Double quotes (")
        ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
        ESCAPE = '\x7E' -- Tilde (~)
        DATE_FORMAT = 'AUTO'
        TIMESTAMP_FORMAT = 'AUTO'
        NULL_IF = ('\\N')

[2021-05-26 23:49:21] 在 149 毫秒内完成


推荐阅读