AWS Athena CTAS query fails, telling me to empty an already-empty bucket

Problem description

I am running a "CREATE TABLE AS SELECT (CTAS)" query (https://docs.aws.amazon.com/athena/latest/ug/ctas.html); the query is copied at the bottom. I receive the following error message:

HIVE_PATH_ALREADY_EXISTS: Target directory for table 'default.openaq_processed' already exists:
 s3://<processed-data-bucketname>/. You may need to manually clean the data at location 
's3://<athena-query-results-bucketname>/Unsaved/2021/04/29/tables/82025a35-8867-4865-8f42-f40adb6bee4c' 
before retrying. Athena will not delete data in your account.

This query ran against the "default" database, unless qualified by the query. Please post the
 error message on our forum or contact customer support with Query Id: 82025a35-8867-4865-8f42-f40adb6bee4c.

The AWS Knowledge Center page for this error (https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-path-already-exists/) says, as the error message above does, that the fix is to make sure the location used to store the query results is empty.

But it already is. In fact, there is no tables/ prefix/folder under s3://<athena-query-results-bucketname>/Unsaved/2021/04/29/, and the s3://<processed-data-bucketname>/ bucket is completely empty.

I have already posted this question on the AWS forums but received no replies. How can I get this CTAS query to succeed?

Update

The query that raises the error:

CREATE TABLE openaq_processed
WITH (format='PARQUET', 
parquet_compression='SNAPPY', 
partitioned_by=array['country', 'parameter'], 
external_location = 's3://<processed-data-bucketname>') 
AS
SELECT date_utc as date_utc_str,
date_local as date_local_str,
CAST(from_iso8601_timestamp(date_utc) as timestamp) as timestamp_utc,
CAST(from_iso8601_timestamp(date_local) as timestamp) as timestamp_local,
"location",  -- location is a reserved word for Athena, needs quotes
value,
unit,
city,
attribution,
averagingperiod,
coordinates."latitude" as latitude,
coordinates."longitude" as longitude,
sourcename,
sourcetype,
mobile,
country,
parameter
FROM openaq_pq2_tables

Tags: sql, amazon-web-services, amazon-s3, amazon-athena

Solution


So I contacted AWS Developer Support and asked about this. The response I got, which did indeed resolve the error, was to create a folder inside my external_location bucket. I don't know why this is necessary, but apparently it is.

So, from the shell: $ aws s3api put-object --bucket <processed-data-bucketname> --key processed_data/

(Note that `aws s3 mb`, where mb stands for "make bucket", only creates buckets; to create an empty "folder" prefix inside an existing bucket, use put-object with a key ending in a slash, as above.)

Then, in the query above, external_location = 's3://<processed-data-bucketname>' was updated to external_location = 's3://<processed-data-bucketname>/processed_data/'.
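
For reference, a sketch of the corrected query after applying this fix (bucket names are still the placeholders from the question; the SELECT list is unchanged from the query above):

CREATE TABLE openaq_processed
WITH (format='PARQUET',
      parquet_compression='SNAPPY',
      partitioned_by=array['country', 'parameter'],
      -- point at the folder created above, not the bucket root
      external_location = 's3://<processed-data-bucketname>/processed_data/')
AS
SELECT ...  -- same column list as in the original query
FROM openaq_pq2_tables

Note that the partition columns (country, parameter) must remain the last columns in the SELECT list, as they already are in the original query.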
