首页 > 解决方案 > 如何在雅典娜中使用带有sqlite的tsv文件对列进行行格式化

问题描述

因此,我想将这些从 tsv 文件添加到 athena 中的表中,除了最后一列类型之外,我可以执行此操作。我的意思是我可以添加它,但我希望它像 ["Comedy", "Mystery"] 但它以 [Comedy,Mystery] 的形式出现,这使得无法以任何方式访问它们

tconst      genres 

tt0081313   Action

tt0081315   Comedy,Mystery

tt0081349   Comedy,Crime

这就是我所做的:

CREATE EXTERNAL TABLE `title_basics`(
  `tconst` string, 
  `genres` Array<string>)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (

  'field.delim' = '\t'  # This is for separating them by tab which is right but how can I also
                        # add the genres the way I want them to the table

)

STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'mylocation'
TBLPROPERTIES (
  'has_encrypted_data'='false', 
  'transient_lastDdlTime'='-----')

标签: sqlamazon-web-servicessqliteamazon-athenahive-serde

解决方案


The column genre is being interpreted as a string. There are two possible solutions:

select split(genre,',') -- this will give you an array of genres

Or directly create the column as an array by adding a , as a collection separator.


推荐阅读