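apache-spark - EMR Hudi cannot create hive connection jdbc:hive2://localhost:10000/
Problem description
I am trying to save a Hudi table from a Jupyter notebook with Hive sync enabled. I am using EMR 5.28.0 with AWS Glue enabled as the catalog: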
# Create a DataFrame
inputDF = spark.createDataFrame(
    [
        ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"),
        ("101", "2015-01-01", "2015-01-01T12:14:58.597216Z"),
        ("102", "2015-01-01", "2015-01-01T13:51:40.417052Z"),
        ("103", "2015-01-01", "2015-01-01T13:51:40.519832Z"),
        ("104", "2015-01-02", "2015-01-01T12:15:00.512679Z"),
        ("105", "2015-01-02", "2015-01-01T13:51:42.248818Z"),
    ],
    ["id", "creation_date", "last_update_time"]
)

# Specify common DataSourceWriteOptions in the single hudiOptions variable
hudiOptions = {
    'hoodie.table.name': 'my_hudi_table',
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.partitionpath.field': 'creation_date',
    'hoodie.datasource.write.precombine.field': 'last_update_time',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': 'my_hudi_table',
    'hoodie.datasource.hive_sync.partition_fields': 'creation_date',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor'
}

# Write a DataFrame as a Hudi dataset
(inputDF.write
    .format('org.apache.hudi')
    .option('hoodie.datasource.write.operation', 'insert')
    .options(**hudiOptions)
    .mode('overwrite')
    .save('s3://dytyniak-test-data/myhudidataset/'))
I get the following error:
An error occurred while calling o309.save.
: org.apache.hudi.hive.HoodieHiveSyncException: Cannot create hive connection jdbc:hive2://localhost:10000/
Solution
I assume you are following the tutorial in the AWS documentation. I got it working with Hudi 0.9.0 by setting hive_sync.mode to hms in hudiOptions (see the Hudi docs):
hudiOptions = {
    'hoodie.table.name': 'my_hudi_table',
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.partitionpath.field': 'creation_date',
    'hoodie.datasource.write.precombine.field': 'last_update_time',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': 'my_hudi_table',
    'hoodie.datasource.hive_sync.partition_fields': 'creation_date',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.mode': 'hms'
}
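If you already have an options dictionary like the one in the question, one way to apply this change without retyping everything is to copy it and add the sync mode key. The following is only a minimal sketch: the helper name with_hms_sync and the shortened base_options dictionary are hypothetical and are used here just for illustration. The option keys themselves are the ones shown above.

```python
def with_hms_sync(options):
    """Return a copy of the given Hudi options with Hive sync set to 'hms' mode.

    with_hms_sync is a hypothetical helper name, not part of Hudi or PySpark.
    """
    updated = dict(options)  # copy so the caller's dictionary is left untouched
    updated['hoodie.datasource.hive_sync.enable'] = 'true'
    updated['hoodie.datasource.hive_sync.mode'] = 'hms'
    return updated


# Shortened stand-in for the hudiOptions dictionary from the question
base_options = {
    'hoodie.table.name': 'my_hudi_table',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': 'my_hudi_table',
}

hudi_options = with_hms_sync(base_options)
print(hudi_options['hoodie.datasource.hive_sync.mode'])
```

The resulting dictionary can then be passed to the same write call as in the question, via .options(**hudi_options).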