Unable to read jceks file in yarn cluster mode in python

Problem description

I am using a jceks file to decrypt my password, but I am unable to read the encrypted password when running in yarn cluster mode.

I have tried different approaches, for example submitting with:

spark-submit --deploy-mode cluster \
  --files /localpath/credentials.jceks#credentials.jceks \
  --conf spark.hadoop.hadoop.security.credential.provider.path=jceks://file/localpath/credentials.jceks \
  test.py
and then, in the application:

from pyspark.sql import SparkSession

spark1 = (SparkSession.builder
          .appName("xyz")
          .master("yarn")
          .enableHiveSupport()
          .config("hive.exec.dynamic.partition", "true")
          .config("hive.exec.dynamic.partition.mode", "nonstrict")
          .getOrCreate())
x = spark1.sparkContext._jsc.hadoopConfiguration()
x.set("hadoop.security.credential.provider.path", "jceks://file///credentials.jceks")
a = x.getPassword("<password alias>")  # returns a Java char[], or None if the alias cannot be resolved
passw = ""
for i in range(a.__len__()):
    passw = passw + str(a.__getitem__(i))

I am getting the following error:

AttributeError: 'NoneType' object has no attribute '__len__'

When I print a, it is None.
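(For context: Hadoop's Configuration.getPassword() returns None when no configured credential provider can resolve the alias, which is consistent with the jceks://file/... path pointing at a local file that the YARN containers cannot see. A minimal sketch of a guard that surfaces this, assuming the same x and alias placeholder as above:)

# Assumes `x` is the Hadoop Configuration object from the snippet above.
provider_path = x.get("hadoop.security.credential.provider.path")
print("credential provider path:", provider_path)

chars = x.getPassword("<password alias>")
if chars is None:
    # getPassword() returns None when no provider can resolve the alias,
    # e.g. because the local jceks file does not exist on this node.
    raise RuntimeError("credential alias not found at " + str(provider_path))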

Tags: apache-spark, hadoop, pyspark, hadoop-yarn, jceks

Solution


FWIW, if you put your jceks file into HDFS, the yarn workers will be able to find it when running in cluster mode; at least that worked for me. Hope it works for you too.

# Upload the keystore so every YARN container can reach it over HDFS:
hadoop fs -put ~/.jceks /user/<uid>/.jceks

Then point the credential provider path at the HDFS location from the application:

from pyspark.sql import SparkSession

spark1 = (SparkSession.builder
          .appName("xyz")
          .master("yarn")
          .enableHiveSupport()
          .config("hive.exec.dynamic.partition", "true")
          .config("hive.exec.dynamic.partition.mode", "nonstrict")
          .getOrCreate())
x = spark1.sparkContext._jsc.hadoopConfiguration()
jceks_hdfs_path = "jceks://hdfs@<host>/user/<uid>/.jceks"
x.set("hadoop.security.credential.provider.path", jceks_hdfs_path)
a = x.getPassword("<password alias>")  # Java char[] exposed through py4j
passw = ""
for i in range(len(a)):
    passw = passw + str(a[i])

This way you do not need to pass --files and --conf in the spark-submit arguments. Hope this helps.
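For completeness, here is a hedged sketch that wraps the pattern above in a small reusable helper (the function name get_jceks_password and the error handling are additions, not part of the original answer). py4j exposes the returned Java char[] as an indexable sequence of one-character strings, so len() and indexing work directly:

from pyspark.sql import SparkSession

def get_jceks_password(spark, provider_path, alias):
    # Point the Hadoop configuration at the credential store and
    # resolve the alias; getPassword() returns a Java char[] or None.
    conf = spark.sparkContext._jsc.hadoopConfiguration()
    conf.set("hadoop.security.credential.provider.path", provider_path)
    chars = conf.getPassword(alias)
    if chars is None:
        raise ValueError("alias %r not found in %s" % (alias, provider_path))
    # Join the char[] elements into an ordinary Python string.
    return "".join(str(chars[i]) for i in range(len(chars)))

spark1 = SparkSession.builder.appName("xyz").master("yarn").getOrCreate()
passw = get_jceks_password(spark1, "jceks://hdfs@<host>/user/<uid>/.jceks", "<password alias>")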

