python-3.x - Connecting to HDFS with Kerberos authentication using Python
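Question
I am trying to connect to an HDFS cluster that is protected by Kerberos authentication. I have the following details but do not know how to proceed.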
User
Password
Realm
HttpFs Url
I tried the following code, but I get an authentication error:
from hdfs.ext.kerberos import KerberosClient
import requests
import logging
logging.basicConfig(level=logging.DEBUG)
session = requests.Session()
session.verify = False
client = KerberosClient(url='http://x.x.x.x:abcd', session=session,
                        mutual_auth='REQUIRED', principal='abcdef@LMNOPQ')
print(client.list('/'))
Error
INFO:hdfs.client:Instantiated <KerberosClient(url=http://x.x.x.x:abcd)>.
INFO:hdfs.client:Listing '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1):
DEBUG:urllib3.connectionpool:http://x.x.x.x:abcd "GET /webhdfs/v1/?op=LISTSTATUS HTTP/1.1" 401 997
DEBUG:requests_kerberos.kerberos_:handle_401(): Handling: 401
ERROR:requests_kerberos.kerberos_:generate_request_header(): authGSSClientInit() failed:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
    gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
ERROR:requests_kerberos.kerberos_:((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
    gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
DEBUG:requests_kerberos.kerberos_:handle_401(): returning <Response [401]>
DEBUG:requests_kerberos.kerberos_:handle_response(): returning <Response [401]>
I also have the password, but I do not know where to supply it.
Answer
Suppose your principal is hdfs/localhost@HADOOP.COM and your keytab file is /var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab. To read the HDFS CSV file at /hadoop_test_data/filecount.csv, use the following code; it gives you a pandas DataFrame with the contents of filecount.csv.
For reference, I used Python version 3.7.6.
from csv import reader
from krbcontext import krbcontext
import subprocess
import pandas as pd

try:
    # Obtain a Kerberos ticket from the keytab for the duration of the block
    with krbcontext(using_keytab=True,
                    principal='hdfs/localhost@HADOOP.COM',
                    keytab_file='/var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab') as krb:
        print(krb)
        print('kerberos authentication successful')
        # Read the file through the hadoop command-line client
        output = subprocess.Popen(["hadoop", "fs", "-cat", "/hadoop_test_data/filecount.csv"], stdout=subprocess.PIPE)
        stdout, stderr = output.communicate()
        # splitlines() handles both '\n' and '\r\n' line endings
        data = str(stdout, 'utf-8').splitlines()
        # First line is the header, the remaining lines are the rows
        df = pd.DataFrame(list(reader(data[1:])), columns=data[0].split(','))
        print(df.shape)
        print(df)
except Exception as e:
    print("Kerberos authentication unsuccessful")
    # str(e) is required: concatenating a str with an exception raises TypeError
    print("Detailed error is : " + str(e))
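The parsing step above splits the header on plain commas, which would break if a column name were quoted and contained a comma. A minimal sketch of an alternative, assuming the command output is ordinary CSV text: run every line, header included, through the standard csv module. The function name parse_csv_text and the sample string are hypothetical and only for illustration.

```python
import csv
import io


def parse_csv_text(text):
    """Split CSV text into (header, rows), parsing every line with csv.reader."""
    # newline='' leaves line endings untouched so csv.reader can handle
    # both '\n' and '\r\n' itself
    lines = csv.reader(io.StringIO(text, newline=''))
    header = next(lines, [])
    rows = [row for row in lines if row]  # skip blank lines
    return header, rows


# Hypothetical sample standing in for the output of `hadoop fs -cat`
sample = 'name,"count, total"\r\nlogs,12\r\ndata,7\r\n'
header, rows = parse_csv_text(sample)
print(header)
print(rows)
```

With pandas installed, pd.DataFrame(rows, columns=header) should then build the same kind of DataFrame as in the code above.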
Let me know if you would like more information.
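To tie this back to the KerberosClient attempt in the question: the "No credentials were supplied" GSSError usually means that no Kerberos ticket was available in the credential cache when the request was made. As far as I know, KerberosClient does not accept a password; it relies on a ticket that already exists, typically obtained with kinit. The sketch below is one way to obtain such a ticket from a keytab before creating the client, assuming a kinit binary that accepts the -k and -t options is on the PATH. The helper names kinit_command and obtain_ticket are hypothetical, not part of any library.

```python
import subprocess


def kinit_command(principal, keytab_file):
    """Build a kinit invocation that reads the key from a keytab (no password prompt)."""
    return ["kinit", "-k", "-t", keytab_file, principal]


def obtain_ticket(principal, keytab_file, run=subprocess.run):
    """Run kinit and return True if it exited with status 0.

    `run` is injectable so the function can be exercised without a KDC.
    """
    try:
        result = run(kinit_command(principal, keytab_file))
    except FileNotFoundError:
        # kinit is not installed or not on the PATH
        return False
    return result.returncode == 0


# Show the command that would be executed for the principal and keytab above
print(kinit_command(
    'hdfs/localhost@HADOOP.COM',
    '/var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab',
))
```

Once obtain_ticket returns True, the KerberosClient call from the question should be able to find the ticket in the default credential cache. If you only have a password and no keytab, running kinit interactively in a shell before starting the script serves the same purpose.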