Trying to read JSON files from an S3 bucket but unable to execute

Problem description

Here is the code I am trying to test, but it fails with an AWS credentials error. I don't understand why, since I am supplying the access key values.

import boto3
from pyspark.sql import SQLContext

sqlContext = SQLContext(spark)
aws_access_key_id = "******"
aws_secret_access_key = "*****"
spark._jsc.hadoopConfiguration().set("fs.s3a.awsAccessKeyId", aws_access_key_id)
spark._jsc.hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", aws_secret_access_key)

# bucket was undefined in the original snippet; "my-bucket" is a placeholder
s3 = boto3.resource("s3")
bucket = s3.Bucket("my-bucket")

object_list = [k for k in bucket.objects.all()]
key_list = [k.key for k in object_list]

paths = ['s3a://' + o.bucket_name + '/' + o.key for o in object_list]
dataframes = [sqlContext.read.json(path) for path in paths]

df = dataframes[0]
for frame in dataframes[1:]:  # skip index 0 so it isn't unioned with itself
    df = df.unionAll(frame)

I get the following error:

Py4JJavaError: An error occurred while calling o1225.json.
: com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
    at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)

Tags: amazon-web-services, apache-spark, amazon-s3, pyspark

Solution


You are using the wrong authentication property names.

They should be fs.s3a.access.key for your access key and fs.s3a.secret.key for your secret key.
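A minimal sketch of the corrected configuration, assuming a SparkSession named spark already exists; the bucket name my-bucket and the key values are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-s3-json").getOrCreate()

hadoop_conf = spark._jsc.hadoopConfiguration()
# Correct s3a property names (the originals, fs.s3a.awsAccessKeyId and
# fs.s3a.awsSecretAccessKey, are not recognized by the S3A connector):
hadoop_conf.set("fs.s3a.access.key", "******")
hadoop_conf.set("fs.s3a.secret.key", "*****")

# With valid credentials set, reading should now succeed:
df = spark.read.json("s3a://my-bucket/some/prefix/")
```

Note that with these properties set, S3AFileSystem can resolve the credentials and the "Unable to load AWS credentials from any provider in the chain" error goes away.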

