java - Spring Batch - Reading a file from AWS S3 for processing
Problem description
I am writing a Spring Batch application that needs to read a file from an AWS S3 bucket.
Here is my AWS config Java class:
@Configuration
public class AWSConfig {
    @Value("${cloud.aws.credentials.accessKey}")
    private String accessKey;

    @Value("${cloud.aws.credentials.secretKey}")
    private String secretKey;

    @Value("${cloud.aws.region}")
    private String region;

    @Bean
    public BasicAWSCredentials basicAWSCredentials() {
        return new BasicAWSCredentials(accessKey, secretKey);
    }

    @Bean
    public AmazonS3Client amazonS3Client(AWSCredentials awsCredentials) {
        AmazonS3Client amazonS3Client = (AmazonS3Client) AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
                .withRegion(region)
                .build();
        return amazonS3Client;
    }
}
Here is my aws-context.xml file (located in resources/), which replaces the default ResourceLoader:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:aws-context="http://www.springframework.org/schema/cloud/aws/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/cloud/aws/context
           http://www.springframework.org/schema/cloud/aws/context/spring-cloud-aws-context.xsd">
    <aws-context:context-resource-loader amazon-s3="amazonS3Client" />
</beans>
Here is my SpringBatchConfig.java class:
@Configuration
@EnableBatchProcessing
public class SpringBatchConfig {
    @Autowired
    private ResourceLoader resourceLoader;

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory,
                   StepBuilderFactory stepBuilderFactory,
                   ItemReader<User> itemReader,
                   ItemProcessor<User, User> itemProcessor,
                   ItemWriter<User> itemWriter
    ) {
        Step step = stepBuilderFactory.get("ETL-file-load")
                .<User, User>chunk(100)
                .reader(itemReader)
                .processor(itemProcessor)
                .writer(itemWriter)
                .build();
        return jobBuilderFactory.get("ETL-Load")
                .incrementer(new RunIdIncrementer())
                .start(step)
                .build();
    }

    @Bean
    public FlatFileItemReader<User> itemReader() throws IOException {
        FlatFileItemReader<User> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setResource(resourceLoader.getResource("s3://" + "<bucket-name>" + "/" + "<key>"));
        flatFileItemReader.setName("CSV-Reader");
        flatFileItemReader.setLinesToSkip(1);
        flatFileItemReader.setLineMapper(lineMapper());
        return flatFileItemReader;
    }

    @Bean
    public LineMapper<User> lineMapper() {
        DefaultLineMapper<User> defaultLineMapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setStrict(false);
        lineTokenizer.setNames(new String[]{"id", "name", "dept", "salary"});
        BeanWrapperFieldSetMapper<User> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(User.class);
        defaultLineMapper.setLineTokenizer(lineTokenizer);
        defaultLineMapper.setFieldSetMapper(fieldSetMapper);
        return defaultLineMapper;
    }
}
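To make the intent of the lineMapper bean above concrete, here is a minimal plain-Java sketch of what a comma tokenizer in non-strict mode followed by a field mapper roughly does. It is a simplified stand-in, not Spring Batch code: the class LineMappingSketch, its nested User class with four String fields, and the sample rows are all hypothetical (the question does not show the real User class), and quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineMappingSketch {

    // Hypothetical shape of User; the real class is not shown in the question.
    static class User {
        final String id;
        final String name;
        final String dept;
        final String salary;

        User(String id, String name, String dept, String salary) {
            this.id = id;
            this.name = name;
            this.dept = dept;
            this.salary = salary;
        }
    }

    // Column names in the same order as passed to setNames(...) in the question.
    static final String[] NAMES = {"id", "name", "dept", "salary"};

    // Splits on commas; short lines are padded with empty strings and long
    // lines are truncated, which approximates non-strict tokenizing.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>(Arrays.asList(line.split(",", -1)));
        while (tokens.size() < NAMES.length) {
            tokens.add("");
        }
        return tokens.subList(0, NAMES.length);
    }

    // Maps the tokens onto the hypothetical User by position.
    static User map(String line) {
        List<String> t = tokenize(line);
        return new User(t.get(0), t.get(1), t.get(2), t.get(3));
    }

    public static void main(String[] args) {
        User full = map("1,Alice,Engineering,5000");
        System.out.println(full.id + " " + full.name + " " + full.dept + " " + full.salary);

        User shortLine = map("2,Bob");
        System.out.println("dept is empty: " + shortLine.dept.isEmpty());
    }
}
```

The real mapper binds columns to bean properties by name rather than by position; the sketch only illustrates the shape of the data the reader is expected to produce for each CSV line.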
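I set this up by following the answer given by @mtoutcalt in this StackOverflow thread: Spring Batch - Read files from Aws S3
and this documentation: https://cloud.spring.io/spring-cloud-aws/spring-cloud-aws.html#_resource_handling
The problems I am facing:
1) In SpringBatchConfig.java, when it tries to autowire the ResourceLoader, the IDE (I am using IntelliJ IDEA) reports: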
Could not autowire. There is more than one bean of 'ResourceLoader' type.
Beans:
(aws-config.xml) webApplicationContext (Spring Web)
2) When I run the batch application, it fails with:
Caused by: java.lang.IllegalStateException: Input resource must exist (reader is in 'strict' mode): ServletContext resource [/s3://<bucket-name>/<key>]
Can someone help me resolve this?
Regards
Solution
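The text of the second error, ServletContext resource [/s3://<bucket-name>/<key>], suggests that the s3:// location was treated as a path relative to the web application context and was never handled by an S3-aware loader. One plausible mechanism, sketched below under that assumption, is that the JDK has no built-in URL handler for the "s3" scheme, so a loader that first tries to parse the location as a URL and then falls back to a context path ends up with exactly this kind of path. The class S3LocationFallbackDemo, its methods, and the bucket and key names are hypothetical and only imitate that fallback; this is not Spring's code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

public class S3LocationFallbackDemo {

    // Hypothetical stand-in for a resource loader without S3 support:
    // try the location as a URL first, otherwise treat it as a context path.
    static String resolve(String location) {
        if (location.startsWith("/")) {
            return describeContextPath(location);
        }
        try {
            // toURL() throws MalformedURLException for schemes the JDK has
            // no handler for, and "s3" is not one of the built-in schemes.
            return "URL resource [" + new URI(location).toURL() + "]";
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return describeContextPath(location);
        }
    }

    // Imitates a context-relative resource description with a leading slash.
    static String describeContextPath(String path) {
        String normalized = path.startsWith("/") ? path : "/" + path;
        return "ServletContext resource [" + normalized + "]";
    }

    public static void main(String[] args) {
        // Hypothetical bucket and key, used only for illustration.
        System.out.println(resolve("s3://my-bucket/users.csv"));
        System.out.println(resolve("https://example.com/users.csv"));
    }
}
```

If this is what happens in the application, the ResourceLoader injected into SpringBatchConfig is likely the web application context itself and not the S3-aware loader declared in the XML file. That would also be consistent with the IDE warning about more than one ResourceLoader bean. Checking that the XML file is actually imported into the running context would be a reasonable first step.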