java - How to process large files from JSON to CSV using Spring Batch
Problem description
I am trying to implement a batch job for the following use case (I am new to Spring Batch).
Use case
Every day, one source system sends me 200+ compressed (.gz) files. Each .gz file decompresses to about 1 GB, which means roughly 200 GB of files in my input directory. The content type is JSON.
Sample format of the JSON file
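Since each .gz file expands to about 1 GB, one option worth weighing is to decompress while reading instead of unzipping to disk first. The sketch below uses only the JDK (`java.util.zip.GZIPInputStream`); the class `GzLineReader` and its `forEachLine` method are hypothetical names, and it assumes the files are UTF-8 text with one JSON object per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: reads a gzipped JSON-lines file one line at a time,
// decompressing on the fly, so the decompressed 1 GB is never written to disk.
public class GzLineReader {

    /** Passes every line of gzFile to handler and returns the number of lines read. */
    public static long forEachLine(Path gzFile, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }
}
```

For example, `GzLineReader.forEachLine(path, line -> ...)` hands each JSON line to a callback while holding only one line in memory at a time. Whether this fits depends on how the existing `UnzipListener` and reader are wired, which the question does not show.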
{"name":"abc1","age":20}
{"name":"abc2","age":20}
{"name":"abc3","age":20}
.....
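Each line in the sample is a complete JSON object on its own (the JSON Lines layout), so a line can be turned into a CSV row independently of its neighbours. The sketch below is a deliberately simplified illustration for the exact `{"name":...,"age":...}` shape: it uses a regex as a stand-in for a real JSON parser such as Jackson, so it does not handle escaped quotes, extra whitespace inside the object, or reordered fields. `JsonLineToCsv` and `toCsvRow` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter for the flat sample shape {"name":"...","age":N}.
// The regex only stands in for a real JSON library; it rejects anything
// that does not match this exact layout.
public class JsonLineToCsv {

    private static final Pattern SAMPLE =
            Pattern.compile("^\\{\"name\":\"([^\"]*)\",\"age\":(\\d+)\\}$");

    /** Returns the CSV row for one JSON line, or empty if the line does not match. */
    public static Optional<String> toCsvRow(String jsonLine) {
        Matcher m = SAMPLE.matcher(jsonLine.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(csvField(m.group(1)) + "," + m.group(2));
    }

    /** Quotes a CSV field (RFC 4180 style) when it contains a comma, quote or newline. */
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```

With this sketch, `{"name":"abc1","age":20}` becomes the row `abc1,20`, while a truncated line such as `{"name":"abc3","age":20` yields an empty `Optional` instead of a row.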
I need to convert these files from JSON to CSV and write the results to an output directory. The CSV generation should roll over by size, similar to size-based rolling in Log4j. After writing, I need to remove each file from the input directory.
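Size-based rolling can be sketched as a writer that opens a new output file whenever the next row would push the current one past a byte limit. This is a plain-JDK illustration, not a Spring Batch `ItemWriter`; the class name `SizeRollingCsvWriter`, the `<prefix>-N.csv` naming scheme, and the byte limit are all assumptions.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer that rolls to a new CSV file once the current one
// would exceed maxBytes, similar in spirit to Log4j's size-based rolling.
// Files are named <prefix>-0.csv, <prefix>-1.csv, ... inside outputDir.
public class SizeRollingCsvWriter implements Closeable {

    private final Path outputDir;
    private final String prefix;
    private final long maxBytes;
    private int index = 0;
    private long bytesInCurrent = 0;
    private BufferedWriter current;

    public SizeRollingCsvWriter(Path outputDir, String prefix, long maxBytes) {
        this.outputDir = outputDir;
        this.prefix = prefix;
        this.maxBytes = maxBytes;
    }

    public void writeRow(String row) throws IOException {
        long size = (row + "\n").getBytes(StandardCharsets.UTF_8).length;
        // Roll before writing if this row would push a non-empty file past the limit.
        if (current == null || (bytesInCurrent > 0 && bytesInCurrent + size > maxBytes)) {
            roll();
        }
        current.write(row);
        current.write('\n');
        bytesInCurrent += size;
    }

    private void roll() throws IOException {
        if (current != null) {
            current.close();
            index++;
        }
        current = Files.newBufferedWriter(outputDir.resolve(prefix + "-" + index + ".csv"),
                StandardCharsets.UTF_8);
        bytesInCurrent = 0;
    }

    /** Number of output files created so far. */
    public int fileCount() {
        return current == null ? 0 : index + 1;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
        }
    }
}
```

Within Spring Batch itself, `MultiResourceItemWriter` with `setItemCountLimitPerResource` rolls by item count rather than by bytes, and its Javadoc notes that a new resource is only started at a chunk boundary, so a file can exceed the limit by up to one chunk.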
Question 1
Can Spring Batch handle this much data? I receive nearly 200 GB in a single day.
Question 2
I think Spring Batch can handle it, so I implemented the job with a partitioner. But while reading, I see some dirty lines that are not properly terminated.
Structure of the faulty lines
{"name":"abc1","age":20,....}
{"name":"abc2","age":20......}
{"name":"abc3","age":20
{"name":"abc1","age":20,....}
{"name":"abc1","age":20,....}
.....
For this I have written a skip policy, but it is not working as expected: it skips every line from the faulty line onwards instead of only that one line. How can I skip only the faulty line? I am sharing a sample snippet below; please suggest corrections to my code and answers to the questions and issues above.
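The cause can't be confirmed from the snippets, since the reader definition isn't shown, but one plausible explanation is that the reader treats the file as a single JSON stream, so a truncated object leaves the parser unable to find the start of the next record. A line-oriented approach avoids that: each line is checked on its own, so a bad line costs only itself. The sketch below illustrates the idea in plain Java; the brace check is a crude heuristic standing in for real JSON parsing, and `LineByLineSkipper` and `SKIP_LIMIT` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of line-isolated error handling: every line is validated
// independently, so one truncated record does not affect the lines after it.
public class LineByLineSkipper {

    /** Hypothetical cap on skipped lines per file, in the spirit of skipCount <= 5. */
    static final int SKIP_LIMIT = 5;

    /** Lines that passed the check, plus the 1-based numbers of skipped lines. */
    public static final class Result {
        public final List<String> good = new ArrayList<>();
        public final List<Integer> skippedLineNumbers = new ArrayList<>();
    }

    /** Heuristic stand-in for real JSON parsing: a record must start with '{' and end with '}'. */
    static boolean looksComplete(String line) {
        String t = line.trim();
        return t.startsWith("{") && t.endsWith("}");
    }

    /** Logs and skips bad lines, continues with the next one, and fails once the cap is exceeded. */
    public static Result process(BufferedReader in) throws IOException {
        Result result = new Result();
        String line;
        int lineNumber = 0;
        while ((line = in.readLine()) != null) {
            lineNumber++;
            if (looksComplete(line)) {
                result.good.add(line);
                continue;
            }
            result.skippedLineNumbers.add(lineNumber);
            System.err.println("Skipping line " + lineNumber + ": " + line);
            if (result.skippedLineNumbers.size() > SKIP_LIMIT) {
                throw new IllegalStateException("Skip limit exceeded at line " + lineNumber);
            }
        }
        return result;
    }
}
```

Fed the faulty sample, only the truncated `{"name":"abc3","age":20` line is reported and skipped; the lines after it are still processed.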
JobConfig.java
```java
@Bean
public Job myJob() throws Exception {
    return jobBuilderFactory.get(Constants.JOB.JOB_NAME)
            .incrementer(new RunIdIncrementer())
            .listener(jobCompleteListener())
            .start(masterStep())
            .build();
}

// master step: partitions the input files and runs slaveStep for each partition
@Bean
public Step masterStep() throws Exception {
    return stepBuilderFactory.get("step")
            .listener(new UnzipListener())
            .partitioner(slaveStep())
            .partitioner("P", partitioner())
            .gridSize(10)
            .taskExecutor(executor())
            .build();
}

// slave (worker) step: reads and writes one partition's file in chunks
@Bean
public Step slaveStep() throws Exception {
    return stepBuilderFactory.get("slavestep")
            // chunk() is required before reader()/writer(); Person (an item class for
            // {"name":...,"age":...}) and the commit interval are placeholders
            .<Person, Person>chunk(1000)
            .reader(reader(null))
            .writer(customWriter)
            .faultTolerant()
            .skipPolicy(fileVerificationSkipper())
            .build();
}

@Bean
public SkipPolicy fileVerificationSkipper() {
    return new LineVerificationSkipper();
}

@Bean
@StepScope
public Partitioner partitioner() throws Exception {
    MultiResourcePartitioner part = new MultiResourcePartitioner();
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] res = resolver.getResources("...path of files...");
    part.setResources(res);
    // No need to call part.partition(...) here: the partition step calls it itself,
    // and MultiResourcePartitioner creates one partition per resource regardless.
    return part;
}
```
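To make the partitioned master step easier to reason about, here is a plain-Java model of one-partition-per-file processing. As far as I know from its Javadoc, `MultiResourcePartitioner` creates one `ExecutionContext` per resource, names them `partition0`, `partition1`, ..., stores the resource under the key `fileName`, and ignores the grid size; how many files run at once is then bounded by the task executor. The model below mimics that with plain `Map`s and a fixed thread pool; `PartitionModel` and the `worker` function are hypothetical stand-ins for the real worker step.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java model of partitioning: one context per file, processed on a
// bounded thread pool. Not Spring Batch code; it only mirrors the idea.
public class PartitionModel {

    /** One context per file, keyed "partition0", "partition1", ...; each holds the file under "fileName". */
    public static Map<String, Map<String, String>> partition(List<String> files) {
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < files.size(); i++) {
            Map<String, String> context = new HashMap<>();
            context.put("fileName", files.get(i));
            partitions.put("partition" + i, context);
        }
        return partitions;
    }

    /** Runs worker once per partition on `threads` threads and collects results by partition name. */
    public static <R> Map<String, R> runAll(List<String> files, int threads, Function<String, R> worker)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<R>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Map<String, String>> p : partition(files).entrySet()) {
                String file = p.getValue().get("fileName");
                futures.put(p.getKey(), pool.submit(() -> worker.apply(file)));
            }
            Map<String, R> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<R>> f : futures.entrySet()) {
                results.put(f.getKey(), f.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 200 input files this model produces 200 partitions whatever thread count is passed in; only the pool size decides how many are processed concurrently.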
Skip Policy Code
```java
public class LineVerificationSkipper implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof FileNotFoundException) {
            // a missing file is never skippable
            return false;
        } else if (exception instanceof FlatFileParseException && skipCount <= 5) {
            // log the faulty line and skip it, up to a small limit
            FlatFileParseException ffpe = (FlatFileParseException) exception;
            StringBuilder errorMessage = new StringBuilder();
            errorMessage.append("An error occurred while processing line ")
                    .append(ffpe.getLineNumber())
                    .append(" of the file. Below was the faulty input.\n");
            errorMessage.append(ffpe.getInput()).append("\n");
            System.err.println(errorMessage.toString());
            return true;
        } else {
            return false;
        }
    }
}
```
Question 3
How can I delete each input source file after it has been processed? I am not getting any information such as the file path or name in the ItemWriter.
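Regarding the file name: as far as I know, `MultiResourcePartitioner` puts each partition's resource URL into the step execution context under `fileName`, so a `StepExecutionListener` on the worker step could read it in `afterStep` and delete that file once the step has completed, instead of looking for it in the `ItemWriter`. The plain-Java sketch below shows only the ordering that matters: delete an input only after its processing succeeded, so a failed run leaves the file in place for a restart. `DeleteAfterSuccess` and `FileJob` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: remove an input file only after its output has been
// written successfully; on failure the input is kept for a rerun.
public class DeleteAfterSuccess {

    /** The per-file work (read JSON, write CSV); may throw on failure. */
    @FunctionalInterface
    public interface FileJob {
        void process(Path input) throws Exception;
    }

    /** Returns true if the file was processed and deleted, false if processing failed. */
    public static boolean processThenDelete(Path input, FileJob job) throws IOException {
        try {
            job.process(input);
        } catch (Exception e) {
            System.err.println("Keeping " + input + " because processing failed: " + e.getMessage());
            return false;
        }
        Files.deleteIfExists(input);
        return true;
    }
}
```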