How to process large files from JSON to CSV using Spring Batch

Problem Description

I am trying to implement a batch job for the following use case.
(I am new to Spring Batch.)

Use-case

Every day I receive 200+ compressed (.gz) files from one source system. Each .gz file yields about 1GB when unzipped,
which means roughly 200GB of files per day in my input directory. The content type is JSON.

Sample Format of JSON File

{"name":"abc1","age":20}
{"name":"abc2","age":20}
{"name":"abc3","age":20}
.....

I need to convert these files from JSON to CSV in an output directory, and the CSV generation should roll by size, similar to size-based rolling in Log4j. After writing, I need to remove each file from the input directory.
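As a self-contained illustration of the required transformation (outside Spring Batch), here is a minimal sketch that converts JSON-lines input to CSV with size-based rolling. The output naming, the rolling threshold, and the regex-based field extraction (which handles exactly the two-field sample format shown above) are all my assumptions, not part of the original code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonToCsvRoller {
    // Matches exactly the sample record format: {"name":"abc1","age":20}
    private static final Pattern LINE =
            Pattern.compile("\\{\"name\":\"([^\"]+)\",\"age\":(\\d+)\\}");

    /** Converts one JSON-lines file to CSV, rolling to a new file when maxBytes is reached. */
    public static void convert(Path input, Path outDir, long maxBytes) throws IOException {
        Files.createDirectories(outDir);
        int part = 0;
        long written = 0;
        Path current = outDir.resolve("out-" + part + ".csv");
        try (var lines = Files.lines(input, StandardCharsets.UTF_8)) {
            for (String line : (Iterable<String>) lines::iterator) {
                Matcher m = LINE.matcher(line.trim());
                if (!m.matches()) continue; // skip dirty/incomplete lines
                byte[] row = (m.group(1) + "," + m.group(2) + "\n")
                        .getBytes(StandardCharsets.UTF_8);
                if (written + row.length > maxBytes && written > 0) {
                    part++;                                  // roll to the next file
                    current = outDir.resolve("out-" + part + ".csv");
                    written = 0;
                }
                Files.write(current, row,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                written += row.length;
            }
        }
    }
}
```
In a Spring Batch job the same rolling decision would live inside a custom ItemWriter, but the byte-counting logic stays the same.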

Question 1
Can Spring Batch handle this much data? For a single day I am getting nearly 200GB.
Question 2
I believe Spring Batch can handle it, so I implemented a job with a partitioner. But while reading I am seeing some dirty lines without any end-of-line character.

Faulty lines structure

  {"name":"abc1","age":20,....}
{"name":"abc2","age":20......}
{"name":"abc3","age":20
{"name":"abc1","age":20,....}
{"name":"abc1","age":20,....}
.....

For this I have written a skip policy, but it is not working as expected: it skips all lines from the error line onwards instead of just one line. How can I skip only the faulty line? I am sharing my sample snippet below; please give suggestions or corrections on my code and on the questions and issues above.
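One way to make the skip decision deterministic is to validate each raw line before it is mapped, so that a parse exception is raised for exactly one record and the skip policy advances past it. A minimal validator sketch for this one-object-per-line format (the class and method names are mine, not from the original code):

```java
public class JsonLineValidator {
    // In this file format a well-formed record is a single line that
    // starts with '{' and ends with '}' (one JSON object per line).
    // Truncated lines like {"name":"abc3","age":20 fail this check.
    public static boolean isCompleteLine(String line) {
        if (line == null) return false;
        String trimmed = line.trim();
        return trimmed.startsWith("{") && trimmed.endsWith("}");
    }
}
```
A custom LineMapper could call this and throw an exception for an incomplete line; the fault-tolerant step then invokes the skip policy for that single record rather than abandoning the rest of the file.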

JobConfig.java

@Bean
public Job myJob() throws Exception {
    return jobBuilderFactory.get(Constants.JOB.JOB_NAME)
            .incrementer(new RunIdIncrementer())
            .listener(jobCompleteListener())
            .start(masterStep())
            .build();
}

// master step
@Bean
public Step masterStep() throws Exception {
    return stepBuilderFactory.get("step")
            .listener(new UnzipListener())
            .partitioner("P", partitioner())
            .step(slaveStep())
            .gridSize(10)
            .taskExecutor(executor())
            .build();
}

// slave step
@Bean
public Step slaveStep() throws Exception {
    return stepBuilderFactory.get("slavestep")
            .<String, String>chunk(1000) // a chunk-oriented step needs a chunk size
            .reader(reader(null))
            .writer(customWriter)
            .faultTolerant()
            .skipPolicy(fileVerificationSkipper())
            .build();
}

@Bean
public SkipPolicy fileVerificationSkipper() {
    return new LineVerificationSkipper();
}

@Bean
@StepScope
public Partitioner partitioner() throws Exception {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] resources = resolver.getResources("...path of files...");
    partitioner.setResources(resources);
    // Note: do not call partition() here; the framework calls it with the grid size.
    return partitioner;
}

Skip Policy Code

public class LineVerificationSkipper implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable exception, int skipCount) throws SkipLimitExceededException {
        if (exception instanceof FileNotFoundException) {
            return false;
        } else if (exception instanceof FlatFileParseException && skipCount <= 5) {
            FlatFileParseException ffpe = (FlatFileParseException) exception;
            StringBuilder errorMessage = new StringBuilder();
            errorMessage.append("An error occurred while processing line " + ffpe.getLineNumber()
                    + " of the file. Below was the faulty input.\n");
            errorMessage.append(ffpe.getInput() + "\n");
            System.err.println(errorMessage.toString());
            return true;
        } else {
            return false;
        }
    }
}

Question 3

How can I delete each input file after it is processed? I am not getting any info like the file path or name in the ItemWriter.
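MultiResourcePartitioner stores each partition's resource URL in that partition's step execution context (under the key "fileName" by default), so a StepExecutionListener registered on the slave step can read that value in afterStep and delete the file once the step has completed. The Spring wiring below is only sketched in comments, as an assumption; the deletion helper itself is plain java.nio:

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ProcessedFileCleaner {
    // Hypothetical wiring inside a StepExecutionListener#afterStep on the slave step:
    //   String url = stepExecution.getExecutionContext().getString("fileName");
    //   if (stepExecution.getStatus() == BatchStatus.COMPLETED) {
    //       ProcessedFileCleaner.deleteIfExists(url);
    //   }
    // ("fileName" is MultiResourcePartitioner's default key name.)

    /** Deletes a file given either a file: URL or a plain path; returns true if deleted. */
    public static boolean deleteIfExists(String resourceUrl) throws IOException {
        Path path = resourceUrl.startsWith("file:")
                ? Paths.get(URI.create(resourceUrl))
                : Paths.get(resourceUrl);
        return Files.deleteIfExists(path);
    }
}
```
Deleting in afterStep (rather than in the ItemWriter) ensures the file is only removed after the whole partition finished writing its CSV output.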

Tags: java, json, spring-boot, csv, spring-batch
