(Java + AWS Textract): How to configure FORMS instead of LINES for text

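Problem description

I have a working application and have managed to extract text from some documents. However, what I get back is the raw text with a confidence score for each line. I would like to retrieve the FORMS (key-value pair) data instead, using Java. I believe the change belongs in the third part of the code, but I am not sure how to change it. Thank you.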

https://docs.aws.amazon.com/textract/latest/dg/how-it-works-kvp.html

Text-format output

package aws.cloud.work;

import java.io.IOException;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.textract.AmazonTextract;
import com.amazonaws.services.textract.AmazonTextractClientBuilder;
import com.amazonaws.services.textract.model.DetectDocumentTextRequest;
import com.amazonaws.services.textract.model.DetectDocumentTextResult;
import com.amazonaws.services.textract.model.Document;
import com.amazonaws.services.textract.model.S3Object;

public class TextractOriginalMaster2 {

    static AmazonTextractClientBuilder clientBuilder = AmazonTextractClientBuilder.standard().withRegion(Regions.AP_SOUTHEAST_1);

    public static void main(String[] args) throws IOException {
       
        //Set AWS Credentials to use Textract
        clientBuilder.setCredentials(new AWSStaticCredentialsProvider(new
                BasicAWSCredentials("ACCESSKEY", "SECRETKEY")));
        
        // Object key and bucket of the document in S3 (the request below reads it from S3, not from a local path)
        String document = "ID.jpg";
        String bucket = "BUCKETNAME";
        
        //Calling AWS Textract Client
        AmazonTextract client = clientBuilder.build();
        DetectDocumentTextRequest request = new DetectDocumentTextRequest()
                .withDocument(new Document()
                        .withS3Object(new S3Object()
                                .withName(document)
                                .withBucket(bucket)));
        DetectDocumentTextResult result = client.detectDocumentText(request);
        System.out.println(result);
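        // Print only LINE blocks, each with its text and confidence score
        result.getBlocks().forEach(block -> {
            // Constant-first comparison avoids a NullPointerException if the block type is missing
            if ("LINE".equals(block.getBlockType())) {
                System.out.println(block.getText() + " | " + block.getConfidence());
            }
        });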
        
        
    }
}

Output

FIN: A1234567A Card No: 111-111111 | 98.165695
Name: NAME NAME NAME | 93.997894
Designation: DESIGNATION | 83.17153
Mission: EMBASSY OF AAAA | 98.26149
Date of Issue: 01-01-9999 | 89.6941
Date of Expiry: 01-01-9999 | 99.14406


Tags: java, amazon-web-services, spring-boot, amazon-textract

Solution
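
One possible approach, offered as a sketch rather than a confirmed answer: DetectDocumentText only returns PAGE, LINE and WORD blocks, so form data most likely requires the AnalyzeDocument operation with the FORMS feature type. Its response contains KEY_VALUE_SET blocks. A KEY block points to its VALUE block through a "VALUE" relationship, and both point to their WORD blocks through "CHILD" relationships, as described in the key-value documentation linked in the question. The example below shows only the joining step. FormBlock, FormsKeyValueSketch and the sample data are hypothetical stand-ins so that the code runs without AWS access. With the real SDK the same fields would presumably be read from Block.getEntityTypes() and from Block.getRelationships() (each Relationship exposing getType() and getIds()).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// With the real SDK (v1) the blocks would come from a call along these lines:
//   AnalyzeDocumentRequest request = new AnalyzeDocumentRequest()
//           .withFeatureTypes("FORMS")
//           .withDocument(new Document().withS3Object(
//                   new S3Object().withName(document).withBucket(bucket)));
//   List<Block> blocks = client.analyzeDocument(request).getBlocks();
// FormBlock below is a hypothetical stand-in for the SDK's Block class.
class FormsKeyValueSketch {

    static final class FormBlock {
        final String id;
        final String blockType;      // e.g. "KEY_VALUE_SET" or "WORD"
        final String entityType;     // "KEY" or "VALUE" on KEY_VALUE_SET blocks, otherwise null
        final String text;           // text of a WORD block, otherwise null
        final List<String> valueIds; // ids listed in the "VALUE" relationship
        final List<String> childIds; // ids listed in the "CHILD" relationship

        FormBlock(String id, String blockType, String entityType, String text,
                  List<String> valueIds, List<String> childIds) {
            this.id = id;
            this.blockType = blockType;
            this.entityType = entityType;
            this.text = text;
            this.valueIds = valueIds;
            this.childIds = childIds;
        }
    }

    static FormBlock word(String id, String text) {
        return new FormBlock(id, "WORD", null, text,
                Collections.emptyList(), Collections.emptyList());
    }

    static FormBlock key(String id, String valueId, String... childIds) {
        return new FormBlock(id, "KEY_VALUE_SET", "KEY", null,
                Collections.singletonList(valueId), Arrays.asList(childIds));
    }

    static FormBlock value(String id, String... childIds) {
        return new FormBlock(id, "KEY_VALUE_SET", "VALUE", null,
                Collections.emptyList(), Arrays.asList(childIds));
    }

    // Joins the text of the WORD children of a KEY or VALUE block with spaces.
    static String childText(FormBlock block, Map<String, FormBlock> byId) {
        StringBuilder sb = new StringBuilder();
        for (String childId : block.childIds) {
            FormBlock child = byId.get(childId);
            if (child != null && "WORD".equals(child.blockType)) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(child.text);
            }
        }
        return sb.toString();
    }

    // Builds a key text -> value text map from the flat block list.
    static Map<String, String> extractKeyValues(List<FormBlock> blocks) {
        Map<String, FormBlock> byId = new HashMap<>();
        for (FormBlock b : blocks) byId.put(b.id, b);

        Map<String, String> result = new LinkedHashMap<>();
        for (FormBlock b : blocks) {
            if (!"KEY_VALUE_SET".equals(b.blockType) || !"KEY".equals(b.entityType)) continue;
            StringBuilder valueText = new StringBuilder();
            for (String valueId : b.valueIds) {
                FormBlock valueBlock = byId.get(valueId);
                if (valueBlock == null) continue;
                if (valueText.length() > 0) valueText.append(' ');
                valueText.append(childText(valueBlock, byId));
            }
            result.put(childText(b, byId), valueText.toString());
        }
        return result;
    }

    // Hand-made sample shaped like the question's output; not real Textract output.
    static List<FormBlock> sampleBlocks() {
        return Arrays.asList(
                key("k1", "v1", "w1"), value("v1", "w2", "w3", "w4"),
                word("w1", "Name:"), word("w2", "NAME"), word("w3", "NAME"), word("w4", "NAME"),
                key("k2", "v2", "w5", "w6", "w7"), value("v2", "w8"),
                word("w5", "Date"), word("w6", "of"), word("w7", "Issue:"),
                word("w8", "01-01-9999"));
    }

    public static void main(String[] args) {
        extractKeyValues(sampleBlocks())
                .forEach((k, v) -> System.out.println(k + " | " + v));
    }
}
```

Running main on the sample data prints "Name: | NAME NAME NAME" and "Date of Issue: | 01-01-9999". Indexing the blocks by id first keeps the join to a single pass over the list. Checkbox values (SELECTION_ELEMENT children) and per-field confidence are left out of this sketch.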

