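java - (Java + AWS Textract): How to configure FORMS instead of LINES for text
Problem description
I have a working application and have managed to extract text from some documents. However, I only get the "raw text" along with a confidence score. I would like to retrieve the FORMS output instead, using Java. I believe the change belongs in the third part of the code, but I do not know how to change it. Thank you.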
https://docs.aws.amazon.com/textract/latest/dg/how-it-works-kvp.html
package aws.cloud.work;

import java.io.IOException;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.textract.AmazonTextract;
import com.amazonaws.services.textract.AmazonTextractClientBuilder;
import com.amazonaws.services.textract.model.DetectDocumentTextRequest;
import com.amazonaws.services.textract.model.DetectDocumentTextResult;
import com.amazonaws.services.textract.model.Document;
import com.amazonaws.services.textract.model.S3Object;

public class TextractOriginalMaster2 {

    static AmazonTextractClientBuilder clientBuilder =
            AmazonTextractClientBuilder.standard().withRegion(Regions.AP_SOUTHEAST_1);

    public static void main(String[] args) throws IOException {
        // Set AWS credentials to use Textract
        clientBuilder.setCredentials(new AWSStaticCredentialsProvider(
                new BasicAWSCredentials("ACCESSKEY", "SECRETKEY")));

        // The document is read from S3: object key and bucket name
        String document = "ID.jpg";
        String bucket = "BUCKETNAME";

        // Build the AWS Textract client
        AmazonTextract client = clientBuilder.build();

        // Request plain text detection for the S3 object
        DetectDocumentTextRequest request = new DetectDocumentTextRequest()
                .withDocument(new Document()
                        .withS3Object(new S3Object()
                                .withName(document)
                                .withBucket(bucket)));

        DetectDocumentTextResult result = client.detectDocumentText(request);
        System.out.println(result);

        // Print each detected line with its confidence score
        result.getBlocks().forEach(block -> {
            if (block.getBlockType().equals("LINE"))
                System.out.println(block.getText() + " | " + block.getConfidence());
        });
    }
}
Output
FIN: A1234567A Card No: 111-111111|98.165695
Name: NAME NAME NAME|93.997894
Designation: DESIGNATION|83.17153
Mission: EMBASSY OF AAAA|98.26149
Date of Issue: 01-01-9999|89.6941
Date of Expiry: 01-01-9999|99.14406
Solution
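One possible direction, offered as a sketch rather than a confirmed answer: DetectDocumentText only returns PAGE, LINE and WORD blocks, so form data has to be requested through the AnalyzeDocument operation instead. In the AWS SDK for Java v1 that means replacing DetectDocumentTextRequest with AnalyzeDocumentRequest, adding .withFeatureTypes("FORMS"), and calling client.analyzeDocument(request). The result then contains KEY_VALUE_SET blocks whose Relationships link each KEY block to its VALUE block and to the WORD blocks holding the text, as described on the linked documentation page. The pairing step is the less obvious part, so the example below illustrates it on plain data. The Blk and Rel classes are hypothetical stand-ins that mirror only the fields needed from the SDK's Block and Relationship (the real Block exposes a list of entity types rather than a single string), and the sample blocks are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormsKeyValueSketch {

    // Hypothetical stand-in for the SDK's Relationship: a type plus target block ids.
    static final class Rel {
        final String type;
        final List<String> ids;
        Rel(String type, String... ids) {
            this.type = type;
            this.ids = Arrays.asList(ids);
        }
    }

    // Hypothetical stand-in for the SDK's Block, reduced to the fields used here.
    static final class Blk {
        final String id, blockType, entityType, text;
        final List<Rel> rels;
        Blk(String id, String blockType, String entityType, String text, Rel... rels) {
            this.id = id;
            this.blockType = blockType;
            this.entityType = entityType;
            this.text = text;
            this.rels = Arrays.asList(rels);
        }
    }

    // Join the text of all WORD blocks referenced through CHILD relationships.
    static String childText(Blk block, Map<String, Blk> byId) {
        StringBuilder sb = new StringBuilder();
        for (Rel rel : block.rels) {
            if (!"CHILD".equals(rel.type)) continue;
            for (String id : rel.ids) {
                Blk child = byId.get(id);
                if (child != null && "WORD".equals(child.blockType)) {
                    if (sb.length() > 0) sb.append(' ');
                    sb.append(child.text);
                }
            }
        }
        return sb.toString();
    }

    // For every KEY block, follow its VALUE relationship and collect both texts.
    static Map<String, String> extractForms(List<Blk> blocks) {
        Map<String, Blk> byId = new LinkedHashMap<>();
        for (Blk b : blocks) byId.put(b.id, b);

        Map<String, String> forms = new LinkedHashMap<>();
        for (Blk b : blocks) {
            if (!"KEY_VALUE_SET".equals(b.blockType) || !"KEY".equals(b.entityType)) continue;
            StringBuilder value = new StringBuilder();
            for (Rel rel : b.rels) {
                if (!"VALUE".equals(rel.type)) continue;
                for (String id : rel.ids) {
                    Blk valueBlock = byId.get(id);
                    if (valueBlock == null) continue;
                    String text = childText(valueBlock, byId);
                    if (value.length() > 0 && !text.isEmpty()) value.append(' ');
                    value.append(text);
                }
            }
            forms.put(childText(b, byId), value.toString());
        }
        return forms;
    }

    public static void main(String[] args) {
        // Made-up sample shaped like an AnalyzeDocument FORMS response.
        List<Blk> blocks = Arrays.asList(
            new Blk("k1", "KEY_VALUE_SET", "KEY", null, new Rel("VALUE", "v1"), new Rel("CHILD", "w1")),
            new Blk("v1", "KEY_VALUE_SET", "VALUE", null, new Rel("CHILD", "w2", "w3")),
            new Blk("w1", "WORD", null, "Name:"),
            new Blk("w2", "WORD", null, "NAME"),
            new Blk("w3", "WORD", null, "NAME"));
        extractForms(blocks).forEach((k, v) -> System.out.println(k + " | " + v));
    }
}
```

With the real SDK, the same traversal would read block.getBlockType(), block.getEntityTypes(), block.getRelationships(), relationship.getType() and relationship.getIds() on the blocks returned by AnalyzeDocumentResult.getBlocks(). Building the id-to-block map first keeps each lookup cheap, since relationships refer to other blocks only by id.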