Improving the accuracy of Python Tesseract OCR

Problem description

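I am using pytesseract and OpenCV in a simple Django application written in Python to extract Bengali text from image files. A form lets you upload an image; clicking the submit button sends it to the server in a jQuery ajax call, where the text is extracted from the image by OCR (optical character recognition).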

Template:

 <div style="text-align: center;">
 <div id="result" class="text-center"></div>
    <form enctype="multipart/form-data" id="ocrForm" action="{% url 'process_image' %}" method="post"> <!-- Do not forget to add: enctype="multipart/form-data" -->
        {% csrf_token %}
        {{ form }}
        <button type="submit" class="btn btn-success">OCRzed</button>
    </form>

    <br><br><hr>
    <div id="content" style="width: 50%; margin: 0 auto;">
        
    </div>
</div>


<script type="text/javascript">




 $(document).ready(function(){ 
        function submitFile(){
            var fd = new FormData();
            fd.append('file', getFile())
            $("#result").html('<span class="wait">Please wait....</span>');

            $('#content').html('');
            $.ajax({
                url: "{% url 'process_image' %}",
                type: "POST",
                data: fd,
                processData: false,
                contentType: false,
                success: function(data){
                    // console.log(data.content);

                    $("#result").html('');

                    if(data.content){
                        $('#content').html(
                            "<p>" + data.content + "</p>"
                        )
                    }  
                }
            })
        }

        function getFile(){
            var fp = $("#file_id")
            var item = fp[0].files
            return item[0]
        }

        // Submit the file for OCRization
        $("#ocrForm").on('submit', function(event){
            event.preventDefault();
            submitFile()
        })
    });






</script>

The urls.py file contains:

from django.urls import path, re_path
from .views import *

urlpatterns = [
 path('process_image', OcrView.process_image, name='process_image') ,
]

The view:

from django.contrib.auth.models import User
from django.shortcuts  import render, redirect, get_object_or_404
from .forms import NewTopicForm
from .models import Board, Topic, Post
from django.shortcuts import render
from django.http import HttpResponse
from django.http import Http404
    
from django.http import JsonResponse
from django.views.generic import FormView
    
from django.views.decorators.csrf import csrf_exempt
import json
import cv2
import numpy as np
    
import pytesseract    # ======= > Add
try:
    from PIL import Image
except ImportError:
    import Image

def ocr(request):
    return render(request, 'ocr.html')
    #    {'board': board,'form':form})    

# get grayscale image
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# noise removal
def remove_noise(image):
    return cv2.medianBlur(image, 5)

# thresholding
def thresholding(image):
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# dilation
def dilate(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.dilate(image, kernel, iterations=1)

# erosion
def erode(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.erode(image, kernel, iterations=1)

# opening - erosion followed by dilation
def opening(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)

# canny edge detection
def canny(image):
    return cv2.Canny(image, 100, 200)

# skew correction
def deskew(image):
    coords = np.column_stack(np.where(image > 0))
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
    return rotated

# template matching
def match_template(image, template):
    return cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
 
class OcrView(FormView):
    form_class = UploadForm
    template_name = 'ocr.html'
    success_url = '/'

    
    @csrf_exempt
    def process_image(request):
        if request.method == 'POST':
            response_data = {}
            upload = request.FILES['file']

        filestr = request.FILES['file'].read()
        # convert the raw bytes to a numpy array
        npimg = np.frombuffer(filestr, np.uint8)
        image = cv2.imdecode(npimg, cv2.IMREAD_UNCHANGED)

        # image=Image.open(upload)
        gray = get_grayscale(image)
        thresh = thresholding(gray)
        opening1 = opening(gray)
        canny1 = canny(gray)
       
        pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
        # content = pytesseract.image_to_string(Image.open(upload), lang = 'ben')

        # content = pytesseract.image_to_string( image, lang = 'ben')

        content = pytesseract.image_to_string( image, lang = 'eng+ben')

        #   data_ben = process_image("test_ben.png", "ben")
        response_data['content'] = content

        return JsonResponse(response_data)

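I have attached a sample image below. When I supply it as the input file, the text extracted from it is nowhere near a satisfactory level of accuracy. The input image is:

[Image: input file]

I have also attached a screenshot of the extracted text, with the incorrect words underlined in red. Note that spaces and indentation are not preserved here. The screenshot of the extracted text is:

[Image: screenshot of the extracted text]

In the code snippet above, I do the image processing with the following lines: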

gray = get_grayscale(image)
thresh = thresholding(gray)
opening1 = opening(gray)
canny1 = canny(gray)

After that, I pass the processed image to Tesseract in the following line:

content = pytesseract.image_to_string( image, lang = 'eng+ben')

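But my confusion is that I do not save the image anywhere, either before or after processing. So when I use the line above, I am not sure whether the processed or the unprocessed image is being given to the Tesseract engine.

Q1) Do I need to save the image after processing and then give it to the Tesseract engine? If so, how?

Q2) What other steps should I take to improve the accuracy?

NB: Even if you are not familiar with Bengali, I don't think that will be a problem; you can simply look at the words underlined in red and compare.

Edit:

TL;DR: You can look only at the code in views.py and urls.py and skip the template code to make this easier to follow.

Tags: django, opencv, ocr, tesseract, python-tesseract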

Solution


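Q1) There is no need to save the image. The decoded image is already held in memory in your variable image, and each processing function returns a new in-memory array.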

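Q2) You are not actually running OCR on the post-processed image (i.e. the variable canny1); you pass the original image instead. In addition, each processing step is applied to gray rather than to the output of the previous step. The code below applies the processing steps to the image in sequence, then runs OCR on the post-processed image stored in canny1.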

gray = get_grayscale(image)
thresh = thresholding(gray)
opening1 = opening(thresh)
canny1 = canny(opening1)

content = pytesseract.image_to_string(canny1, lang='eng+ben')
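
Edge maps keep only character outlines, so it may be worth checking whether the Canny output actually reads better than the thresholded or opened image before settling on it. The following is a minimal sketch for comparing the intermediate images side by side. The name compare_variants and its ocr parameter are hypothetical, not part of pytesseract; the OCR call is passed in as a callable so the helper can be exercised without Tesseract installed.

```python
from typing import Any, Callable, Dict


def compare_variants(variants: Dict[str, Any],
                     ocr: Callable[[Any], str]) -> Dict[str, str]:
    """Run the same OCR callable over each named image and collect the text."""
    results = {}
    for name, img in variants.items():
        # Strip surrounding whitespace so the outputs are easier to compare
        results[name] = ocr(img).strip()
    return results
```

In the view this could be called as compare_variants({'gray': gray, 'thresh': thresh, 'opening': opening1, 'canny': canny1}, lambda img: pytesseract.image_to_string(img, lang='eng+ben')), and the variant whose text best matches the source image kept.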
