Tesseract, OpenCV, Python: how to get the bounding box of a sentence or a line of text?

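Problem description

I want to run text recognition on an image. I can recognize the text and the corresponding bounding boxes, but only word by word; I would like to do the same for each line of text. In the code below, I noticed that when I print the bounding-box coordinates, the value of b['top'] is similar for words on the same line. I don't know whether I can use that, but I would like one bounding box per line of text, together with the associated sentence.

Below is the code I wrote: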

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import cv2 
import pytesseract
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

img = cv2.imread('./images/page_2.jpg') # load img

img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)  #transform colored img to grayscale

plt.imshow(img)

boxes = pytesseract.image_to_data(img, output_type=Output.DICT) #transform image to dict

boxes = pd.DataFrame(boxes)  # dict to DataFrame
boxes['text'] = boxes['text'].replace('', np.nan)  # replace empty values with NaN
boxes = boxes.dropna(subset=['text'])  # delete rows with NaN

print(boxes)

for index, b in boxes.iterrows():
    (x,y,w,h) = b['left'],b['top'],b['width'],b['height']
    print((x,y,w,h), b['text'])
    cv2.rectangle(img,(x,y),(w+x,h+y), (0,0,255),1)
    
cv2.imshow('result',img)
cv2.waitKey(0)

Output of the "boxes" dict:

     level  page_num  block_num  par_num  line_num  word_num  left  top  \
4        5         1          1        1         1         1    32   24   
5        5         1          1        1         1         2   100   24   
6        5         1          1        1         1         3   191   28   
7        5         1          1        1         1         4   227   28   
8        5         1          1        1         1         5   257   24   
..     ...       ...        ...      ...       ...       ...   ...  ...   
154      5         1          1       11         1         7   261  457   
155      5         1          1       11         1         8   320  461   
156      5         1          1       11         1         9   351  457   
157      5         1          1       11         1        10   376  457   
158      5         1          1       11         1        11   468  457   

     width  height       conf       text  
4       60      17  93.283920     Maitre  
5       82      19  93.204414   corbeau,  
6       29      13  96.932060        sur  
7       22      12  96.932060         un  
8       50      17  93.306122      arbre  
..     ...     ...        ...        ...  
154     51      21  79.999794      qu'on  
155     23      13  90.411606         ne  
156     18      21  21.623993        I'y  
157     85      21  90.583260  prendrait  
158     44      21  96.933327      plus.

Output of (x,y,w,h) and b['text'] (bounding boxes with their text):

(32, 24, 60, 17) Maitre
(100, 24, 82, 19) corbeau,
(191, 28, 29, 13) sur
(227, 28, 22, 12) un
(257, 24, 50, 17) arbre
(315, 24, 70, 21) perché,
(79, 49, 58, 17) Tenait
(144, 53, 23, 13) en
(174, 53, 34, 13) son
(216, 50, 33, 16) bec
(257, 53, 22, 13) un
(287, 49, 84, 22) fromage.
(32, 75, 60, 17) Maitre
(100, 75, 61, 17) renard
(169, 79, 31, 17) par
(206, 75, 64, 17) I'odeur
(277, 75, 68, 17) alléché
(353, 88, 3, 6) ,
(81, 101, 27, 16) Lui
(115, 101, 28, 16) tint
(151, 100, 11, 17) 4
(169, 104, 34, 17) peu
(211, 100, 42, 21) prés
(260, 104, 21, 13) ce
(289, 101, 76, 20) langage
(374, 105, 3, 12) :
(81, 126, 31, 16) «Et
(119, 126, 72, 21) bonjour
(199, 126, 88, 17) Monsieur
(294, 126, 22, 16) du
(324, 125, 87, 18) Corbeau.
(31, 151, 40, 17) Que
(78, 155, 46, 13) vous
(131, 151, 40, 17) 6tes
(177, 151, 32, 21) joli!
(217, 155, 35, 17) que
(260, 155, 44, 13) vous
(312, 155, 29, 13) me
(348, 151, 80, 17) semblez
(436, 151, 52, 17) beau!
(81, 176, 47, 18) Sans
(136, 177, 63, 19) mentir,
(207, 177, 15, 17) si
(229, 178, 48, 16) votre
(284, 181, 72, 17) ramage
(81, 202, 25, 17) Se
(114, 204, 79, 19) rapporte
(200, 202, 11, 17) a
(218, 204, 48, 15) votre
(273, 203, 87, 20) plumage,
(31, 228, 48, 17) Vous
(86, 227, 40, 18) étes
(134, 228, 15, 16) le
(157, 227, 63, 21) phénix
(227, 228, 34, 17) des
(269, 227, 51, 18) hétes
(327, 228, 23, 16) de
(358, 232, 33, 13) ces
(398, 228, 49, 17) bois»
(31, 253, 53, 17) Aces
(92, 255, 45, 15) mots
(145, 253, 15, 17) le
(167, 253, 78, 17) corbeau
(253, 257, 22, 13) ne
(283, 257, 22, 13) se
(312, 255, 40, 15) sent
(360, 257, 33, 17) pas
(400, 253, 23, 17) de
(429, 253, 40, 21) joie;
(81, 279, 19, 16) Et
(107, 283, 43, 16) pour
(157, 280, 74, 16) montrer
(238, 283, 22, 13) sa
(267, 279, 45, 16) belle
(319, 279, 43, 19) voix,
(33, 304, 8, 16) ll
(49, 308, 53, 13) ouvre
(110, 308, 22, 13) un
(140, 304, 47, 21) large
(195, 304, 33, 17) bec
(236, 304, 54, 17) laisse
(297, 305, 67, 16) tomber
(371, 308, 22, 13) sa
(400, 304, 53, 21) proie.
(32, 330, 23, 17) Le
(63, 330, 60, 16) renard
(131, 330, 38, 17) s'en
(177, 330, 48, 17) saisit
(232, 331, 17, 15) et
(256, 330, 28, 16) dit:
(291, 330, 49, 16) "Mon
(348, 330, 35, 16) bon
(391, 330, 92, 19) Monsieur,
(103, 355, 92, 21) Apprenez
(202, 359, 36, 17) que
(245, 356, 35, 16) tout
(287, 355, 67, 17) flatteur
(31, 381, 25, 16) Vit
(63, 385, 34, 12) aux
(104, 381, 71, 20) dépens
(181, 381, 24, 16) de
(212, 381, 43, 16) celui
(262, 381, 28, 20) qui
(298, 380, 79, 17) l'écoute:
(32, 406, 50, 17) Cette
(90, 406, 50, 21) lecon
(148, 407, 40, 16) vaut
(195, 406, 40, 17) bien
(243, 410, 22, 13) un
(273, 406, 79, 21) fromage
(359, 410, 45, 13) sans
(411, 406, 67, 17) doute."
(81, 432, 22, 16) Le
(110, 432, 77, 16) corbeau
(195, 432, 76, 16) honteux
(279, 433, 17, 15) et
(303, 432, 63, 16) confus
(31, 457, 42, 17) Jura
(81, 457, 44, 17) mais
(133, 461, 22, 13) un
(163, 461, 34, 17) peu
(205, 457, 36, 17) tard
(250, 470, 3, 6) ,
(261, 457, 51, 21) qu'on
(320, 461, 23, 13) ne
(351, 457, 18, 21) I'y
(376, 457, 85, 21) prendrait
(468, 457, 44, 21) plus.

Image result:

[image: result]

Tags: python, pandas, opencv, ocr, python-tesseract

Solution


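"I noticed that when I print the bounding-box coordinates, the value of b['top'] is similar for words on the same line. I don't know whether I can use that, but I would like one bounding box per line of text, together with the associated sentence."

You can absolutely use that. The following generates lines by aggregating boxes that overlap vertically:

def lineup(boxes):
    linebox = None
    for _, box in boxes.iterrows():
        if linebox is None:
            linebox = box.copy()                        # first line begins
        elif box.top <= linebox.top + linebox.height:   # box is in the same line
            # compute the bottom edge before moving the top edge
            bottom = max(linebox.top + linebox.height, box.top + box.height)
            linebox.top = min(linebox.top, box.top)
            linebox.width = box.left + box.width - linebox.left
            linebox.height = bottom - linebox.top
            linebox.text += ' ' + box.text
        else:                                           # box starts a new line
            yield linebox
            linebox = box.copy()                        # new line begins
    if linebox is not None:
        yield linebox                                   # return the last line

lineboxes = pd.DataFrame.from_records(lineup(boxes))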
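
As an alternative, the output above already contains block_num, par_num and line_num columns, so the words could also be grouped by Tesseract's own line segmentation instead of by vertical overlap. The following is only a minimal sketch under that assumption: line_boxes is a hypothetical helper name, it expects the dict of equal-length lists produced by image_to_data(img, output_type=Output.DICT), and the sample values are a few words copied from the output above plus one made-up row with empty text.

```python
def line_boxes(data):
    """Merge word boxes into one (left, top, width, height, text) per line.

    `data` is assumed to be the dict of lists returned by
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    lines = {}  # insertion-ordered: one entry per Tesseract line
    for i, text in enumerate(data['text']):
        if not text.strip():
            continue  # skip rows that carry no recognized word
        key = (data['page_num'][i], data['block_num'][i],
               data['par_num'][i], data['line_num'][i])
        left, top = data['left'][i], data['top'][i]
        right, bottom = left + data['width'][i], top + data['height'][i]
        if key not in lines:
            lines[key] = [left, top, right, bottom, [text]]
        else:
            box = lines[key]
            box[0] = min(box[0], left)
            box[1] = min(box[1], top)
            box[2] = max(box[2], right)
            box[3] = max(box[3], bottom)
            box[4].append(text)
    return [(l, t, r - l, b - t, ' '.join(words))
            for l, t, r, b, words in lines.values()]


# A few words from the output above; the first row (empty text) is made up
# to show that rows without text are ignored.
sample = {
    'page_num':  [1, 1, 1, 1, 1, 1],
    'block_num': [1, 1, 1, 1, 1, 1],
    'par_num':   [1, 1, 1, 1, 1, 1],
    'line_num':  [1, 1, 1, 1, 11, 11],
    'left':      [32, 32, 100, 191, 261, 320],
    'top':       [24, 24, 24, 28, 457, 461],
    'width':     [353, 60, 82, 29, 51, 23],
    'height':    [21, 17, 19, 13, 21, 13],
    'text':      ['', 'Maitre', 'corbeau,', 'sur', "qu'on", 'ne'],
}

for box in line_boxes(sample):
    print(box)
```

Grouping by line_num relies on Tesseract having segmented the lines correctly, whereas the overlap approach relies only on the geometry of the word boxes; which one works better probably depends on the image.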
