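python - Tesseract, OpenCV, Python: how to get the bounding box of a sentence or of a line of text?
Problem description
I want to do some text recognition on an image. I can recognize the text and the corresponding bounding boxes, but only word by word; I would like to do the same for each line of text. In the code below, I noticed that when I print the bounding box coordinates, the b['top'] values are similar for words on the same line. I don't know whether I can use that, but I would like one bounding box per line of text, together with the corresponding sentence.
Below is the code I wrote: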
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
import pytesseract
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

img = cv2.imread('./images/page_2.jpg')  # load the image
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert the color image to grayscale
plt.imshow(img)

boxes = pytesseract.image_to_data(img, output_type=Output.DICT)  # OCR result as a dict
boxes = pd.DataFrame(boxes)  # dict to DataFrame
boxes['text'] = boxes['text'].replace('', np.nan)  # replace empty strings with NaN
boxes = boxes.dropna(subset=['text'])  # drop rows without text
print(boxes)

for index, b in boxes.iterrows():
    (x, y, w, h) = b['left'], b['top'], b['width'], b['height']
    print((x, y, w, h), b['text'])
    cv2.rectangle(img, (x, y), (w + x, h + y), (0, 0, 255), 1)  # one box per word

cv2.imshow('result', img)
cv2.waitKey(0)
Output of the "boxes" DataFrame:
     level  page_num  block_num  par_num  line_num  word_num  left  top  \
4        5         1          1        1         1         1    32   24
5        5         1          1        1         1         2   100   24
6        5         1          1        1         1         3   191   28
7        5         1          1        1         1         4   227   28
8        5         1          1        1         1         5   257   24
..     ...       ...        ...      ...       ...       ...   ...  ...
154      5         1          1       11         1         7   261  457
155      5         1          1       11         1         8   320  461
156      5         1          1       11         1         9   351  457
157      5         1          1       11         1        10   376  457
158      5         1          1       11         1        11   468  457

     width  height       conf       text
4       60      17  93.283920     Maitre
5       82      19  93.204414   corbeau,
6       29      13  96.932060        sur
7       22      12  96.932060         un
8       50      17  93.306122      arbre
..     ...     ...        ...        ...
154     51      21  79.999794      qu'on
155     23      13  90.411606         ne
156     18      21  21.623993        I'y
157     85      21  90.583260  prendrait
158     44      21  96.933327      plus.
Output of (x,y,w,h) and b['text'] (bounding boxes with their text):
(32, 24, 60, 17) Maitre
(100, 24, 82, 19) corbeau,
(191, 28, 29, 13) sur
(227, 28, 22, 12) un
(257, 24, 50, 17) arbre
(315, 24, 70, 21) perché,
(79, 49, 58, 17) Tenait
(144, 53, 23, 13) en
(174, 53, 34, 13) son
(216, 50, 33, 16) bec
(257, 53, 22, 13) un
(287, 49, 84, 22) fromage.
(32, 75, 60, 17) Maitre
(100, 75, 61, 17) renard
(169, 79, 31, 17) par
(206, 75, 64, 17) I'odeur
(277, 75, 68, 17) alléché
(353, 88, 3, 6) ,
(81, 101, 27, 16) Lui
(115, 101, 28, 16) tint
(151, 100, 11, 17) 4
(169, 104, 34, 17) peu
(211, 100, 42, 21) prés
(260, 104, 21, 13) ce
(289, 101, 76, 20) langage
(374, 105, 3, 12) :
(81, 126, 31, 16) «Et
(119, 126, 72, 21) bonjour
(199, 126, 88, 17) Monsieur
(294, 126, 22, 16) du
(324, 125, 87, 18) Corbeau.
(31, 151, 40, 17) Que
(78, 155, 46, 13) vous
(131, 151, 40, 17) 6tes
(177, 151, 32, 21) joli!
(217, 155, 35, 17) que
(260, 155, 44, 13) vous
(312, 155, 29, 13) me
(348, 151, 80, 17) semblez
(436, 151, 52, 17) beau!
(81, 176, 47, 18) Sans
(136, 177, 63, 19) mentir,
(207, 177, 15, 17) si
(229, 178, 48, 16) votre
(284, 181, 72, 17) ramage
(81, 202, 25, 17) Se
(114, 204, 79, 19) rapporte
(200, 202, 11, 17) a
(218, 204, 48, 15) votre
(273, 203, 87, 20) plumage,
(31, 228, 48, 17) Vous
(86, 227, 40, 18) étes
(134, 228, 15, 16) le
(157, 227, 63, 21) phénix
(227, 228, 34, 17) des
(269, 227, 51, 18) hétes
(327, 228, 23, 16) de
(358, 232, 33, 13) ces
(398, 228, 49, 17) bois»
(31, 253, 53, 17) Aces
(92, 255, 45, 15) mots
(145, 253, 15, 17) le
(167, 253, 78, 17) corbeau
(253, 257, 22, 13) ne
(283, 257, 22, 13) se
(312, 255, 40, 15) sent
(360, 257, 33, 17) pas
(400, 253, 23, 17) de
(429, 253, 40, 21) joie;
(81, 279, 19, 16) Et
(107, 283, 43, 16) pour
(157, 280, 74, 16) montrer
(238, 283, 22, 13) sa
(267, 279, 45, 16) belle
(319, 279, 43, 19) voix,
(33, 304, 8, 16) ll
(49, 308, 53, 13) ouvre
(110, 308, 22, 13) un
(140, 304, 47, 21) large
(195, 304, 33, 17) bec
(236, 304, 54, 17) laisse
(297, 305, 67, 16) tomber
(371, 308, 22, 13) sa
(400, 304, 53, 21) proie.
(32, 330, 23, 17) Le
(63, 330, 60, 16) renard
(131, 330, 38, 17) s'en
(177, 330, 48, 17) saisit
(232, 331, 17, 15) et
(256, 330, 28, 16) dit:
(291, 330, 49, 16) "Mon
(348, 330, 35, 16) bon
(391, 330, 92, 19) Monsieur,
(103, 355, 92, 21) Apprenez
(202, 359, 36, 17) que
(245, 356, 35, 16) tout
(287, 355, 67, 17) flatteur
(31, 381, 25, 16) Vit
(63, 385, 34, 12) aux
(104, 381, 71, 20) dépens
(181, 381, 24, 16) de
(212, 381, 43, 16) celui
(262, 381, 28, 20) qui
(298, 380, 79, 17) l'écoute:
(32, 406, 50, 17) Cette
(90, 406, 50, 21) lecon
(148, 407, 40, 16) vaut
(195, 406, 40, 17) bien
(243, 410, 22, 13) un
(273, 406, 79, 21) fromage
(359, 410, 45, 13) sans
(411, 406, 67, 17) doute."
(81, 432, 22, 16) Le
(110, 432, 77, 16) corbeau
(195, 432, 76, 16) honteux
(279, 433, 17, 15) et
(303, 432, 63, 16) confus
(31, 457, 42, 17) Jura
(81, 457, 44, 17) mais
(133, 461, 22, 13) un
(163, 461, 34, 17) peu
(205, 457, 36, 17) tard
(250, 470, 3, 6) ,
(261, 457, 51, 21) qu'on
(320, 461, 23, 13) ne
(351, 457, 18, 21) I'y
(376, 457, 85, 21) prendrait
(468, 457, 44, 21) plus.
Resulting image:
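Solution
"I noticed that when I print my bounding box coordinates, the b['top'] values are similar for words on the same line. I don't know whether I can use that, but I would like one bounding box per line of text, together with the corresponding sentence."
You can absolutely use that. The following generates lines by aggregating boxes that overlap vertically: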
def lineup(boxes):
    """Yield one merged box per text line by joining vertically overlapping word boxes."""
    linebox = None
    for _, box in boxes.iterrows():
        if linebox is None:
            linebox = box.copy()  # first line begins
        elif box.top <= linebox.top + linebox.height:  # box in same line
            # compute the new bottom edge before top is updated
            bottom = max(linebox.top + linebox.height, box.top + box.height)
            linebox.top = min(linebox.top, box.top)
            linebox.width = box.left + box.width - linebox.left
            linebox.height = bottom - linebox.top
            linebox.text += ' ' + box.text
        else:  # box in new line
            yield linebox
            linebox = box.copy()  # new line begins
    if linebox is not None:
        yield linebox  # return last line

lineboxes = pd.DataFrame.from_records(lineup(boxes))
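
As a possible alternative to comparing top values, the image_to_data output shown in the question already carries page_num, block_num, par_num and line_num columns, so words could also be grouped on those keys. The following is only a minimal sketch of that idea, not part of the original answer: the function name line_boxes is hypothetical, it works on the plain dict of lists (before the DataFrame conversion), and the sample values are taken from the first words of the question's output, with the line_num of the second line assumed for illustration.

```python
def line_boxes(data):
    """Merge word boxes into one box per text line.

    `data` is assumed to have the dict-of-lists shape returned by
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    lines = {}  # insertion-ordered, so lines keep their reading order
    for i, text in enumerate(data['text']):
        if not text.strip():
            continue  # skip entries that carry no recognized word
        # words sharing these four numbers belong to the same text line
        key = (data['page_num'][i], data['block_num'][i],
               data['par_num'][i], data['line_num'][i])
        left, top = data['left'][i], data['top'][i]
        right = left + data['width'][i]
        bottom = top + data['height'][i]
        if key not in lines:
            lines[key] = {'left': left, 'top': top,
                          'right': right, 'bottom': bottom, 'text': text}
        else:
            line = lines[key]
            line['left'] = min(line['left'], left)
            line['top'] = min(line['top'], top)
            line['right'] = max(line['right'], right)
            line['bottom'] = max(line['bottom'], bottom)
            line['text'] += ' ' + text
    # convert the right/bottom edges back to width/height
    return [{'left': l['left'], 'top': l['top'],
             'width': l['right'] - l['left'],
             'height': l['bottom'] - l['top'],
             'text': l['text']} for l in lines.values()]


# Illustrative sample: coordinates copied from the question's output;
# the empty first entry and the line_num values of the second line are assumed.
sample = {
    'page_num':  [1, 1, 1, 1, 1, 1],
    'block_num': [1, 1, 1, 1, 1, 1],
    'par_num':   [1, 1, 1, 1, 1, 1],
    'line_num':  [1, 1, 1, 1, 2, 2],
    'left':      [32, 32, 100, 191, 79, 144],
    'top':       [24, 24, 24, 28, 49, 53],
    'width':     [353, 60, 82, 29, 58, 23],
    'height':    [21, 17, 19, 13, 17, 13],
    'text':      ['', 'Maitre', 'corbeau,', 'sur', 'Tenait', 'en'],
}

result = line_boxes(sample)
for line in result:
    print(line)
```

Each returned dict can be drawn the same way as the word boxes in the question, with cv2.rectangle(img, (left, top), (left + width, top + height), ...). Grouping on Tesseract's own line numbers relies on its layout analysis being right, whereas the overlap approach above depends only on the coordinates, so which one behaves better may vary from image to image.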