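python - Finding text between two lines using Python OpenCV
Problem description
I want to identify and highlight/crop the text between two lines using Python (cv2).
The first line is the wavy line at the top, and the second line is somewhere on the page. This second line can appear at any height on the page, anywhere from just after the first line of text to just before the last one.
[Image: example page]
I believe I need to use HoughLinesP() with suitable parameters for this. I have tried a few examples that combine erode + dilate + HoughLinesP.
For example: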
import cv2
import numpy as np

img = cv2.imread(image)  # image: path to a scanned page
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernel_size = 5
blur_gray = cv2.GaussianBlur(gray, (kernel_size, kernel_size), 0)
# erode / dilate
erode_kernel_param = (5, 200) # (5, 50)
dilate_kernel_param = (5, 5) # (5, 75)
img_erode = cv2.erode(blur_gray, np.ones(erode_kernel_param))
img_dilate = cv2.dilate(img_erode, np.ones(dilate_kernel_param))
# %% Second, process edge detection use Canny.
low_threshold = 50
high_threshold = 150
edges = cv2.Canny(img_dilate, low_threshold, high_threshold)
# %% Then, use HoughLinesP to get the lines.
# Adjust the parameters for better performance.
rho = 1 # distance resolution in pixels of the Hough grid
theta = np.pi / 180 # angular resolution in radians of the Hough grid
threshold = 15 # min number of votes (intersections in Hough grid cell)
min_line_length = 600 # min number of pixels making up a line
max_line_gap = 20 # max gap in pixels between connectable line segments
line_image = np.copy(img) * 0 # creating a blank to draw lines on
# %% Run Hough on edge detected image
# Output "lines" is an array containing endpoints of detected line segments
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]),
                        min_line_length, max_line_gap)
if lines is not None:
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 5)
# %% Draw the lines on the image
lines_edges = cv2.addWeighted(img, 0.8, line_image, 1, 0)
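However, in many cases the lines are not identified correctly. Some examples of the errors:
- Too many lines are detected (including lines of text)
- Lines are only partially detected
- Lines are not detected at all
Am I on the right track? Do I just need to hit the right combination of parameters? Or is there a simpler way or trick that would let me reliably crop the text between these two lines?
In case it is relevant, I need to do this for about 450 pages. Here is the link to the book, in case someone wants to look at more sample pages. https://archive.org/details/in.ernet.dli.2015.553713/page/n13/mode/2up
Thank you.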
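Solution
I made some minor modifications to Ari's answer (thanks) and, for my own sake, made the code easier to follow. Here is my code.
The core idea is:
- Find the contours and their bounding rectangles.
- The two "widest" contours will represent the two lines.
- Then take the lower edge of the top rectangle and the upper edge of the bottom rectangle to bound the region of interest (the text).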
import cv2

# images: list of paths to the page scans (.jpg files)
for image in images:
    base_img = cv2.imread(image)
    height, width, channels = base_img.shape
    img = cv2.cvtColor(base_img, cv2.COLOR_BGR2GRAY)
    ret, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    img = cv2.bitwise_not(img)
    contours, hierarchy = cv2.findContours(
        img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
    )
    # Get rectangle bounding contour
    rects = [cv2.boundingRect(contour) for contour in contours]
    # Rectangle is (x, y, w, h)
    # Top-Left point of the image is (0, 0), rightwards X, downwards Y
    # Sort the contours bigger width first
    rects.sort(key=lambda r: r[2], reverse=True)
    # Get the 2 "widest" rectangles
    line_rects = rects[:2]
    line_rects.sort(key=lambda r: r[1])
    # If at least two rectangles (contours) were found
    if len(line_rects) >= 2:
        top_x, top_y, top_w, top_h = line_rects[0]
        bot_x, bot_y, bot_w, bot_h = line_rects[1]
        # Cropping the img
        # Crop between bottom y of the upper rectangle (i.e. top_y + top_h)
        # and the top y of lower rectangle (i.e. bot_y)
        # Copy the slice so the rectangle drawn below does not bleed into the crop
        crop_img = base_img[top_y + top_h:bot_y].copy()
        # Highlight the area by drawing the rectangle
        # For full width, 0 and width can be used, while
        # For exact width (erroneous) top_x and bot_x + bot_w can be used
        rect_img = cv2.rectangle(
            base_img,
            pt1=(0, top_y + top_h),
            pt2=(width, bot_y),
            color=(0, 255, 0),
            thickness=2
        )
        cv2.imwrite(image.replace('.jpg', '.rect.jpg'), rect_img)
        cv2.imwrite(image.replace('.jpg', '.crop.jpg'), crop_img)
    else:
        print(f"Insufficient contours in {image}")
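The selection step in the code above can also be pulled out into a small pure function, which makes it easy to check on made-up rectangles and to skip pages where the two widest contours are probably not the rules. This is only a minimal sketch: the name select_crop_rows and the min_width_ratio threshold are my own assumptions, not part of the original answer, and the threshold would need tuning for the book.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h), the layout cv2.boundingRect returns


def select_crop_rows(
    rects: List[Rect], page_width: int, min_width_ratio: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Return (start_row, end_row) of the band between the two widest rectangles.

    Returns None when fewer than two rectangles are at least
    min_width_ratio * page_width wide (an assumed, tunable threshold),
    or when the two candidates leave no rows between them.
    """
    wide = [r for r in rects if r[2] >= min_width_ratio * page_width]
    if len(wide) < 2:
        return None
    # Keep the two widest candidates, then order them top to bottom by y
    widest_two = sorted(wide, key=lambda r: r[2], reverse=True)[:2]
    top, bottom = sorted(widest_two, key=lambda r: r[1])
    start_row = top[1] + top[3]  # lower edge of the upper line
    end_row = bottom[1]          # upper edge of the lower line
    if start_row >= end_row:
        return None
    return start_row, end_row


# Made-up rectangles on a 1000-pixel-wide page: two rules and one text blob
rects = [(50, 40, 900, 12), (60, 700, 880, 6), (100, 200, 300, 20)]
print(select_crop_rows(rects, page_width=1000))  # prints (52, 700)
```

With a non-None result, the crop would be base_img[start_row:end_row]; pages that return None can be logged and checked by hand.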
Solution
You can find the contours and then take the two with the largest width.
import cv2

base_img = cv2.imread('a.png')
img = cv2.cvtColor(base_img, cv2.COLOR_BGR2GRAY)
ret, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
img = cv2.bitwise_not(img)
cnts, hierarchy = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# sort the contours, widest first
# (sorted() is used because recent OpenCV versions return a tuple, which has no .sort())
cnts = sorted(cnts, key=lambda c: cv2.boundingRect(c)[2], reverse=True)
# get the 2 big lines as (x, y, w, h) rectangles
lines = [cv2.boundingRect(cnts[0]), cv2.boundingRect(cnts[1])]
# higher line first
lines.sort(key=lambda c: c[1])
# crop the image from the top of the upper line to the top of the lower line
crop_img = base_img[lines[0][1]:lines[1][1]]