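Connecting the nearest points in segments and labelling the segments

Question

I am using OpenCV and skimage for document analysis of datasheets. I am trying to segment out the shaded region separately.

I am currently able to segment the parts and the numbers into different clusters.

Using felzenszwalb() from skimage, I segment the parts: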

import matplotlib.pyplot as plt
import numpy as np     
from skimage.segmentation import felzenszwalb
from skimage.io import imread

img = imread('test.jpg')

segments_fz = felzenszwalb(img, scale=100, sigma=0.2, min_size=50)

print("Felzenszwalb number of segments {}".format(len(np.unique(segments_fz))))

plt.imshow(segments_fz)
plt.tight_layout()
plt.show()

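But I am not able to connect them. Any ideas on how to approach this methodically and label each segment with its part and part number would be a great help. Thanks in advance for your time; if I've left anything out, or over- or under-emphasised a point, please let me know in the comments.

Tags: python, algorithm, opencv, image-processing, image-segmentation

Answer


Preliminaries

Some preliminary code: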

%matplotlib inline
%load_ext Cython
import numpy as np
import cv2
from matplotlib import pyplot as plt
import skimage as sk
import skimage.morphology as skm
import itertools

def ShowImage(title,img,ctype):
  plt.figure(figsize=(20, 20))
  if ctype=='bgr':
    b,g,r = cv2.split(img)       # get b,g,r
    rgb_img = cv2.merge([r,g,b])     # switch it to rgb
    plt.imshow(rgb_img)
  elif ctype=='hsv':
    rgb = cv2.cvtColor(img,cv2.COLOR_HSV2RGB)
    plt.imshow(rgb)
  elif ctype=='gray':
    plt.imshow(img,cmap='gray')
  elif ctype=='rgb':
    plt.imshow(img)
  else:
    raise Exception("Unknown colour type")
  plt.axis('off')
  plt.title(title)
  plt.show()

For reference, here's your original image:

#Read in image
img         = cv2.imread('part.jpg')
ShowImage('Original',img,'bgr')

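[Figure: original image]

Identifying the numbers

To simplify things, we want to classify each pixel as either on or off. We can do so with thresholding. Since our image contains two clear classes of pixels (black and white), we can use Otsu's method. We invert the colour scheme because the libraries we're using treat black pixels as background and white pixels as foreground.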

#Convert image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

#Apply Otsu's method to eliminate pixels of intermediate colour
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

ShowImage('Applying Otsu',thresh,'gray')

#Verify that pixels are either black or white and nothing in between
np.unique(thresh)

[Figure: image after Otsu thresholding]
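To make the thresholding step less of a black box, here is a NumPy-only sketch of the idea behind Otsu's method: pick the threshold that maximises the between-class variance of the grey-level histogram. The function name otsu_threshold and the tiny synthetic image are assumptions made for illustration; in practice cv2.threshold with cv2.THRESH_OTSU, as used above, is the tool to reach for.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t that maximises between-class variance.

    `gray` is assumed to be a uint8 image; class 0 is pixels <= t.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Thresholds that leave one class empty give 0/0; treat them as variance 0
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))

# Synthetic "drawing": light background (220) with a dark stroke (30)
gray = np.full((10, 10), 220, dtype=np.uint8)
gray[2:5, 2:8] = 30

t = otsu_threshold(gray)
# Inverted binary image: dark strokes become 255, background becomes 0
mask = np.where(gray <= t, 255, 0).astype(np.uint8)
```

The inverted mask plays the same role as thresh above: ink is foreground, paper is background.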

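Our strategy is to locate the numbers, follow the lines near them to the parts, and then label those parts. Conveniently, every Arabic numeral is formed from contiguous pixels, so we can start by finding the connected components.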

ret, components = cv2.connectedComponents(thresh)
#Each component is a different colour
ShowImage('Connected Components', components, 'rgb')

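[Figure: connected components]

We can then pick out the numbers by filtering the connected components on their dimensions. Note that this is not a very robust method. A better option would be to use character recognition, but that is left as an exercise for the reader :-)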

class Box:
    def __init__(self,x0,x1,y0,y1):
        self.x0, self.x1, self.y0, self.y1 = x0,x1,y0,y1
    def overlaps(self,box2,tol):
        if self.x0 is None or box2.x0 is None:
            return False
        return not (self.x1+tol<=box2.x0 or self.x0-tol>=box2.x1 or self.y1+tol<=box2.y0 or self.y0-tol>=box2.y1)
    def merge(self,box2):
        self.x0 = min(self.x0,box2.x0)
        self.x1 = max(self.x1,box2.x1)
        self.y0 = min(self.y0,box2.y0)
        self.y1 = max(self.y1,box2.y1)
        box2.x0 = None #Used to mark `box2` as being no longer valid. It can be removed later
    def dist(self,x,y):
        #Get center point
        ax = (self.x0+self.x1)/2
        ay = (self.y0+self.y1)/2
        #Get distance to center point
        return np.sqrt((ax-x)**2+(ay-y)**2)
    def good(self):
        return not (self.x0 is None)

def ExtractComponent(original_image, component_matrix, component_number):
    """Extracts a component from a ConnectedComponents matrix"""
    #Create a true-false matrix indicating if a pixel is part of a particular component
    is_component = component_matrix==component_number
    #Find the coordinates of those pixels
    coords = np.argwhere(is_component)

    # Bounding box of non-black pixels.
    y0, x0 = coords.min(axis=0)
    y1, x1 = coords.max(axis=0) + 1   # slices are exclusive at the top

    # Get the contents of the bounding box.
    return x0,x1,y0,y1,original_image[y0:y1, x0:x1]

numbers_img = thresh.copy() #This is used purely to show that we can identify numbers
numbers = []
for component in range(1, components.max()+1): #Skip label 0 (background); include the last label
    tx0,tx1,ty0,ty1,this_component = ExtractComponent(thresh, components, component)
    #ShowImage('Component #{0}'.format(component), this_component, 'gray')
    cheight, cwidth = this_component.shape
    #print(cwidth,cheight) #Enable this to see dimensions
    #Identify numbers based on bounding-box size in pixels (values tuned to this image)
    if (abs(cwidth-14)<3 or abs(cwidth-7)<3) and abs(cheight-24)<3:
        numbers_img[ty0:ty1,tx0:tx1] = 128
        numbers.append(Box(tx0,tx1,ty0,ty1))
ShowImage('Numbers', numbers_img, 'gray')

[Figure: numbers with their bounding boxes marked]

We now join the digits into contiguous blocks by slightly expanding their bounding boxes and looking for overlaps.

#This is kind of a silly way to do this, but it will work fine for small quantities (hundreds)
merged=True                                       #If true, then a merge happened this round
while merged:                                     #Continue until there are no more mergers
    merged=False                                  #Reset merge indicator
    for a,b in itertools.combinations(numbers,2): #Consider all pairs of numbers
        if a.overlaps(b,10):                      #If this pair overlaps
            a.merge(b)                            #Merge it
            merged=True                           #Make a note that we've merged
numbers = [x for x in numbers if x.good()]        #Eliminate those boxes that were gobbled by the mergers

#This is used purely to show that we can identify numbers
numbers_img = thresh.copy() 
for n in numbers:
    numbers_img[n.y0:n.y1,n.x0:n.x1] = 128
    thresh[n.y0:n.y1,n.x0:n.x1] = 0 #Drop numbers from thresholded image
ShowImage('Numbers', numbers_img, 'gray')

[Figure: digits joined into numbers]
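To see what the tolerance does, here is a small stand-alone sketch of the same merge on plain tuples. The names overlaps and merge_boxes, and the example coordinates, are made up for illustration; the overlap test itself mirrors Box.overlaps above.

```python
def overlaps(a, b, tol):
    """a and b are (x0, x1, y0, y1); True if they overlap once grown by tol."""
    return not (a[1] + tol <= b[0] or a[0] - tol >= b[1] or
                a[3] + tol <= b[2] or a[2] - tol >= b[3])

def merge_boxes(boxes, tol):
    """Repeatedly merge any two boxes that overlap within tol pixels."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j], tol):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), max(a[1], b[1]),
                                min(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the box list has changed
    return boxes

# Two digits 6 pixels apart form one number; the third digit is far away
digits = [(10, 24, 10, 34), (30, 44, 10, 34), (200, 214, 10, 34)]
number_boxes = merge_boxes(digits, tol=10)
```

With a tolerance of 10 the first two boxes fuse into one, while the distant box stays on its own. Too large a tolerance would start gluing separate labels together, so the value needs tuning per drawing.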

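Okay, so now we've identified the numbers! We'll use these later to identify the parts.

Identifying the arrows

Next, we want to figure out which part each number points to. To do so, we need to detect lines, and the Hough transform is good for this. To reduce the number of false positives, we first skeletonize the data, which transforms it into a representation that is at most one pixel wide.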

skel = sk.img_as_ubyte(skm.skeletonize(thresh>0))
ShowImage('Skeleton', skel, 'gray')

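[Figure: skeleton]

Now we perform the Hough transform. We are looking for a set of parameters that identifies all of the lines running from the numbers to the parts. Getting this right may take some fiddling with the parameters.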

lines = cv2.HoughLinesP(
    skel,
    1,           #Resolution of r in pixels
    np.pi / 180, #Resolution of theta in radians
    30,          #Minimum number of intersections to detect a line
    None,
    80,          #Min line length
    10           #Max line gap
)
lines = [x[0] for x in lines]

line_img = thresh.copy()
line_img = cv2.cvtColor(line_img, cv2.COLOR_GRAY2BGR)
for l in lines:
    color = tuple(map(int, np.random.randint(low=0, high=255, size=3)))
    cv2.line(line_img, (l[0], l[1]), (l[2], l[3]), color, 3, cv2.LINE_AA)
ShowImage('Lines', line_img, 'bgr')

[Figure: detected lines]

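We now want to find the line or lines closest to each number and keep only those. In effect, we are filtering out every line that is not an arrow. To do so, we compare the end points of each line with the centre point of each number box.

comp_labels = np.zeros(img.shape[0:2], dtype=np.uint8)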

for n_idx,n in enumerate(numbers):
    distvals = []
    for i,l in enumerate(lines):
        #Distances from each point of line to midpoint of rectangle
        dists    = [n.dist(l[0],l[1]),n.dist(l[2],l[3])] 
        #Minimum distance and the end point (0 or 1) of the line associated with that point
        #Tuples of (Line Number, Line Point, Dist to Line Point) are produced
        distvals.append( (i,np.argmin(dists),np.min(dists)) )
    #Sort by distance between the number box and the line
    distvals = sorted(distvals, key=lambda x: x[2])
    #Include nearby lines, not just the closest one. This accounts for forking.
    distvals = [x for x in distvals if x[2]<1.5*distvals[0][2]]

    #Draw a white rectangle where the number box was
    cv2.rectangle(comp_labels, (n.x0,n.y0), (n.x1,n.y1), 1, cv2.FILLED)

    #Draw white lines where the arrows are
    for dv in distvals:
        l = lines[dv[0]]
        lp = (l[0],l[1]) if dv[1]==0 else (l[2],l[3])
        cv2.line(comp_labels, (l[0], l[1]), (l[2], l[3]), 1, 3, cv2.LINE_AA)
        cv2.line(comp_labels, (lp[0], lp[1]), ((n.x0+n.x1)//2, (n.y0+n.y1)//2), 1, 3, cv2.LINE_AA)
ShowImage('Lines', comp_labels, 'gray')

[Figure: arrows]
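The selection rule can be isolated into a few lines of plain Python, which makes it easier to experiment with the 1.5 factor. This is a sketch under assumptions: lines_near is a hypothetical name, lines are (x1, y1, x2, y2) tuples, and the comparison uses <= rather than < so that a line ending exactly on the box centre (distance 0) is still kept.

```python
import math

def lines_near(center, lines, factor=1.5):
    """Return (line index, nearest end: 0 or 1, distance) for lines close to center.

    A line is kept if its nearest end point lies within `factor` times the
    distance of the closest line overall.
    """
    cx, cy = center
    scored = []
    for idx, (x1, y1, x2, y2) in enumerate(lines):
        d = (math.hypot(x1 - cx, y1 - cy), math.hypot(x2 - cx, y2 - cy))
        end = 0 if d[0] <= d[1] else 1
        scored.append((idx, end, d[end]))
    if not scored:
        return []
    best = min(s[2] for s in scored)
    return [s for s in scored if s[2] <= factor * best]

# One number box centred at the origin and three candidate lines
candidates = [(3, 4, 100, 100), (50, 50, 6, 0), (200, 200, 300, 300)]
near = lines_near((0, 0), candidates)
```

Here the first line's start and the second line's end both fall within 1.5 times the smallest distance, so both are kept as a forked arrow, and the third line is discarded.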

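Finding the parts

This part was hard! We now want to segment the parts in the image. If there were some way to disconnect the lines joining the subparts, this would be easy. Unfortunately, the lines connecting the subparts are the same width as many of the lines that make up the parts themselves.

To work around this, we could use a lot of logic. It would be painful and error-prone.

Alternatively, we can assume you have an expert in the loop. This expert's sole job is to cut the lines connecting the subparts, which should be both easy and fast for them. Labelling everything would be slow and sad for humans, but it is fast for computers. Separating things is easy for humans, but hard for computers. So we let each do what it does best.

In this case, you can probably train someone to do the job in a few minutes, so a true "expert" isn't really necessary. Just a mildly competent human.

If you pursue this, you'll need to write the expert-in-the-loop tool. To do so, save the skeleton image, have your expert modify it, and read the modified skeleton back in. Like so: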

#Save the image, or display it on a GUI
#cv2.imwrite("/z/skel.png", skel);
#EXPERT DOES THEIR THING HERE
#Read the expert-mediated image back in
skelhuman = cv2.imread('/z/skel.png')
#Convert back to the form we need
skelhuman = cv2.cvtColor(skelhuman,cv2.COLOR_BGR2GRAY)
ret, skelhuman = cv2.threshold(skelhuman,0,255,cv2.THRESH_OTSU)
ShowImage('SkelHuman', skelhuman, 'gray')

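[Figure: skeleton after human editing]

Now that the parts are separated, we'll eliminate as much of the arrows as possible. We've already extracted these above, so we can add them back later if we need to.

To eliminate the arrows, we find all of the lines that terminate somewhere other than on another line. That is, we locate pixels that have exactly one neighbouring pixel. We then eliminate that pixel and look at its neighbour. Doing this iteratively eats away the arrows. Since I don't know another term for it, I'll call this a Fuse Transform. Because it requires manipulating individual pixels, which would be very slow in Python, we'll write the transform in Cython.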

%%cython -a --cplus
import cython

from libcpp.queue cimport queue
import numpy as np
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True) 
cpdef void FuseTransform(unsigned char [:, :] image):
    # set the variable extension types
    cdef int c, x, y, nx, ny, n, width, height, neighbours, neighbour
    cdef queue[int] q

    # grab the image dimensions
    height = image.shape[0]
    width  = image.shape[1]

    cdef int dx[8]
    cdef int dy[8]

    #Offsets to neighbouring cells
    dx[:] = [-1,-1,0,1,1,1,0,-1]
    dy[:] = [0,-1,-1,-1,0,1,1,1]

    #Find seed cells: those with only one neighbour
    for y in range(1, height-1):
        for x in range(1, width-1):
            if image[y,x]==0: #Seed cells cannot be blank cells
                continue
            neighbours = 0
            for n in range(0,8):   #Looks at all neighbours
                nx = x+dx[n]
                ny = y+dy[n]
                if image[ny,nx]>0: #This neighbour has a value
                    neighbours += 1
            if neighbours==1:      #Was there only one neighbour?
                q.push(y*width+x)  #If so, this is a seed cell

    #Starting with the seed cells, gobble up the lines
    while not q.empty():
        c = q.front()
        q.pop()
        y = c//width         #Convert flat index into 2D x-y index
        x = c%width
        image[y,x] = 0       #Gobble up this part of the fuse
        neighbour  = -1      #No neighbours yet
        for n in range(0,8): #Look at all neighbours
            nx = x+dx[n]     #Find coordinates of neighbour cells
            ny = y+dy[n]
            #If the neighbour would be off the side of the matrix, ignore it
            if nx<0 or ny<0 or nx==width or ny==height:
                continue
            if image[ny,nx]>0:      #Is the neighbouring cell active?
                if neighbour!=-1:   #If we've already found an active neighbour
                    neighbour=-1    #Then pretend we found no neighbours
                    break           #And stop looking. This is the end of the fuse.
                else:               #Otherwise, make a note of the neighbour's index.
                    neighbour = ny*width+nx
        if neighbour!=-1:           #If there was only one neighbour
            q.push(neighbour)       #Continue burning the fuse

Back in standard Python:

#Apply the Fuse Transform
skh_dilated=skelhuman.copy()
FuseTransform(skh_dilated)
ShowImage('Fuse Transform', skh_dilated, 'gray')

[Figure: result of the Fuse Transform]
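For checking the behaviour on small images without compiling anything, here is a slow pure-Python sketch of the same idea. It is an illustration, not the Cython code above: fuse_transform is a hypothetical name, and unlike the Cython version it also treats pixels on the image border as possible seeds.

```python
from collections import deque
import numpy as np

# Offsets (dx, dy) to the eight neighbouring cells
OFFSETS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]

def fuse_transform(image):
    """Burn away dangling lines in a one-pixel-wide skeleton, in place."""
    h, w = image.shape

    def active_neighbours(y, x):
        return [(y + dy, x + dx) for dx, dy in OFFSETS
                if 0 <= y + dy < h and 0 <= x + dx < w and image[y + dy, x + dx] > 0]

    # Seeds: active pixels with exactly one active neighbour (line ends)
    q = deque((y, x) for y in range(h) for x in range(w)
              if image[y, x] > 0 and len(active_neighbours(y, x)) == 1)
    while q:
        y, x = q.popleft()
        image[y, x] = 0                  # burn this part of the fuse
        nbrs = active_neighbours(y, x)
        if len(nbrs) == 1:               # keep burning only along an unbranched line
            q.append(nbrs[0])
    return image

# A closed square outline (a "part") with a dangling tail (an "arrow")
ring = np.zeros((20, 20), dtype=np.uint8)
ring[5, 5:11] = 255
ring[10, 5:11] = 255
ring[5:11, 5] = 255
ring[5:11, 10] = 255
demo = ring.copy()
demo[7, 11:16] = 255                     # tail attached to the right-hand side
burned = fuse_transform(demo.copy())
```

The tail burns back to the outline and stops there because the outline pixels all have at least two active neighbours; closed contours have no line ends and are therefore never seeded.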

Now that we've eliminated all of the arrows and the lines connecting the parts, we dilate the remaining pixels heavily.

kernel = np.ones((3,3),np.uint8)
dilated  = cv2.dilate(skh_dilated, kernel, iterations=6)
ShowImage('Dilation', dilated, 'gray')

[Figure: dilated parts]

Putting it all together

We now overlay the labels and arrows we segmented out earlier...

comp_labels_dilated  = cv2.dilate(comp_labels, kernel, iterations=5)
labels_combined = np.uint8(np.logical_or(comp_labels_dilated,dilated))
ShowImage('Comp Labels', labels_combined, 'gray')

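[Figure: arrows and parts combined]

Finally, we take the merged number boxes, component arrows, and parts, and colour each of them using a nice set of colours from Color Brewer. We then overlay this on the original image to obtain the desired highlighting.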

ret, labels = cv2.connectedComponents(labels_combined)
colormask = np.zeros(img.shape, dtype=np.uint8)
#Colors from Color Brewer
colors = [(228,26,28),(55,126,184),(77,175,74),(152,78,163),(255,127,0),(255,255,51),(166,86,40),(247,129,191),(153,153,153)]
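for l in range(labels.max()+1): #Include the last label
    if l==0: #Background component
        colormask[labels==0] = (255,255,255)
    else:
        #Cycle through the palette in case there are more components than colours
        colormask[labels==l] = colors[(l-1) % len(colors)]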
ShowImage('Comp Labels', colormask, 'bgr')
blended = cv2.addWeighted(img,0.7,colormask,0.3,0)
ShowImage('Blended', blended, 'bgr')

[Figure: coloured parts]

Final image:

[Figure: final image]
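The colouring loop can also be written as a single lookup-table index, which avoids one full-image comparison per label. This is a sketch of that variant, assuming labels is an integer label image like the one cv2.connectedComponents returns; colourise and the two-colour palette in the example are made up for illustration.

```python
import numpy as np

def colourise(labels, palette, background=(255, 255, 255)):
    """Map each label to a palette colour, cycling if labels outnumber colours."""
    palette = np.asarray(palette, dtype=np.uint8)
    # Row i of the lookup table holds the colour for label i
    lut = palette[(np.arange(labels.max() + 1) - 1) % len(palette)]
    lut[0] = background                  # label 0 is the background
    return lut[labels]                   # shape: labels.shape + (3,)

# Tiny label image with three components and only two palette colours
labels_demo = np.array([[0, 1], [2, 3]])
coloured = colourise(labels_demo, [(1, 2, 3), (4, 5, 6)])
```

Because the palette is indexed modulo its length, a drawing with more components than palette entries reuses colours rather than failing.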

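So, to recap: we identified the numbers, the arrows, and the parts. In some cases we were able to separate them automatically. In others, we used an expert in the loop. Where we had to manipulate pixels individually, we used Cython for speed.

Of course, the danger with this sort of thing is that some other image will break the (many) assumptions I've made here. But that's the risk you take when you use a single image to present a problem.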

