MeanShift or the bandwidth estimator takes too long to cluster image points

Question

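I am not sure whether it is the bandwidth estimator or MeanShift itself that takes so long. I would like to know how to reduce the run time; perhaps I need to modify the image or the parameters of the clustering algorithm.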

This is the image: [image]

This is my code:

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.cluster import KMeans, MeanShift, DBSCAN, AgglomerativeClustering, estimate_bandwidth
from sklearn.mixture import GaussianMixture
from PIL import Image, ImageOps
import matplotlib.pyplot as plt

image = Image.open("/kaggle/input/kid-draw/drawings.jpg")
gray = ImageOps.grayscale(image)

# Collect the coordinates of every dark pixel (grayscale value below 125).
data = np.asarray(gray)
points = []
maxX = len(data)
maxY = len(data[0])
for i in range(maxX):
    for j in range(maxY):
        if data[i][j] < 125:
            points.append((-i, j))  # negate the row index so the plot is not flipped vertically

points = np.array(points)
plt.scatter(points[:, 1], points[:, 0])
plt.show()
print(points)

# Estimate the kernel bandwidth, then cluster the points with MeanShift.
kernel = estimate_bandwidth(points)
ms = MeanShift(bandwidth=kernel)
clusters = ms.fit_predict(points)

Tags: python, pandas, scikit-learn, python-imaging-library

Answer


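The original image is quite large, so the number of dark pixels, and therefore the number of points that have to be clustered, is large as well. I would consider resizing it first: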

OpenCV resize

import cv2
 
img = cv2.imread("/kaggle/input/kid-draw/drawings.jpg", cv2.IMREAD_UNCHANGED)
resized = cv2.resize(img, (80,80), interpolation = cv2.INTER_AREA)

The resulting image will also be a NumPy array (np.ndarray).
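Besides shrinking the image, the clustering parameters from the question can be tuned as well. The following is only a sketch under stated assumptions, not a measured fix: the helper name `cluster_dark_pixels` and the value `n_samples=500` are made up for illustration, while `n_samples` (for `estimate_bandwidth`) and `bin_seeding` (for `MeanShift`) are existing scikit-learn parameters. It takes a 2-D grayscale array, such as the resized image converted to grayscale.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth


def cluster_dark_pixels(gray, threshold=125, n_samples=500):
    """Cluster the dark pixels of a 2-D grayscale array with MeanShift.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    # Vectorized equivalent of the nested loops in the question:
    # row and column indices of every pixel darker than the threshold.
    rows, cols = np.nonzero(gray < threshold)
    points = np.column_stack((-rows, cols))

    # Estimate the bandwidth on a subsample only; the cost of the estimate
    # grows quickly with the number of points used.
    bandwidth = estimate_bandwidth(
        points, n_samples=min(n_samples, len(points)), random_state=0
    )

    # bin_seeding=True starts the search from a coarse grid of binned points
    # instead of from every single point, which reduces the number of seeds.
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    labels = ms.fit_predict(points)
    return points, labels
```

One way to use it would be to load the file with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`, resize it as shown above, and pass the result to `cluster_dark_pixels`. Whether the speed-up is sufficient depends on the image, so it is worth timing the bandwidth estimate and the `fit_predict` call separately.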

