首页 > 解决方案 > 亲和传播没有收敛,这个模型不会有任何聚类中心

问题描述

当我尝试使用亲和传播进行聚类时,会发生以下错误并且聚类数为 1。

"...\anaconda\lib\site-packages\sklearn\cluster\_affinity_propagation.py:246: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn("Affinity propagation did not converge, this model ""

下面是我试过的代码。

def build_feature_matrix(documents, feature_type='frequency',
                         ngram_range=(1, 1), min_df=0.0, max_df=1.0):

    feature_type = feature_type.lower().strip()  
    
    if feature_type == 'binary':
        vectorizer = CountVectorizer(binary=True, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'frequency':
        vectorizer = CountVectorizer(binary=False, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'tfidf':
        vectorizer = TfidfVectorizer(min_df=min_df, max_df=max_df, 
                                     ngram_range=ngram_range)
    else:
        raise Exception("Wrong feature type entered. Possible values: 'binary', 'frequency', 'tfidf'")

    feature_matrix = vectorizer.fit_transform(documents).astype(float)
    
    return vectorizer, feature_matrix

vectorizer, feature_matrix = build_feature_matrix(filtered_list_6,
                                                  feature_type='tfidf',
                                                  min_df=0.15, max_df=0.85,
                                                  ngram_range=(1, 2))

def affinity_propagation(feature_matrix):
    
    sim = feature_matrix * feature_matrix.T
    sim = sim.todense()
    ap = AffinityPropagation()
    ap.fit(sim)
    clusters = ap.labels_          
    return ap, clusters

ap_obj, clusters = affinity_propagation(feature_matrix=feature_matrix)
df[len(df.columns)] = clusters

c = Counter(clusters)   
print(c.items())

total_clusters = len(c)
print('Total Clusters:', total_clusters)

有人可以指出我在这里做错了什么吗?

提前致谢!

标签: pythontextnlpdata-science

解决方案


我可以改变阻尼值、max_iter 和偏好值来消除这个问题。最初,您可以从阻尼 = 0.9,max_iter = 1000 开始。

您可以根据需要更改首选项值,这将更改模型生成的集群数量


推荐阅读