How to filter the nodes of a scipy sparse matrix by average degree in Python?

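Question

I am working with the Flickr social network data and trying to reduce its size by filtering the dataset (the number of nodes) by degree. I only want to use the 50 nodes with the highest degree. After creating a list of the top 50 highest-degree nodes, I am unable to apply it to the original graph.

Dataset source: http://networkrepository.com/soc-flickr.php

My current code: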

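import os

import networkx as nx
# from_scipy_sparse_matrix is the NetworkX 2.x name (removed in 3.0)
from networkx import from_scipy_sparse_matrix as sm
from scipy import io

# Read the Matrix Market file into a scipy sparse matrix
flickr = io.mmread(os.path.join('soc-flickr','soc-flickr.mtx'))
Gflickr = sm(flickr)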

print (nx.info(Gflickr))
# Out: Type: Graph
# Out: Number of nodes: 513969
# Out: Number of edges: 3190452
# Out: Average degree:  12.4150

for n, d in Gflickr.degree():
    print('%s %d' % (n, d))

top_50 = sorted(Gflickr.degree, key=lambda x: x[1], reverse=True)
top_50 = top_50[:50]
top_50
Out: [(9205, 4369),
 (3843, 4196),
 (1552, 4011),
 (75, 4004),
 (1641, 3810),
 (5814, 3779),....

# Take the first item of each tuple, which should be
# the index of the node in the original graph

node_index_list = [item[0] for item in top_50]
S = nx.to_scipy_sparse_matrix(Gflickr, nodelist= [9205,3843,1552,75,6517,11816,....,42004,109870,70193,30540])

# Create a graph from the sparse matrix
top_50_graph = sm(S)
print(nx.info(top_50_graph))


# here is the issue...
Out: Type: Graph
Number of nodes: 50
Number of edges: 0
Average degree:   0.0000

Tags: python, sparse-matrix, networkx, social-networking, adjacency-matrix

Answer


You can determine the highest-degree nodes with the following code:

import networkx as nx

graph = nx.karate_club_graph()

number_of_nodes = 10
# Sort node labels by degree, highest first, and keep the first number_of_nodes
top_nodes = sorted(graph.nodes, key=lambda x: graph.degree(x), reverse=True)[:number_of_nodes]
print(top_nodes)
# [33, 0, 32, 2, 1, 3, 31, 8, 13, 23]
print([graph.degree(node) for node in top_nodes])
# [17, 16, 12, 10, 9, 6, 6, 5, 5, 5]
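
To apply such a list back to the original graph, one option is to take the induced subgraph on those nodes instead of round-tripping through a sparse matrix. The following is only a minimal sketch on the karate club graph, assuming that an induced subgraph (original node labels kept, plus every edge whose two endpoints are both in the list) is what is wanted; the name top_subgraph is made up for illustration.

```python
import networkx as nx

graph = nx.karate_club_graph()

# Rank (node, degree) pairs by degree and keep the labels of the top 10.
number_of_nodes = 10
ranked = sorted(graph.degree, key=lambda pair: pair[1], reverse=True)
top_nodes = [node for node, _ in ranked[:number_of_nodes]]

# Induced subgraph: same node labels as the original graph, and only
# the edges that connect two of the selected nodes.
# .copy() turns the read-only view into an independent graph.
top_subgraph = graph.subgraph(top_nodes).copy()

print(top_subgraph.number_of_nodes())
print(top_subgraph.number_of_edges())
```

Because the subgraph keeps the original labels, its nodes can still be looked up in the full graph, which is not the case once a matrix has been converted back into a graph with nodes renumbered from 0.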

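I am not sure whether a graph of the 50 highest-degree nodes (especially if some edges(?) were already filtered out beforehand) retains many of the properties of the original graph.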
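
One way to check this concern is to compute a few summary statistics on both graphs and compare them. The sketch below again uses the karate club graph as a stand-in for the Flickr data; the helper summarize and the choice of statistics are assumptions made for illustration, not something taken from the question.

```python
import networkx as nx

graph = nx.karate_club_graph()

# Select the 10 highest-degree nodes and take their induced subgraph.
k = 10
top_nodes = sorted(graph.nodes, key=lambda n: graph.degree(n), reverse=True)[:k]
top_subgraph = graph.subgraph(top_nodes)


def summarize(g):
    """Return a few basic statistics of graph g (hypothetical helper)."""
    degrees = [d for _, d in g.degree()]
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "average_degree": sum(degrees) / len(degrees),
        "density": nx.density(g),
        "average_clustering": nx.average_clustering(g),
    }


original_stats = summarize(graph)
subgraph_stats = summarize(top_subgraph)

# Print the two sets of statistics side by side.
for name in original_stats:
    print(name, original_stats[name], subgraph_stats[name])
```

Large differences between the two columns would suggest that the filtered graph is not a good proxy for the original one.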

