python - Python PageRank 实现页面排名向量对每个节点收敛到相同的值
问题描述
下面是一个 PageRank 的实现。
import numpy as np
from scipy.sparse import csc_matrix
import networkx as nx
def pageRank(G, s = .85, maxerr = 0.0001):
"""
Computes the pagerank for each of the n states
Parameters
----------
G: matrix representing state transitions
Gij is a binary value representing a transition from state i to j.
s: probability of following a transition. 1-s probability of teleporting
to another state.
maxerr: if the sum of pageranks between iterations is bellow this we will
have converged.
"""
n = G.shape[0]
# transform G into markov matrix A
A = csc_matrix(G,dtype=np.float)
rsums = np.array(A.sum(1))[:,0]
ri, ci = A.nonzero()
A.data /= rsums[ri]
# bool array of sink states
sink = rsums==0
# Compute pagerank r until we converge
ro, r = np.zeros(n), np.ones(n)
while np.sum(np.abs(r-ro)) > maxerr:
ro = r.copy()
# calculate each pagerank at a time
for i in range(0,n):
# inlinks of state i
Ai = np.array(A[:,i].todense())[:,0]
# account for sink states
Di = sink / float(n)
# account for teleportation to state i
Ei = np.ones(n) / float(n)
r[i] = ro.dot( Ai*s + Di*s + Ei*(1-s) )
# return normalized pagerank
return r/float(sum(r))
我正在尝试在这个数据集上运行这个算法。
该数据集是从密歇根大学数据集页面获取的海豚社交网络图。
我已检索数据并将其转换为邻接矩阵,然后将该算法应用于邻接矩阵
dol = nx.read_gml('dolphins.gml')
dol_adj = nx.to_numpy_matrix(dol)
pageRank(dol_adj,s=.85)
它返回一个 pagerank 向量,每个节点都具有相同的值,即使应该有一个排名。
array([0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
0.01612903, 0.01612903])
我不确定是什么原因造成的。我最初的想法是,这是由于具有 62 个节点的图的大小。
我尝试在较小的图表上运行它,并且能够返回不同值的页面排名向量。
G = np.array([[0,0,1,0,0,0,0],
[0,1,1,0,0,0,0],
[1,0,1,1,0,0,0],
[0,0,0,1,1,0,0],
[0,0,0,0,0,0,1],
[0,0,0,0,0,1,1],
[0,0,0,1,1,0,1]])
pageRank(G,s=.85)
不同的价值观:
array([0.12924088, 0.03831991, 0.12443595, 0.22384607, 0.28468494,
0.03831991, 0.16115233])
但是我不确定如何更改代码以使其可以处理更大的矩阵。无论矩阵的大小如何,该实现都应该有效。
解决方案
推荐阅读
- python - 读取两个关键字 Python 之间的行
- c++ - std::atomic 是如何实现的
- localization - 微服务国际化和本地化的最佳实践
- azure - 在 HTTP POST 中的 Azure 逻辑应用程序中出现错误:-“BadRequest。Http 请求失败:内容不是有效的 JSON。”
- php - LOAD DATA LOCAL INFILE 通过终端工作,但不是 cron 作业
- python - 在python中,如何解析多层JSON?
- python - 蟒蛇 | MySQL | AttributeError:模块'mysql.connector'没有属性'connect'
- .net - VB NET 将值与列表中的值进行比较
- c - 为 char 数组实现 getLength-Function 时出现特定情况的奇怪错误
- javascript - iframe src 在按钮单击时更改