首页 > 解决方案 > Python PageRank 实现页面排名向量对每个节点收敛到相同的值

问题描述

下面是一个 PageRank 的实现。

import numpy as np
from scipy.sparse import csc_matrix
import networkx as nx

def pageRank(G, s = .85, maxerr = 0.0001):
    """
    Computes the pagerank for each of the n states
    Parameters
    ----------
    G: matrix representing state transitions
       Gij is a binary value representing a transition from state i to j.
    s: probability of following a transition. 1-s probability of teleporting
       to another state.
    maxerr: if the sum of pageranks between iterations is bellow this we will
            have converged.
    """
    n = G.shape[0]

    # transform G into markov matrix A
    A = csc_matrix(G,dtype=np.float)
    rsums = np.array(A.sum(1))[:,0]
    ri, ci = A.nonzero()
    A.data /= rsums[ri]

    # bool array of sink states
    sink = rsums==0

    # Compute pagerank r until we converge
    ro, r = np.zeros(n), np.ones(n)
    while np.sum(np.abs(r-ro)) > maxerr:
        ro = r.copy()
        # calculate each pagerank at a time
        for i in range(0,n):
            # inlinks of state i
            Ai = np.array(A[:,i].todense())[:,0]
            # account for sink states
            Di = sink / float(n)
            # account for teleportation to state i
            Ei = np.ones(n) / float(n)

            r[i] = ro.dot( Ai*s + Di*s + Ei*(1-s) )

    # return normalized pagerank
    return r/float(sum(r))

我正在尝试在这个数据集上运行这个算法。

数据集是从密歇根大学数据集页面获取的海豚社交网络图。

我已检索数据并将其转换为邻接矩阵,然后将该算法应用于邻接矩阵

dol = nx.read_gml('dolphins.gml')

dol_adj = nx.to_numpy_matrix(dol)

pageRank(dol_adj,s=.85)

它返回一个 pagerank 向量,每个节点都具有相同的值,即使应该有一个排名。

array([0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903, 0.01612903, 0.01612903, 0.01612903,
       0.01612903, 0.01612903])

我不确定是什么原因造成的。我最初的想法是,这是由于具有 62 个节点的图的大小。

我尝试在较小的图表上运行它,并且能够返回不同值的页面排名向量。

G = np.array([[0,0,1,0,0,0,0],
                  [0,1,1,0,0,0,0],
                  [1,0,1,1,0,0,0],
                  [0,0,0,1,1,0,0],
                  [0,0,0,0,0,0,1],
                  [0,0,0,0,0,1,1],
                  [0,0,0,1,1,0,1]])
pageRank(G,s=.85)

不同的价值观:

array([0.12924088, 0.03831991, 0.12443595, 0.22384607, 0.28468494,
       0.03831991, 0.16115233])

但是我不确定如何更改代码以使其可以处理更大的矩阵。无论矩阵的大小如何,该实现都应该有效。

标签: pythonnumpyscipynetworkxpagerank

解决方案


推荐阅读