首页 > 解决方案 > 在 networkx DiGraph 中生成随机边

问题描述

我有一个nx.DiGraph具有 1000000 个节点和 1500000 个边的对象。该图是二分的,它像 A->T->A 一样,其中 A 是大于 99602440 的数字,T 是较小的数字。大约有相同数量的T和A。

我想生成另一个具有相同数量节点但随机边的图。边应该是从原始图的入出度分布中提取出来的。

这是我生成的代码:

G=nx.read_weighted_edgelist("C:/Users/uccio/Desktop/tesi2/mygraphUpTo_2011_5_13", create_using=nx.DiGraph,nodetype=int)
#my original graph

def cumul(sort): #create cumulative frequencies
    ind=[]
    val=[]
    c=0
    for i, j in sort:
        ind.append(i)
        c+=j
        val.append(c)
    return ind, val


def exCum(cum_index, cum_value, sumtot): #binary search in the cumulative
    coin=int(random.randint(0,sumtot)) #extract random value
    val=np.searchsorted(cum_value,coin,side="left")
    return cum_index[val] #return key of the dictionary of cumulated frequencies
    
    


distin={} #dictionary of in-degree frequencies
distout={} #dictionary of out-degree frequencies
for f in G.nodes():
    if f<99602440:
        x=G.in_degree(f)
        y=G.out_degree(f)
        if x in distin:
            distin[x]+=1
        else:
            distin[x]=1
        if y in distout:
            distout[y]+=1
        else:
            distout[y]=1
sort_in=sorted(distin.items()) #ordered to do binary search in the dict
sort_out=sorted(distout.items())

cumin_index, cumin_val= cumul(sort_in) #cumulative value todo binary search in
cumout_index, cumout_val= cumul(sort_out)
sumtot_in=sum(distin.values())
sumtot_out=sum(distout.values())

test=nx.DiGraph()
test.add_nodes_from(G.nodes()) #my new graph
trans=[] #extracted T from the graph
add=[] #extracted A from the graph
for i in test.nodes():
     if i<99602440:
         trans.append(i)
     else:
         add.append(i)

for t in trans:
    
    ind=exCum(cumin_index, cumin_val, sumtot_in) #binary search return the key of the dictionary
    outd=exCum(cumout_index, cumout_val, sumtot_out)
    extin=list(random.choice(add,ind)) #choose random nodes from A to use as input
#it uses ind to choose how many A->T should exist
    extout=list(random.choice(add,outd)) #choose random nodes from A to use as output
#it uses ind to choose how many T->A should exist
    for inv in extin:
        test.add_weighted_edges_from([(inv,t,0)]) #creates the edges A->T for that T                
    for outv in extout:
        test.add_weighted_edges_from([(t,outv,0)])  #creates the edges T->A for that T

代码有效,但最后一部分(来自for t in trans:)需要很长时间(肯定长达 5 小时,但我在计算机工作时上床睡觉,所以可能会更长)。有没有办法让一切变得更快?

标签: pythonnetworkxgraph-theory

解决方案


推荐阅读