python - 在 networkx DiGraph 中生成随机边
问题描述
我有一个nx.DiGraph
具有 1000000 个节点和 1500000 个边的对象。该图是二分的,它像 A->T->A 一样,其中 A 是大于 99602440 的数字,T 是较小的数字。大约有相同数量的T和A。
我想生成另一个具有相同数量节点但随机边的图。边应该是从原始图的入出度分布中提取出来的。
这是我生成的代码:
G=nx.read_weighted_edgelist("C:/Users/uccio/Desktop/tesi2/mygraphUpTo_2011_5_13", create_using=nx.DiGraph,nodetype=int)
#my original graph
def cumul(sort): #create cumulative frequencies
ind=[]
val=[]
c=0
for i, j in sort:
ind.append(i)
c+=j
val.append(c)
return ind, val
def exCum(cum_index, cum_value, sumtot): #binary search in the cumulative
coin=int(random.randint(0,sumtot)) #extract random value
val=np.searchsorted(cum_value,coin,side="left")
return cum_index[val] #return key of the dictionary of cumulated frequencies
distin={} #dictionary of in-degree frequencies
distout={} #dictionary of out-degree frequencies
for f in G.nodes():
if f<99602440:
x=G.in_degree(f)
y=G.out_degree(f)
if x in distin:
distin[x]+=1
else:
distin[x]=1
if y in distout:
distout[y]+=1
else:
distout[y]=1
sort_in=sorted(distin.items()) #ordered to do binary search in the dict
sort_out=sorted(distout.items())
cumin_index, cumin_val= cumul(sort_in) #cumulative value todo binary search in
cumout_index, cumout_val= cumul(sort_out)
sumtot_in=sum(distin.values())
sumtot_out=sum(distout.values())
test=nx.DiGraph()
test.add_nodes_from(G.nodes()) #my new graph
trans=[] #extracted T from the graph
add=[] #extracted A from the graph
for i in test.nodes():
if i<99602440:
trans.append(i)
else:
add.append(i)
for t in trans:
ind=exCum(cumin_index, cumin_val, sumtot_in) #binary search return the key of the dictionary
outd=exCum(cumout_index, cumout_val, sumtot_out)
extin=list(random.choice(add,ind)) #choose random nodes from A to use as input
#it uses ind to choose how many A->T should exist
extout=list(random.choice(add,outd)) #choose random nodes from A to use as output
#it uses ind to choose how many T->A should exist
for inv in extin:
test.add_weighted_edges_from([(inv,t,0)]) #creates the edges A->T for that T
for outv in extout:
test.add_weighted_edges_from([(t,outv,0)]) #creates the edges T->A for that T
代码有效,但最后一部分(来自for t in trans:
)需要很长时间(肯定长达 5 小时,但我在计算机工作时上床睡觉,所以可能会更长)。有没有办法让一切变得更快?
解决方案
推荐阅读
- c++ - RE2 不匹配非 ASCII 字符
- c++ - antlr4中对Any的要求
- cucumber - Cucumber Puppeteer:尝试使用参数调用 step 函数时出错
- python - Django重命名模型字段而没有数据库上的ALTER TABLE
- c# - EF Core:视图的导航属性
- reactjs - react redux中如何将store传递给子组件?
- javascript - 如何更改由 JavaScript 函数生成的 docx 中的字体
- sql-server - 在多列上具有多个条件的情况
- java - Java机器人类导出后不起作用【已解决】【原因:窗口缩放】
- unity3d - Unity 中 OnCollisionStay 的 CharacterController 等价物是什么?