How do I fix a directed graph so that the topmost parent is always the first level?

Problem description

I have the following pandas DataFrame:

parent, child 
40066, 50106
30029, 40066
40066, 50100
40066, 50106
50106, 60061
50106, 60063
50106, 60062
50100, 60057
50100, 60058

I am trying to get the following structure:

child, level1, level2, level3
60063, 30029, 40066, 50106
60062, 30029, 40066, 50106
60061, 30029, 40066, 50106
60058, 30029, 40066, 50100
60057, 30029, 40066, 50100

I used the following code:

import pandas as pd
import networkx as nx

df = ...  # the DataFrame shown above
# leaves are nodes that appear as a child but never as a parent
leaves = set(df.child).difference(set(df.parent))
g = nx.from_pandas_edgelist(df, 'parent', 'child', create_using=nx.DiGraph(), edge_attr=True)
# map each leaf to the set of all of its ancestors
ancestors = {n: nx.algorithms.dag.ancestors(g, n) for n in leaves}
df2 = pd.DataFrame.from_dict(ancestors, orient='index')

This gives me the following output:

60063, 30029, 50106, 40066
60062, 30029, 50106, 40066
60061, 30029, 50106, 40066
60058, 30029, 50100, 40066
60057, 50100, 40066, 30029

This is incorrect: the first 4 rows should follow the chain 30029 -> 40066 -> ..., and the order of the last row is completely wrong.

Tags: python, pandas

Solution
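
A likely reason the attempt in the question scrambles the levels is that nx.ancestors returns a plain Python set, so the order in which the ancestors land in the columns carries no meaning. The sketch below illustrates this on a hypothetical three-edge chain taken from the question's data and shows one way to recover the root-first order, by sorting the ancestors by their position in a topological sort. The names position and ordered are introduced only for this illustration.

```python
import networkx as nx

# Hypothetical chain built from three of the edges in the question
g = nx.DiGraph([(30029, 40066), (40066, 50106), (50106, 60061)])

# nx.ancestors returns a set: its iteration order is not the hierarchy order
anc = nx.ancestors(g, 60061)

# Rank every node by its position in a topological sort (roots come first)
position = {node: i for i, node in enumerate(nx.topological_sort(g))}

# Sort the ancestors by that rank to get root -> ... -> direct parent
ordered = sorted(anc, key=position.get)
print(ordered)  # [30029, 40066, 50106]
```

This works here because, in a tree, the ancestors of a node form a single chain, so their topological order is unique.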


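Use nx.all_simple_paths to find every path from a root to a leaf. Each path lists its nodes in order, from the root down to the leaf.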
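
As a minimal illustration of what nx.all_simple_paths returns, the sketch below runs it on a made-up four-node graph (the node names are hypothetical and not part of the question's data). Each result is a list of nodes ordered from source to target.

```python
import networkx as nx

# Hypothetical toy graph: one root, one intermediate node, two leaves
g = nx.DiGraph([("root", "mid"), ("mid", "leaf_a"), ("mid", "leaf_b")])

# all_simple_paths yields each path as a list, ordered source -> target
paths = list(nx.all_simple_paths(g, source="root", target="leaf_a"))
print(paths)  # [['root', 'mid', 'leaf_a']]
```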

Full code:

import pandas as pd
import numpy as np
import networkx as nx


# setup initial data
df = pd.DataFrame({'parent': [40066, 30029, 40066, 40066,
                              50106, 50106, 50106, 50100, 50100],
                   'child': [50106, 40066, 50100, 50106,
                             60061, 60063, 60062, 60057, 60058]})
g = nx.from_pandas_edgelist(df, 'parent', 'child', create_using=nx.DiGraph)

# get leaves and roots
leaves = [node for node, degree in g.out_degree() if degree == 0]
roots = [node for node, degree in g.in_degree() if degree == 0]

# find all paths
paths = []
for root in roots:
    for leaf in leaves:
        for path in nx.all_simple_paths(g, root, leaf):
            paths.append(path)

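# create dataframe: rotate each path right by one position (axis=1) so the
# leaf moves from the last column to the first; without axis=1, np.roll
# shifts the flattened array and pairs each leaf with the wrong row
df1 = pd.DataFrame(np.roll(paths, shift=1, axis=1))
df1 = df1.add_prefix('level').rename(columns={'level0': 'child'})

Output:

>>> df1
   child  level1  level2  level3
0  60061   30029   40066   50106
1  60063   30029   40066   50106
2  60062   30029   40066   50106
3  60057   30029   40066   50100
4  60058   30029   40066   50100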
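
Note that the np.roll step assumes every root-to-leaf path has the same length, which holds for the data in the question. If the hierarchy can have leaves at different depths, one possible variant is to build each row as a dictionary instead, letting pandas fill the missing levels with NaN. The following is only a sketch under that assumption; paths_to_frame and the small example graph are hypothetical names and data introduced for illustration.

```python
import networkx as nx
import pandas as pd


def paths_to_frame(g):
    """Return one row per root-to-leaf path: child, level1, level2, ..."""
    leaves = [node for node, degree in g.out_degree() if degree == 0]
    roots = [node for node, degree in g.in_degree() if degree == 0]
    rows = []
    for root in roots:
        for leaf in leaves:
            for path in nx.all_simple_paths(g, root, leaf):
                # the last node is the leaf; the rest are levels, root first
                row = {"child": path[-1]}
                for i, node in enumerate(path[:-1], start=1):
                    row[f"level{i}"] = node
                rows.append(row)
    return pd.DataFrame(rows)


# Hypothetical graph with leaves at two different depths
g = nx.DiGraph([(1, 2), (2, 3), (1, 4)])
frame = paths_to_frame(g)
print(frame)
```

In this example, leaf 4 sits directly under the root, so its level2 entry is NaN and that column is stored as floats.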
