How do I create a multi-relationship edge list from a pandas DataFrame?

Problem description

I have a pandas DataFrame like this:

 from itertools import combinations
 import pandas as pd
 d = {'col1': ['a', 'b','c','d','a','b','d'], 'col2': ['XX','XX','XY','XX','YY','YY','XY']}
 df_rel = pd.DataFrame(data=d)
 df_rel
       col1 col2
    0   a   XX
    1   b   XX
    2   c   XY
    3   d   XX
    4   a   YY
    5   b   YY
    6   d   XY

The unique nodes are:

uniq_nodes = df_rel['col1'].unique()
uniq_nodes
array(['a', 'b', 'c', 'd'], dtype=object)

For each Relationship, the source (Src) and destination (Dst) pairs can be generated:

df1 = pd.DataFrame(
    data=list(combinations(uniq_nodes, 2)), 
    columns=['Src', 'Dst'])
df1
  Src   Dst
0   a   b
1   a   c
2   a   d
3   b   c
4   b   d
5   c   d

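I need a new DataFrame, newdf, based on the elements of col2 that nodes share in df_rel. The Relationship column comes from col2. So the desired DataFrame with the edge list would be: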

newdf

   Src  Dst Relationship
0   a   b   XX
1   a   b   YY
2   a   d   XX
3   c   d   XY

Is there a fast way to achieve this? The original DataFrame has 30,000 rows.

Tags: python, pandas, performance, dataframe

Solution


Loop over the rows of df1 and, for each pair, find the rows of df_rel whose col1 matches df1['Src'] and df1['Dst']. Once you have the df_rel['col2'] values for Src and Dst, compare them; every value they share becomes a row in newdf. Try the code below and check whether it performs acceptably on large datasets.

Data setup (same as yours):

from itertools import combinations
import pandas as pd

d = {'col1': ['a', 'b', 'c', 'd', 'a', 'b', 'd'], 'col2': ['XX', 'XX', 'XY', 'XX', 'YY', 'YY', 'XY']}
df_rel = pd.DataFrame(data=d)

uniq_nodes = df_rel['col1'].unique()

df1 = pd.DataFrame(data=list(combinations(uniq_nodes, 2)),  columns=['Src', 'Dst'])

Code:

rows = []  # collect matches in a list; DataFrame.append was removed in pandas 2.0
for i, row in df1.iterrows():
    # col2 values attached to the source node and to the destination node
    src = df_rel.loc[df_rel['col1'] == row['Src'], 'col2'].to_list()
    dst = df_rel.loc[df_rel['col1'] == row['Dst'], 'col2'].to_list()
    for x in src:
        if x in dst:
            rows.append({'Src': row['Src'], 'Dst': row['Dst'], 'Relationship': x})

newdf = pd.DataFrame(rows, columns=['Src', 'Dst', 'Relationship'])

print(newdf)

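Result (b and d also share XX, so that pair is included even though the expected output in the question omits it):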

  Src Dst Relationship
0   a   b           XX
1   a   b           YY
2   a   d           XX
3   b   d           XX
4   c   d           XY
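
Since the question asks about speed on 30,000 rows, a vectorized alternative may be worth trying. The sketch below is not part of the original answer: it swaps the row-by-row loop for a self-merge of df_rel on col2, and the names pairs and newdf_fast are made up for illustration. It assumes node labels can be compared with <, so each unordered pair is kept once in sorted order rather than in the order of first appearance that combinations would give.

```python
import pandas as pd

d = {'col1': ['a', 'b', 'c', 'd', 'a', 'b', 'd'],
     'col2': ['XX', 'XX', 'XY', 'XX', 'YY', 'YY', 'XY']}
df_rel = pd.DataFrame(data=d)

# Pair every row with every row that has the same col2 value.
# The overlapping col1 column is split into col1_src and col1_dst.
pairs = df_rel.merge(df_rel, on='col2', suffixes=('_src', '_dst'))

# Keep each unordered pair once (Src < Dst); this also drops self-pairs.
pairs = pairs[pairs['col1_src'] < pairs['col1_dst']]

newdf_fast = (
    pairs.rename(columns={'col1_src': 'Src',
                          'col1_dst': 'Dst',
                          'col2': 'Relationship'})
         [['Src', 'Dst', 'Relationship']]
         .drop_duplicates()
         .sort_values(['Src', 'Dst', 'Relationship'])
         .reset_index(drop=True)
)

print(newdf_fast)
```

On the sample data this produces the same five edges as the loop. One caveat: the self-merge creates one row for every pair of rows sharing a col2 value, so memory use can grow quickly if a single col2 value is shared by many nodes.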
