python - How to create a multi-relational edge list from a pandas DataFrame?
Question
I have a pandas DataFrame like this:
from itertools import combinations
import pandas as pd
d = {'col1': ['a', 'b','c','d','a','b','d'], 'col2': ['XX','XX','XY','XX','YY','YY','XY']}
df_rel = pd.DataFrame(data=d)
df_rel
col1 col2
0 a XX
1 b XX
2 c XY
3 d XX
4 a YY
5 b YY
6 d XY
The unique nodes are:
uniq_nodes = df_rel['col1'].unique()
uniq_nodes
array(['a', 'b', 'c', 'd'], dtype=object)
For each Relationship, the source (Src) and destination (Dst) pairs can be generated:
df1 = pd.DataFrame(
data=list(combinations(uniq_nodes, 2)),
columns=['Src', 'Dst'])
df1
Src Dst
0 a b
1 a c
2 a d
3 b c
4 b d
5 c d
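I need a new DataFrame, newdf, built from the elements that nodes share in col2 of df_rel. The Relationship column comes from col2. The desired DataFrame with the edge list would therefore be: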
newdf
Src Dst Relationship
0 a b XX
1 a b YY
2 a d XX
3 c d XY
What is the fastest way to achieve this? The original DataFrame has 30,000 rows.
Solution
You need to loop through the rows of df1 and, for each one, find the rows of df_rel whose col1 matches df1['Src'] and df1['Dst']. Once you have the df_rel['col2'] values for Src and for Dst, compare them; for every value they share, create a row in newdf. Try this, and check whether it performs acceptably on large datasets.
Data setup (same as yours):
d = {'col1': ['a', 'b', 'c', 'd', 'a', 'b', 'd'], 'col2': ['XX', 'XX', 'XY', 'XX', 'YY', 'YY', 'XY']}
df_rel = pd.DataFrame(data=d)
uniq_nodes = df_rel['col1'].unique()
df1 = pd.DataFrame(data=list(combinations(uniq_nodes, 2)), columns=['Src', 'Dst'])
Code:
rows = []  # collect matches in a list; DataFrame.append was removed in pandas 2.0
for _, row in df1.iterrows():
    # col2 values attached to each endpoint of the candidate pair
    src = df_rel.loc[df_rel['col1'] == row['Src'], 'col2'].to_list()
    dst = df_rel.loc[df_rel['col1'] == row['Dst'], 'col2'].to_list()
    for x in src:
        if x in dst:
            rows.append({'Src': row['Src'], 'Dst': row['Dst'], 'Relationship': x})
newdf = pd.DataFrame(rows, columns=['Src', 'Dst', 'Relationship'])
print(newdf)
Result:
Src Dst Relationship
0 a b XX
1 a b YY
2 a d XX
3 b d XX
4 c d XY
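Since the original DataFrame has 30,000 rows, a row-by-row loop may turn out to be slow. One possible vectorized alternative, offered as an unbenchmarked sketch rather than a definitive answer, is to self-join df_rel on col2 and keep each unordered pair once. The names pairs and edges are illustrative. The sketch assumes node labels can be compared with <, so each pair is ordered alphabetically rather than by first appearance in col1.

```python
import pandas as pd

df_rel = pd.DataFrame({
    'col1': ['a', 'b', 'c', 'd', 'a', 'b', 'd'],
    'col2': ['XX', 'XX', 'XY', 'XX', 'YY', 'YY', 'XY'],
})

# Self-join on col2: one row for every pair of nodes sharing a relationship value.
pairs = df_rel.merge(df_rel, on='col2', suffixes=('_src', '_dst'))

# Keep each unordered pair once and drop self-pairs (assumes labels are comparable).
pairs = pairs[pairs['col1_src'] < pairs['col1_dst']]

edges = (
    pairs.rename(columns={'col1_src': 'Src',
                          'col1_dst': 'Dst',
                          'col2': 'Relationship'})
         [['Src', 'Dst', 'Relationship']]
         .drop_duplicates()
         .sort_values(['Src', 'Dst', 'Relationship'])
         .reset_index(drop=True)
)
print(edges)
```

On the sample data this yields the same five edges as the loop above. Note that the join produces a number of rows quadratic in the size of each col2 group, so memory use can grow quickly if one relationship value is shared by many nodes.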