Intersection of two data frames of unequal length

Problem description

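I'm trying to get the intersection of the "game" and "sample" data frames, i.e. the rows that match in both. The data frames have different lengths, and I don't want any row to be counted twice in the intersection.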

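For example, the sample data frame has the rows [0,1,1], [1,1,0], [1,0,1], [0,1,1]

and the game data frame has the rows [1,1,0], [1,1,0], [1,0,1], [1,1,1], [1,0,1]

Then the intersection data frame should contain the rows [1,1,0], [1,0,1]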
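The expected result above is a plain set intersection of the rows. A minimal sketch that illustrates this definition by treating each row as a tuple (an illustration of the semantics only, not a pandas-native approach):

```python
import pandas as pd

# The example frames from the question
sample = pd.DataFrame([[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]])
game = pd.DataFrame([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 1]])

# Convert each row to a tuple and intersect the two sets: a row counts once,
# no matter how often or at which index it appears in either frame
common = set(map(tuple, sample.to_numpy().tolist())) & set(map(tuple, game.to_numpy().tolist()))
print(sorted(common))  # [(1, 0, 1), (1, 1, 0)]
```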

import pandas as pd
import numpy as np
import random

trials = 1000
games = 3

# Fill a trials x games frame with random 0/1 values, one cell at a time
data = pd.DataFrame()
for i in range(trials):
    for j in range(games):
        data.loc[i,j] = random.choice([0,1])

# sample: rows of data that contain at least two 1s
sample = pd.DataFrame()
for i in range(trials):
    for j in range(games):
        if ((data.loc[i,:]).sum()) >= 2:
            sample.loc[i,j] = data.loc[i,j]

# game: rows of data whose first value is 1
game = pd.DataFrame()
for i in range(trials):
    for j in range(games):
        if (data.loc[i,0]) == 1:
            game.loc[i,j] = data.loc[i,j]

# Compare the i-th row of sample with the i-th row of game (by position)
intersection = pd.DataFrame()
for i in range(len(sample)):
    if np.all(sample.iloc[i,:] == game.iloc[i,:]):
        for j in range(games):
            intersection.loc[i,j] = sample.loc[i,j]
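The loops that build data, sample and game can be written with vectorized pandas/NumPy operations instead of cell-by-cell .loc assignments. A minimal sketch, assuming NumPy's Generator.integers in place of random.choice (so the random values differ from the original) and an arbitrary seed chosen only for reproducibility. It rebuilds data, sample and game but leaves the intersection step aside:

```python
import numpy as np
import pandas as pd

trials = 1000
games = 3

# Random 0/1 matrix; the seed (0) is arbitrary and only makes runs repeatable
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 2, size=(trials, games)))

# sample: rows with at least two 1s (boolean mask over the row sums)
sample = data[data.sum(axis=1) >= 2]

# game: rows whose first column (label 0) is 1
game = data[data[0] == 1]
```

Both masks keep the original index labels of data, just as the loop version does.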


Tags: python, pandas

Solution


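You can try pandas' pd.DataFrame.isin to check which rows of the first data frame also appear in the second. Note that when isin receives a DataFrame, it compares values element-wise at matching index and column labels, so it only finds rows that sit at the same index in both frames (as they happen to in this example):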

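import pandas as pd

df1 = pd.DataFrame([[0,1,1],[1,1,0],[1,0,1],[0,1,1]])
df2 = pd.DataFrame([[1,1,0],[1,1,0],[1,0,1],[1,1,1],[1,0,1]])

# Keep rows of df1 where every value equals df2's value at the same index and column
df1[df1.isin(df2).all(axis=1)]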

Out:

    0   1   2
1   1   1   0
2   1   0   1
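pd.DataFrame.isin with a DataFrame argument compares values at matching index and column labels, so the call above works here only because the matching rows share index 1 and 2 in both frames. A position-independent alternative (a different technique from the answer above, not isin) is an inner merge on all columns after dropping duplicate rows, which also keeps each common row only once, as the question asks. A minimal sketch, where df2_reordered is a hypothetical reshuffle of df2 used only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame([[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]])
# Hypothetical: the same rows as df2, in a different order
df2_reordered = pd.DataFrame([[1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 0], [1, 0, 1]])

# isin compares label by label, so no row of df1 matches at the same index here
by_label = df1[df1.isin(df2_reordered).all(axis=1)]
print(by_label.empty)  # True

# Drop duplicate rows, then inner-join on all shared columns (0, 1, 2);
# the result contains the rows [1, 1, 0] and [1, 0, 1], each exactly once
common = df1.drop_duplicates().merge(df2_reordered.drop_duplicates(), how="inner")
print(common)
```

merge ignores the index entirely and matches on column values, which is why the result does not depend on where the rows sit in either frame.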
