How to get the unmatched records from DataFrame 1 only, and not from DataFrame 2, in Python

Problem description

import pandas as pd

df1 = pd.DataFrame({ 'Date':['2013-11-24','2013-11-24','2013-11-24','2013-11-24', '2021-12-21'], 'Fruit':['Banana','Orange','Apple','Celery','hello'], 'Num':[22.1,8.6,7.6,10.2, 3.67], 'Color':['Yellow','Orange','Green','Green', 'red'], })

df2 = pd.DataFrame({ 'Date':['2013-11-24','2013-11-24','2013-11-24','2013-11-24','2013-11-25','2013-11-25'], 'Fruit':['Banana','Orange','Apple','Celery','Apple','Orange'], 'Num':[22.1,8.6,7.6,10.2,22.1,8.6], 'Color':['Yellow','Orange','Green','Green','Red','Orange'], })

The code below returns the unmatched records from both DataFrames, but I only want the unmatched records from one of them.

df = pd.concat([df1, df2]) 
df = df.reset_index(drop = True) 
df_grpby = df.groupby(list(df.columns)) 
idx = [x[0] for x in df_grpby.groups.values() if len(x) == 1] 
df = df.reindex(idx) 
print(df)

Tags: python, pandas

Solution


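Here is how to do it with pandas.DataFrame.apply() and isin(). Each row is converted to a tuple with apply(tuple, 1), which yields a Series of tuples, and isin() then tests whether each row tuple also occurs in the other DataFrame. Negating the mask with ~ keeps only the rows that have no match. The first print shows the rows found only in df1, the second the rows found only in df2.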


import pandas as pd

df1 = pd.DataFrame({'Date': ['2013-11-24', '2013-11-24', '2013-11-24','2013-11-24', '2021-12-21'],
                    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery', 'hello'],
                    'Num': [22.1, 8.6, 7.6, 10.2, 3.67],
                    'Color': ['Yellow', 'Orange', 'Green', 'Green', 'red']})

df2 = pd.DataFrame({'Date': ['2013-11-24', '2013-11-24', '2013-11-24','2013-11-24','2013-11-25','2013-11-25'],
                    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery', 'Apple', 'Orange'],
                    'Num': [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
                    'Color': ['Yellow', 'Orange', 'Green', 'Green', 'Red', 'Orange']})

print(df1[~df1.apply(tuple, 1).isin(df2.apply(tuple, 1))])
print(df2[~df2.apply(tuple, 1).isin(df1.apply(tuple, 1))])

Output:

         Date  Fruit   Num Color
4  2021-12-21  hello  3.67   red
         Date   Fruit   Num   Color
4  2013-11-25   Apple  22.1     Red
5  2013-11-25  Orange   8.6  Orange
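
As an alternative to the tuple-based comparison, the same one-sided result can probably be obtained with a left merge and the indicator flag. The following is a minimal sketch, assuming both DataFrames share the same column names and that exact equality on every column (including the float column Num) is the intended matching rule. The names merged and only_in_df1 are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['2013-11-24', '2013-11-24', '2013-11-24', '2013-11-24', '2021-12-21'],
                    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery', 'hello'],
                    'Num': [22.1, 8.6, 7.6, 10.2, 3.67],
                    'Color': ['Yellow', 'Orange', 'Green', 'Green', 'red']})

df2 = pd.DataFrame({'Date': ['2013-11-24', '2013-11-24', '2013-11-24', '2013-11-24', '2013-11-25', '2013-11-25'],
                    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery', 'Apple', 'Orange'],
                    'Num': [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
                    'Color': ['Yellow', 'Orange', 'Green', 'Green', 'Red', 'Orange']})

# Left merge on all shared columns; indicator=True adds a '_merge' column
# telling whether each row was found in both frames or only in the left one.
# drop_duplicates() on df2 prevents df1 rows from being repeated when df2
# contains the same row more than once.
merged = df1.merge(df2.drop_duplicates(), how='left', indicator=True)

# Keep the rows that exist only in df1 and drop the helper column.
only_in_df1 = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_df1)
```

Swapping df1 and df2 in the merge gives the rows that exist only in df2. Note that merge builds a fresh index for its result, so the original row labels of df1 are not guaranteed to carry over.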
