How do I filter a data frame first, and then match it against the same data frame in pandas?

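Problem description

I have two data frames, df1 and df2. I want to filter df2 based on the values in df1, then count the frequency of each value and compare the counts against df1 again.

Example: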

df1:

Project_Number
S100
S100
S200
S300
S300
S300
S400
S400

df2:

Project_Number
S100
S200
S200
S300
S300
S300
S500

First, filter df2 based on the values in df1, keeping only the values that are present in df1.

df2_new:

Project_Number
S100
S200
S200
S300
S300
S300

Now take the value frequencies of both data frames:

df1['Count'] = df1['Project_Number'].map(df1['Project_Number'].value_counts())
df2_new['Count'] = df2_new['Project_Number'].map(df2_new['Project_Number'].value_counts())

df1-                              df2_new
Project_Number  Count             Project_Number   Count
S100            2                 S100             1
S200            1                 S200             2
S300            3                 S300             3
S400            2

Now take the difference between the two data frames above and print the result:

df_difference-

Project_Number  
S100           
S200
S400  

Tags: python, python-3.x, pandas, dataframe

Solution


To filter, try using isin:

df2_new = df2[df2["Project_Number"].isin(df1["Project_Number"].unique().tolist())]
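The filter above covers only the first step of the question. What follows is a minimal sketch of the remaining steps (counting and comparing), assuming the sample data from the question. The names counts1, counts2, counts2_aligned, and mismatch are illustrative and not part of the original answer, and treating a project that is missing from df2_new as a count of 0 is an assumption about what "difference" should mean here.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame(
    {"Project_Number": ["S100", "S100", "S200", "S300", "S300", "S300", "S400", "S400"]}
)
df2 = pd.DataFrame(
    {"Project_Number": ["S100", "S200", "S200", "S300", "S300", "S300", "S500"]}
)

# Step 1: keep only the df2 rows whose project also occurs in df1.
# .copy() gives an independent frame, so adding columns to it later
# does not trigger a SettingWithCopyWarning.
df2_new = df2[df2["Project_Number"].isin(df1["Project_Number"])].copy()

# Step 2: one frequency per project in each frame.
counts1 = df1["Project_Number"].value_counts()
counts2 = df2_new["Project_Number"].value_counts()

# Step 3: align df2_new's counts to df1's projects.
# Assumption: a project absent from df2_new counts as 0.
counts2_aligned = counts2.reindex(counts1.index, fill_value=0)

# Step 4: keep the projects whose counts differ between the two frames.
mismatch = counts1 != counts2_aligned
df_difference = pd.DataFrame(
    {"Project_Number": sorted(counts1[mismatch].index)}
)

print(df_difference)
```

With the sample data this leaves S100, S200, and S400, which matches the df_difference expected in the question. S300 drops out because it appears three times in both frames.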
