Comparing two Excel files with Python

Problem description

I have data in two Excel files, as shown below.

Sample datasets I created:

import pandas as pd

df1 = {'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}
df1 = pd.DataFrame(df1, columns=df1.keys())

df2 = {'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]}
df2 = pd.DataFrame(df2, columns=df2.keys())

Please help me find the differences between the two, in a form like this:

Transaction_Name     Count_df1    Count_df2
SC-001_Homepage          1            2
SC-002_Homepage          1            1
SC-001_Signinlink        2            2

The counts in the first row of the output do not match. Can I highlight that row in a different color? My sample code is below.

import pandas as pd

# Compare both files (read every column as text)
df1 = pd.read_csv(r"WLMOUTPUT.csv", dtype=object)
df2 = pd.read_csv(r"results.csv", dtype=object)

print('\n', df1)
print('\n', df2)

# Build a combined key from the transaction name and its count
df1['Compare'] = df1['Transaction_Name'] + df1['Count'].astype(str)
df2['Compare'] = df2['Transaction_Name'] + df2['Count'].astype(str)

# Rows of df1 whose name/count pair does not appear in df2
print('\n', df1.loc[~df1['Compare'].isin(df2['Compare'])])

Thanks in advance.

Tags: python, pandas

Solution


You can use the pandas merge function.

import pandas as pd

df1 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}) 
df2 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]})

merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'))

This gives you the following DataFrame:

print(merged_df)

    Transaction_Name  Count_df1  Count_df2
0    SC-001_Homepage          1          2
1    SC-002_Homepage          1          1
2  SC-001_Signinlink          2          2
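Note that merge performs an inner join by default, so a transaction that appears in only one of the two files would be dropped from the result without warning. The sketch below shows one way to catch such rows, using how='outer' together with indicator=True. The sample data here is changed for illustration: 'SC-003_Logout' is a hypothetical transaction that is not part of the original question.

```python
import pandas as pd

# Illustrative data: each frame has one transaction the other lacks.
# 'SC-003_Logout' is a made-up name, not from the original question.
df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage'],
                    'Count': [1, 1]})
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-003_Logout'],
                    'Count': [2, 5]})

# An outer join keeps unmatched rows from both sides; indicator=True adds a
# '_merge' column whose values are 'both', 'left_only' or 'right_only'.
outer_df = pd.merge(df1, df2, on='Transaction_Name', how='outer',
                    suffixes=('_df1', '_df2'), indicator=True)

# Transactions that are present in only one of the two inputs
only_one_side = outer_df[outer_df['_merge'] != 'both']
print(only_one_side)
```

For unmatched rows the count coming from the missing side is NaN, so the count columns become floating point in the outer result.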

You can then filter the merged DataFrame to see which rows have different counts:

diff = merged_df[merged_df['Count_df1'] != merged_df['Count_df2']]

You get this:

print(diff)

   Transaction_Name  Count_df1  Count_df2
0   SC-001_Homepage          1          2
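The question also asks whether mismatched rows can be highlighted in a different color, which the filtering above does not cover. One possible approach is the pandas Styler, sketched below. The helper names highlight_mismatch and save_highlighted, the yellow color, and the output file name are all assumptions made for illustration. Styler needs the jinja2 package, and writing .xlsx files needs openpyxl.

```python
import pandas as pd

df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [1, 1, 2]})
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [2, 1, 2]})
merged_df = pd.merge(df1, df2, on='Transaction_Name', suffixes=('_df1', '_df2'))


def highlight_mismatch(row):
    """Return one CSS string per cell: yellow if the two counts differ."""
    style = 'background-color: yellow' if row['Count_df1'] != row['Count_df2'] else ''
    return [style] * len(row)


def save_highlighted(df, path):
    """Write df to an Excel file with mismatched rows highlighted.

    Hypothetical helper: requires jinja2 (for Styler) and openpyxl.
    """
    styled = df.style.apply(highlight_mismatch, axis=1)  # axis=1: style row by row
    styled.to_excel(path, engine='openpyxl', index=False)


# Example call (the file name is an assumption):
# save_highlighted(merged_df, 'comparison.xlsx')
```

With the sample data, only the SC-001_Homepage row gets the background color, because it is the only row where Count_df1 and Count_df2 differ.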
