I need to compare two DataFrames for matches and mismatches, and for each mismatch I also need to identify which answer came from the master df

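Problem description

I have two DataFrames in Python and want to compare them to find matches and mismatches. Importantly, for each mismatch I need to be able to tell which answer came from the master answer sheet and which came from the user's answers.

I decided to use the pandas df.where function for this. It works, except that for mismatches I cannot tell which answer came from the master answer sheet and which came from the user's answers.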

# I have a DataFrame called df_master (master answer sheet)

import pandas as pd

df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
            'B1': [0, 0, 0, 0, 1, 0],
            'B2': [0, 1, 0, 0, 0, 0],
            'B3': [0, 0, 1, 0, 0, 0],
            'B4': [0, 0, 0, 1, 0, 0]})
print(df_master)

#    B0  B1  B2  B3  B4
# 0   1   0   0   0   0
# 1   0   0   1   0   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0

# I also have a DataFrame called df_answers (the user's answers)

df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
            'B1': [1, 0, 0, 0, 1, 0],
            'B2': [0, 0, 0, 0, 0, 0],
            'B3': [0, 1, 1, 0, 0, 0],
            'B4': [0, 0, 0, 1, 0, 0]})

print(df_answers)

#    B0  B1  B2  B3  B4
# 0   0   1   0   0   0
# 1   0   0   0   1   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0

# When I compare the two DataFrames, matching cells keep the master value, and
# for mismatched cells I have used other=2. The problem is that I cannot see which
# cell holds the correct answer. Is there a way to change the code so that the
# master's answer shows as 3 and the user's incorrect answer stays 2?

comparison = df_master.where(df_master.values==df_answers.values, other=2)

print(comparison)

# My Results

#    B0  B1  B2  B3  B4
# 0   2   2   0   0   0
# 1   0   0   2   2   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0

# Expected Results

#    B0  B1  B2  B3  B4
# 0   3   2   0   0   0
# 1   0   0   3   2   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0

Tags: python, pandas

Solution


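In your case you can concatenate the two frames as strings and then use replace. Each key is the user's value followed by the master's value, so '01' means the master has a 1 where the user answered 0. PS: you define the mapping yourself, for example {'00':'both failed', '01': 'master failed'...}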

(df_answers.astype(str)+df_master.astype(str)).replace({'00':0,'01':3,'10':2,'11':1})
Out[129]: 
   B0  B1  B2  B3  B4
0   3   2   0   0   0
1   0   0   3   2   0
2   0   0   0   1   0
3   0   0   0   0   1
4   0   1   0   0   0
5   1   0   0   0   0
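
The same result can also be reached without going through strings. The following is only a sketch of an alternative, not the answerer's method: it assumes both frames contain only 0 and 1 and share the same index and columns, and the names master_only and answer_only are introduced here for illustration.

```python
import pandas as pd

df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
                          'B1': [0, 0, 0, 0, 1, 0],
                          'B2': [0, 1, 0, 0, 0, 0],
                          'B3': [0, 0, 1, 0, 0, 0],
                          'B4': [0, 0, 0, 1, 0, 0]})

df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
                           'B1': [1, 0, 0, 0, 1, 0],
                           'B2': [0, 0, 0, 0, 0, 0],
                           'B3': [0, 1, 1, 0, 0, 0],
                           'B4': [0, 0, 0, 1, 0, 0]})

# Hypothetical helper masks (assumes the values are strictly 0/1).
# master_only: the correct answer that the user did not pick.
master_only = (df_master == 1) & (df_answers == 0)
# answer_only: an answer the user picked that the master does not have.
answer_only = (df_master == 0) & (df_answers == 1)

# Start from the master values (matches stay 0 or 1), then overwrite
# the two kinds of mismatch with their own codes.
comparison = df_master.mask(master_only, 3).mask(answer_only, 2)

print(comparison)
```

With the sample data above this should give the same table as the expected results in the question. Keeping the values numeric avoids the string round trip, at the cost of spelling out each mismatch case as its own boolean mask.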
