首页 > 解决方案 > 将变量传递给 pandas .loc 函数以检查循环中的多个条件

问题描述

我已经有一段时间没有在这里问过问题了,但是我现在正在尝试学习 Python,并且遇到了以下问题,我似乎无法解决。

我有一个名为 My_File_Details.xlsx 的 excel 文件,其中有两列名为 Name & DOB,用于显示手动输入的人名和出生日期。然后我有另一个文件 DATA.xlsx 包含两个也有两列名为 Name & DOB 的正确数据。我想检查 My_File_Details.xlsx 文件,通过检查 DATA.xlsx 文件来查找错误插入 DOB 的行。如果有错误,我想将该行附加到列表中以输出到另一个 excel 文件。

我可以使用带有单独行的 Pandas .loc 函数来传递 DATA.xlsx 文件中的值

但是,当我尝试使用 For 循环遍历 DATA.xlsx 文件以每次更新值时,.loc 函数会找到第一个错误并错过其余的错误,即使它可以在循环之外找到它们。

问题是我有许多其他错误要检查,所以不使用循环,这将非常麻烦。

我的代码在下面,它只会让我为 DATA 文件中的值放置一个指向表的链接。

提前感谢您提供解决此问题的任何帮助。

import pandas as pd
import numpy as np
import openpyxl
from openpyxl import Workbook
from openpyxl import load_workbook

df = pd.read_excel("My_File_Details.xlsx", header=0 # This file contains names & dates that need to be checked and has columns called “Name” & “DOB”
df2 = pd.read_excel("DATA.xlsx",sheet_name=None, header=0) # This file contains the correct names and dates in a tab called Details that I want check df against to find errors


check_Error = [] # Empty error list 
output_Errors = [] # List for output errors

#Without using a  loop
# The 1st, 3rd & 4th below have errors planted and this method finds them and appends all of them correctly
check_Error = df.loc[(df["Name"] == df2["Details"]["Name"][0]) & (df["DOB"] != df2["Details"]["Birth Date"][0])]
check_Error1 = df.loc[(df["Name"] == df2["Details"]["Name"][1]) & (df["DOB"] != df2["Details"]["Birth Date"][1])]
check_Error 2= df.loc[(df["Name"] == df2["Details"]["Name"][2] & (df["DOB"] != df2["Details"]["Birth Date"][2])]
check_Error 3= df.loc[(df["Name"] == df2["Details"]["Name"][3]) & (df["DOB"] != df2["Details"]["Birth Date"][3])]

# Append the row for each of the above if an error exists
output_Errors.append(check_Error)
output_Errors.append(check_Error1)
output_Errors.append(check_Error2)
output_Errors.append(check_Error3)


output_Errors = pd.DataFrame(np.row_stack(output_Errors)) # Change the list into a DataFrame and adapt the shape depending on how many error rows are found
output_Errors.to_excel("Output.xlsx") # Output the errors to excel



# With loop
#When I try to implement the same in a For loop, it finds the first error but returns Empty DataFrame for the rest


for i in df2["Details"].index: # Loop through df2 to set the values for Name & Birth Date each iteration
    Name = df2["Details"]["Name"][i] # Update the name variable
    DOB = df2["Details"]["Birth Date"][i] # Update the DOB variable
    check_Error = df.loc[(df["Name"] == Name) & (df["DOB"] != DOB)] # If the name is equal to the Name variable but the DOB in df doesn’t match the DOB variable then this is an error
    output_Errors.append(check_Error) # Append the error row to the list

output_Errors = pd.DataFrame(np.row_stack(output_Errors)) # Change the list into a DataFrame and adapt the shape depending on how many error rows are found
output_Errors.to_excel("Output.xlsx") # Output the errors to excel

标签: pythonexcelpandasfor-loop.loc

解决方案


推荐阅读