Iterate over an Excel file, add a column to a Series, and save the result using the pandas library

Problem description

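Using Python's pandas library, how can I iterate over an Excel file, add a column to a Series, and then write the result to a file? My attempt is below, but when I append to the Series, the columns turn into rows.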

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import re

df1 = pd.read_excel('file1.xlsx', sheet_name='Sheet1')
df2 = pd.read_excel('file2.xlsx', sheet_name='Sheet1')
writer = pd.ExcelWriter('Export.xlsx')

for index1, row1 in df1.iterrows():
    account1= str(row1['ACCOUNT1'])
    not_found = 1
    for index2, row2 in df2.iterrows():
        account2= str(row2['ACCOUNT2'])
        if re.search(account1, account2, re.IGNORECASE):
            row1["Results"] = "Found"
            not_found = 0   
            counter+=1
            print("found " + counter)
            data_writer = row1.append(row1)
            break
        if not_found ==1:
            row1["Results"] = "Found"
            counter += 1
            print("not found " + counter)
            data_writer = row1.append(row1)
data_writer.to_excel(writer,'Sheet1')
writer.save()
print("Finished")
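The symptom described above (columns turning into rows) can be reproduced in a few lines. This is only a sketch with a made-up one-row frame standing in for file1.xlsx; it uses pd.concat in place of Series.append, which was removed in pandas 2.0. Each row yielded by iterrows() is a Series indexed by the column names, so joining two of them gives one longer Series rather than a two-row table.

```python
import pandas as pd

# Hypothetical one-row frame standing in for file1.xlsx
df = pd.DataFrame({"ACCOUNT1": ["A1"], "customer": ["Acme"]})

# iterrows() yields each row as a Series indexed by the column names
_, row = next(df.iterrows())

# Joining two such Series end to end gives one longer Series, so the
# column names become row labels (pd.concat stands in for Series.append)
stacked = pd.concat([row, row])

# Collecting the rows in a list and building a DataFrame from it
# keeps the columns as columns
table = pd.DataFrame([row, row])

print(stacked)
print(table)
```

Writing stacked to Excel would produce one long column of values, while table keeps one row per record.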

Here are the input files and the expected output:

[image]

[image]

Tags: python, pandas

Solution


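I figured it out, but I was hoping for a shorter way to append the existing data to the DataFrame without typing out every column. All I want is to add a "Results" column to the DataFrame that marks which rows matched.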

import re

import pandas as pd

df1 = pd.read_excel('file1.xlsx', sheet_name='Sheet1')
df2 = pd.read_excel('file2.xlsx', sheet_name='Sheet1')

counter = 0
rows = []  # one dict per output row (DataFrame.append was removed in pandas 2.0)
for index1, row1 in df1.iterrows():
    account1 = str(row1['ACCOUNT1'])
    not_found = 1
    for index2, row2 in df2.iterrows():
        account2 = str(row2['ACCOUNT2'])
        # account1 is used as a regex pattern, matched case-insensitively
        if re.search(account1, account2, re.IGNORECASE):
            not_found = 0
            counter += 1
            print("found " + str(counter))
            rows.append({'Results': "Found",
                         'ACCOUNT1': account1,
                         'customer': row1['customer'],
                         'state': row1['state'],
                         'city': row1['city']})
            break
    if not_found == 1:
        counter += 1
        print("not found " + str(counter))
        rows.append({'Results': "Not Found",
                     'ACCOUNT1': account1,
                     'customer': row1['customer'],
                     'state': row1['state'],
                     'city': row1['city']})

new_dataframe = pd.DataFrame(rows)
# The context manager saves the file on exit (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('Export.xlsx') as writer:
    new_dataframe.to_excel(writer, sheet_name='Sheet1')
print("Finished")
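One possible shorter route, sketched here under assumptions rather than offered as the definitive answer: copy df1 and assign the "Results" column directly, so none of the existing columns have to be typed out. The helper name add_results and the sample frames are hypothetical; only the column names ACCOUNT1 and ACCOUNT2 come from the code above.

```python
import re

import pandas as pd


def add_results(df1, df2):
    """Return a copy of df1 with a 'Results' column ('Found' / 'Not Found')."""
    candidates = df2["ACCOUNT2"].astype(str).tolist()

    def label(account):
        # Same test as the loop above: account is used as a regex pattern
        hit = any(re.search(account, c, re.IGNORECASE) for c in candidates)
        return "Found" if hit else "Not Found"

    out = df1.copy()
    out["Results"] = out["ACCOUNT1"].astype(str).map(label)
    return out


# Hypothetical data standing in for file1.xlsx and file2.xlsx
df1 = pd.DataFrame({"ACCOUNT1": ["abc", "xyz"], "customer": ["Acme", "Globex"]})
df2 = pd.DataFrame({"ACCOUNT2": ["00ABC11", "555"]})
result = add_results(df1, df2)
print(result)
```

The result can then be written with to_excel as before. Note that "Results" ends up as the last column here rather than the first, and that wrapping account in re.escape would be needed if the account values should be matched literally instead of as regular expressions.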
