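python - How to map filenames to columns extracted from multiple Excel files using pandas
Question
I am trying to extract all the columns from multiple Excel files and then map the filename to each extracted column, but I am stuck on "TypeError: Index does not support mutable operations".
Here are my two files: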
Fund_Data.xlsx:
   FUND ID           FUND NAME   AMOUNT  client code    Price  description  Trade Date       Trade Datetime
0    10101  Holdings company A  10000.5         1234    124.3         abcd  2020-08-19  2020-08-19 12:30:00
1    20202  Holdings company B  -2000.5          192    -24.2         abcd  2020-08-20  2020-08-20 12:30:00
2    30303  Holdings company C   3000.5          123      192          NaN  2020-08-21  2020-08-21 12:30:00
3    10101  Holdings company A    10000      1234567      5.5          NaN  2020-08-22  2020-08-22 12:30:00
4    20202  Holdings company B  10000.5         9999    3.887         abcd  2020-08-23  2020-08-23 12:30:00
Stocks.xlsx:
ID  STOCK                     VALUE
1   3i                        100
2   Admiral Group             200
3   Anglo American            300
4   Antofagasta               100
5   Ashtead                   200
6   Associated British Foods  300
7   AstraZeneca               400
8   Auto Trader Group         500
9   Avast                     600
Here is my code so far:
import os
import pandas as pd

f = []
directory = 'C:/Users/rrai020/Documents/Python Scripts/DD'
for (dirpath, dirnames, filenames) in os.walk(directory):
    for x in filenames:
        if x.endswith('xlsx'):
            f.append(x)
# f = ['Fund_Data.xlsx', 'Stocks.xlsx']  created a list from filenames in directory ^^^

data = pd.DataFrame()  # initialize empty df
for filename in f:
    df = pd.read_excel(filename, dtype = object, ignore_index=True).columns  # read in each excel to df
    df['filename'] = filename  # add a column with the filename
    data = data.append(df)  # add all small df's to big df
print(data)
I am trying to achieve the following output (or something similar):
Field Name      Filename
FUND ID         Fund_Data.xlsx
FUND NAME       Fund_Data.xlsx
AMOUNT          Fund_Data.xlsx
client code     Fund_Data.xlsx
Price           Fund_Data.xlsx
description     Fund_Data.xlsx
Trade Date      Fund_Data.xlsx
Trade Datetime  Fund_Data.xlsx
Trade time      Fund_Data.xlsx
ID              Stocks.xlsx
STOCK           Stocks.xlsx
VALUE           Stocks.xlsx
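I would like the code to be flexible so that it works with more than the two files I have here. Apologies if this is trivial, I am still learning!
Answer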
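The problem is with the object you are appending. Calling .columns on the result of pd.read_excel returns a pandas Index rather than a DataFrame, and an Index is immutable, so df['filename'] = filename raises the TypeError. Instead, for each file inside the loop, we need to create a DataFrame with the columns Field Name and Filename, and then append that to data.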
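To see where the error comes from, the following minimal sketch reproduces it without any Excel file. The empty DataFrame is an assumed stand-in for what pd.read_excel would return, with the column names taken from Stocks.xlsx; the variable names cols and error_message are only for illustration.

```python
import pandas as pd

# Assumed stand-in for pd.read_excel('Stocks.xlsx'): same column names, no rows.
df = pd.DataFrame(columns=["ID", "STOCK", "VALUE"])

cols = df.columns  # this is a pandas Index, not a DataFrame

try:
    # Item assignment works on a DataFrame, but an Index is immutable.
    cols["filename"] = "Stocks.xlsx"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(cols).__name__)
print(error_message)  # Index does not support mutable operations
```

The column labels themselves are still readable from the Index (for example via list(cols)); it is only in-place modification that is rejected.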
Here is one option:
import pandas as pd

frames = []
for filename in f:
    # read each Excel file into a DataFrame (keep the DataFrame, not just .columns)
    df = pd.read_excel(filename, dtype=object)
    # create a DataFrame with (Field Name, Filename) columns for the current file
    x = pd.DataFrame({'Field Name': df.columns, 'Filename': filename})
    frames.append(x)
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
data = pd.concat(frames, ignore_index=True)
data
Output:
   Field Name         Filename
0  FUND ID            Fund_Data.xlsx
1  FUND NAME          Fund_Data.xlsx
2  AMOUNT             Fund_Data.xlsx
3  client code        Fund_Data.xlsx
4  Price description  Fund_Data.xlsx
5  Trade Date         Fund_Data.xlsx
6  Trade Datetime     Fund_Data.xlsx
7  ID                 Stocks.xlsx
8  STOCK              Stocks.xlsx
9  VALUE              Stocks.xlsx
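Since the question asks for something that scales beyond two files, here is a hedged sketch of one way to organise it. The helper names field_name_table and read_headers are hypothetical and not part of the original answer. read_headers assumes that passing nrows=0 to pd.read_excel is enough to get the header row without loading the data, and that an Excel engine such as openpyxl is installed. It also passes full paths to pd.read_excel, so it does not depend on the current working directory. The demonstration at the bottom uses in-memory DataFrames as stand-ins for the two files, so it runs without any Excel files on disk.

```python
from pathlib import Path

import pandas as pd


def field_name_table(frames):
    """Build a (Field Name, Filename) table from a {filename: DataFrame} mapping."""
    parts = [
        pd.DataFrame({"Field Name": list(df.columns), "Filename": name})
        for name, df in frames.items()
    ]
    if not parts:
        # No files found: return an empty table with the expected columns.
        return pd.DataFrame(columns=["Field Name", "Filename"])
    # One concat at the end avoids growing a DataFrame inside the loop.
    return pd.concat(parts, ignore_index=True)


def read_headers(directory):
    """Read only the header row of every .xlsx file below `directory`.

    Keys are bare file names, so files with the same name in different
    subfolders would overwrite each other in this sketch.
    """
    return {
        path.name: pd.read_excel(path, nrows=0)
        for path in sorted(Path(directory).rglob("*.xlsx"))
    }


# In-memory stand-ins for Fund_Data.xlsx and Stocks.xlsx (column names only).
frames = {
    "Fund_Data.xlsx": pd.DataFrame(columns=[
        "FUND ID", "FUND NAME", "AMOUNT", "client code",
        "Price", "description", "Trade Date", "Trade Datetime",
    ]),
    "Stocks.xlsx": pd.DataFrame(columns=["ID", "STOCK", "VALUE"]),
}

table = field_name_table(frames)
print(table)
```

With real files, the equivalent call would be field_name_table(read_headers(directory)), where directory is the folder from the question.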