python - 遍历 XLSX 工作簿并将唯一值保存在从数据框到单个工作表的卷中
问题描述
这有点令人费解。但是我在 XLSX 中有一个输入表,我在其中使用 pandas groupby 进行转换以求和供应商的成本 [唯一值],然后对这些成本的总和进行排名和分类。然后我想遍历表,获取列中的唯一值、与该唯一值相关的值并将其保存到自己的工作表中,每个工作簿都在一个工作簿中。
我不断收到错误消息:
IndexError: string index out of range.
我不知道为什么当我尝试对汇总的 groupby 进行排名和分箱时,在尝试迭代时会出现索引错误。我的 github 上有工作簿和脚本。每个唯一供应商的最终结果将类似于下图。谢谢你们。我很感激。这里的 Github 链接:https ://github.com/beingandbrian/xlsx_supplier_scorecard_tables.git
解决方案
我试图在我的 Jupyter 笔记本中重现您的代码:
# imports
import os
from pathlib import Path
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from more_itertools import unique_everseen
# from udfs import *
def agg_rank_bin_rename_join(df_i_want_to_agg, attributes, value):
if not isinstance(attributes, list):
attributes = [attributes]
value_dict = { key: ['sum'] for key in value } if isinstance(value, list) else { value: ['sum'] }
df_simp_arith = df_i_want_to_agg.groupby(attributes).agg(value_dict)
df_simp_arith.columns = ['Sum']
df_rank = df_simp_arith.rank(ascending=1, method='dense').add_prefix('Rank ')
bin_labels = ['Least Expensive', 'Less Expensive', 'Average', 'More Expensive', 'Most Expensive']
df_bin = df_rank.apply(lambda x: pd.qcut(x, q=[0, .2, .4, .6, .8, 1], labels=bin_labels))
output = df_simp_arith.join(df_rank).join(df_bin.add_prefix('Bin '))
output = output.copy(deep=True)
return output
def add_df_to_ws(the_ws_title_name, the_df):
ws_loaded_object = wb_loaded_object.create_sheet(title = the_ws_title_name)
for r in dataframe_to_rows(the_df, index = False, header = True):
ws_loaded_object.append(r)
for cell in ws_loaded_object['A'] + ws_loaded_object[1]:
cell.style = 'Pandas'
我正在使用没有任何操作系统导入的简单 readexcel
# read in xlsx filepath data in as a df
df_loaded = pd.read_excel(r'C:/Users/ttayyab/Downloads/input.xlsx')
# transform the df using groupby
df_groupby = agg_rank_bin_rename_join(df_loaded, ['Supplier Name', 'Paid Date FY Year'], 'Total Net Amount').reset_index()
# create a list of unique values in index attribute column
unique_values_in_attribute_column_list = list(df_groupby['Supplier Name'].unique())
#output
['Alexandra Hodge',
'Brian Lee',
'Carrie Caldwell',
'Cynthia Jones DVM',
'Danielle Leblanc',
'Destiny Swanson',
'Drew Le',
'Jason Castaneda',
'Jonathan Tran',
'Joyce Green',
'Karen Tran',
'Melissa Wang',
'Michael Nguyen',
'Mrs. Teresa Thompson DDS',
'Patrick Herrera',
'Rachel Odom',
'Vincent Coleman']
data
我创建了一个名为附加数据框值的空列表:
data = []
for i, each_unique_value in enumerate(unique_values_in_attribute_column_list):
data.append(df_groupby.loc[df_groupby['Supplier Name'] == each_unique_value])
它在 Jupyter 单元上显示如下:
[ Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
0 Alexandra Hodge FY2018 590.5 2.0 Least Expensive
1 Alexandra Hodge FY2019 745.0 4.0 Least Expensive
2 Alexandra Hodge FY2020 165.0 1.0 Least Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
3 Brian Lee FY2018 301250.03965 34.0 More Expensive
4 Brian Lee FY2019 711728.51000 40.0 More Expensive
5 Brian Lee FY2020 584431.09000 38.0 More Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
6 Carrie Caldwell FY2018 1324015.88 45.0 Most Expensive
7 Carrie Caldwell FY2019 3613888.51 50.0 Most Expensive
8 Carrie Caldwell FY2020 3013043.44 49.0 Most Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
9 Cynthia Jones DVM FY2018 56917.16 19.0 Less Expensive
10 Cynthia Jones DVM FY2019 179481.53 31.0 Average
11 Cynthia Jones DVM FY2020 99613.13 22.0 Average,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
12 Danielle Leblanc FY2018 1768.76 6.0 Least Expensive
13 Danielle Leblanc FY2019 27246.60 15.0 Less Expensive
14 Danielle Leblanc FY2020 2871.86 7.0 Least Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
15 Destiny Swanson FY2018 145049.74 27.0 Average
16 Destiny Swanson FY2019 145257.46 28.0 Average
17 Destiny Swanson FY2020 28073.53 16.0 Less Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
18 Drew Le FY2018 26412.38 14.0 Less Expensive
19 Drew Le FY2019 125407.70 24.0 Average
20 Drew Le FY2020 132418.75 25.0 Average,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
21 Jason Castaneda FY2018 115625.69 23.0 Average
22 Jason Castaneda FY2019 433708.13 35.0 More Expensive
23 Jason Castaneda FY2020 145646.68 29.0 Average,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
24 Jonathan Tran FY2018 15000.0 10.0 Least Expensive
25 Jonathan Tran FY2019 16300.0 11.0 Least Expensive
26 Jonathan Tran FY2020 18900.0 12.0 Less Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
27 Joyce Green FY2018 1526.16 5.0 Least Expensive
28 Joyce Green FY2019 9999.09 9.0 Least Expensive
29 Joyce Green FY2020 708.10 3.0 Least Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
30 Karen Tran FY2018 23127.99 13.0 Less Expensive
31 Karen Tran FY2019 28490.21 17.0 Less Expensive
32 Karen Tran FY2020 7067.30 8.0 Least Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
33 Melissa Wang FY2018 207494.8321 32.0 More Expensive
34 Melissa Wang FY2019 228311.6400 33.0 More Expensive
35 Melissa Wang FY2020 57821.3100 20.0 Less Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
36 Michael Nguyen FY2018 140296.74 26.0 Average
37 Michael Nguyen FY2019 922189.38 42.0 Most Expensive
38 Michael Nguyen FY2020 552030.99 37.0 More Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum \
39 Mrs. Teresa Thompson DDS FY2018 90791.41 21.0
40 Mrs. Teresa Thompson DDS FY2019 157290.15 30.0
41 Mrs. Teresa Thompson DDS FY2020 40868.99 18.0
Bin Rank Sum
39 Less Expensive
40 Average
41 Less Expensive ,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
42 Patrick Herrera FY2018 661283.77 39.0 More Expensive
43 Patrick Herrera FY2019 2236597.92 47.0 Most Expensive
44 Patrick Herrera FY2020 1190359.61 44.0 Most Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
45 Rachel Odom FY2018 824392.65 41.0 More Expensive
46 Rachel Odom FY2019 984224.09 43.0 Most Expensive
47 Rachel Odom FY2020 520067.29 36.0 More Expensive,
Supplier Name Paid Date FY Year Sum Rank Sum Bin Rank Sum
48 Vincent Coleman FY2018 1464104.11 46.0 Most Expensive
49 Vincent Coleman FY2019 3737305.47 51.0 Most Expensive
50 Vincent Coleman FY2020 2775609.73 48.0 Most Expensive]
推荐阅读
- c++ - 链接boost python时未定义的引用
- ios - 默认情况下,SwiftUI 文本替换在 TextField 视图上不起作用
- c - 我的代码将打印两次错误消息并仍然运行
- flutter - 如何以编程方式模拟 Flutter 中按钮的 onTap?
- regex - 使用 Regex 提取下一个语义发布版本
- php - 下拉列表中的PHP格式日期不起作用
- r - 尽管在调用中提供了值,但函数使用默认参数
- javascript - HTML5 本地存储 - 我可以为每个键存储多个值吗?如果不能,我可以使用什么替代方法?
- powershell - PowerShell - CSV - 多个标头和值 - Foreach - 使用标头和值
- android - BiometricPromptCompat:禁用虹膜/人脸识别(仅允许指纹)