首页 > 解决方案 > 遍历 XLSX 工作簿并将唯一值保存在从数据框到单个工作表的卷中

问题描述

这有点令人费解。但是我在 XLSX 中有一个输入表,我在其中使用 pandas groupby 进行转换以求和供应商的成本 [唯一值],然后对这些成本的总和进行排名和分类。然后我想遍历表,获取列中的唯一值、与该唯一值相关的值并将其保存到自己的工作表中,每个工作簿都在一个工作簿中。

我不断收到错误消息:

IndexError: string index out of range. 

我不知道为什么当我尝试对汇总的 groupby 进行排名和分箱时,在尝试迭代时会出现索引错误。我的 github 上有工作簿和脚本。每个唯一供应商的最终结果将类似于下图。谢谢你们。我很感激。这里的 Github 链接:https ://github.com/beingandbrian/xlsx_supplier_scorecard_tables.git 在此处输入图像描述

标签: pythonexcelpandas

解决方案


我试图在我的 Jupyter 笔记本中重现您的代码:

# imports
import os
from pathlib import Path
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from more_itertools import unique_everseen
# from udfs import *

def agg_rank_bin_rename_join(df_i_want_to_agg, attributes, value):
if not isinstance(attributes, list):
    attributes = [attributes]

value_dict = { key: ['sum'] for key in value } if isinstance(value, list) else { value: ['sum'] }

df_simp_arith = df_i_want_to_agg.groupby(attributes).agg(value_dict)
df_simp_arith.columns = ['Sum']

df_rank = df_simp_arith.rank(ascending=1, method='dense').add_prefix('Rank ') 
bin_labels = ['Least Expensive', 'Less Expensive', 'Average', 'More Expensive', 'Most Expensive']

df_bin = df_rank.apply(lambda x: pd.qcut(x, q=[0, .2, .4, .6, .8, 1], labels=bin_labels))
output = df_simp_arith.join(df_rank).join(df_bin.add_prefix('Bin '))
output = output.copy(deep=True)
return output

def add_df_to_ws(the_ws_title_name, the_df):
ws_loaded_object = wb_loaded_object.create_sheet(title = the_ws_title_name)

for r in dataframe_to_rows(the_df, index = False, header = True):
    ws_loaded_object.append(r)

for cell in ws_loaded_object['A'] + ws_loaded_object[1]:
    cell.style = 'Pandas'

我正在使用没有任何操作系统导入的简单 readexcel

# read in xlsx filepath data in as a df
df_loaded = pd.read_excel(r'C:/Users/ttayyab/Downloads/input.xlsx')
# transform the df using groupby
df_groupby = agg_rank_bin_rename_join(df_loaded, ['Supplier Name', 'Paid Date FY Year'], 'Total Net Amount').reset_index()

# create a list of unique values in index attribute column
unique_values_in_attribute_column_list = list(df_groupby['Supplier Name'].unique())

#output

 ['Alexandra Hodge',
 'Brian Lee',
 'Carrie Caldwell',
 'Cynthia Jones DVM',
 'Danielle Leblanc',
 'Destiny Swanson',
 'Drew Le',
 'Jason Castaneda',
 'Jonathan Tran',
 'Joyce Green',
 'Karen Tran',
 'Melissa Wang',
 'Michael Nguyen',
 'Mrs. Teresa Thompson DDS',
 'Patrick Herrera',
 'Rachel Odom',
 'Vincent Coleman']

data我创建了一个名为附加数据框值的空列表:

data = []
for i, each_unique_value in enumerate(unique_values_in_attribute_column_list):
    data.append(df_groupby.loc[df_groupby['Supplier Name'] == each_unique_value])

它在 Jupyter 单元上显示如下:

[     Supplier Name Paid Date FY Year    Sum  Rank Sum     Bin Rank Sum
 0  Alexandra Hodge            FY2018  590.5       2.0  Least Expensive
 1  Alexandra Hodge            FY2019  745.0       4.0  Least Expensive
 2  Alexandra Hodge            FY2020  165.0       1.0  Least Expensive,
   Supplier Name Paid Date FY Year           Sum  Rank Sum    Bin Rank Sum
 3     Brian Lee            FY2018  301250.03965      34.0  More Expensive
 4     Brian Lee            FY2019  711728.51000      40.0  More Expensive
 5     Brian Lee            FY2020  584431.09000      38.0  More Expensive,
      Supplier Name Paid Date FY Year         Sum  Rank Sum    Bin Rank Sum
 6  Carrie Caldwell            FY2018  1324015.88      45.0  Most Expensive
 7  Carrie Caldwell            FY2019  3613888.51      50.0  Most Expensive
 8  Carrie Caldwell            FY2020  3013043.44      49.0  Most Expensive,
         Supplier Name Paid Date FY Year        Sum  Rank Sum    Bin Rank Sum
 9   Cynthia Jones DVM            FY2018   56917.16      19.0  Less Expensive
 10  Cynthia Jones DVM            FY2019  179481.53      31.0         Average
 11  Cynthia Jones DVM            FY2020   99613.13      22.0         Average,
        Supplier Name Paid Date FY Year       Sum  Rank Sum     Bin Rank Sum
 12  Danielle Leblanc            FY2018   1768.76       6.0  Least Expensive
 13  Danielle Leblanc            FY2019  27246.60      15.0   Less Expensive
 14  Danielle Leblanc            FY2020   2871.86       7.0  Least Expensive,
       Supplier Name Paid Date FY Year        Sum  Rank Sum    Bin Rank Sum
 15  Destiny Swanson            FY2018  145049.74      27.0         Average
 16  Destiny Swanson            FY2019  145257.46      28.0         Average
 17  Destiny Swanson            FY2020   28073.53      16.0  Less Expensive,
    Supplier Name Paid Date FY Year        Sum  Rank Sum    Bin Rank Sum
 18       Drew Le            FY2018   26412.38      14.0  Less Expensive
 19       Drew Le            FY2019  125407.70      24.0         Average
 20       Drew Le            FY2020  132418.75      25.0         Average,
       Supplier Name Paid Date FY Year        Sum  Rank Sum    Bin Rank Sum
 21  Jason Castaneda            FY2018  115625.69      23.0         Average
 22  Jason Castaneda            FY2019  433708.13      35.0  More Expensive
 23  Jason Castaneda            FY2020  145646.68      29.0         Average,
     Supplier Name Paid Date FY Year      Sum  Rank Sum     Bin Rank Sum
 24  Jonathan Tran            FY2018  15000.0      10.0  Least Expensive
 25  Jonathan Tran            FY2019  16300.0      11.0  Least Expensive
 26  Jonathan Tran            FY2020  18900.0      12.0   Less Expensive,
    Supplier Name Paid Date FY Year      Sum  Rank Sum     Bin Rank Sum
 27   Joyce Green            FY2018  1526.16       5.0  Least Expensive
 28   Joyce Green            FY2019  9999.09       9.0  Least Expensive
 29   Joyce Green            FY2020   708.10       3.0  Least Expensive,
    Supplier Name Paid Date FY Year       Sum  Rank Sum     Bin Rank Sum
 30    Karen Tran            FY2018  23127.99      13.0   Less Expensive
 31    Karen Tran            FY2019  28490.21      17.0   Less Expensive
 32    Karen Tran            FY2020   7067.30       8.0  Least Expensive,
    Supplier Name Paid Date FY Year          Sum  Rank Sum    Bin Rank Sum
 33  Melissa Wang            FY2018  207494.8321      32.0  More Expensive
 34  Melissa Wang            FY2019  228311.6400      33.0  More Expensive
 35  Melissa Wang            FY2020   57821.3100      20.0  Less Expensive,
      Supplier Name Paid Date FY Year        Sum  Rank Sum    Bin Rank Sum
 36  Michael Nguyen            FY2018  140296.74      26.0         Average
 37  Michael Nguyen            FY2019  922189.38      42.0  Most Expensive
 38  Michael Nguyen            FY2020  552030.99      37.0  More Expensive,
                Supplier Name Paid Date FY Year        Sum  Rank Sum  \
 39  Mrs. Teresa Thompson DDS            FY2018   90791.41      21.0   
 40  Mrs. Teresa Thompson DDS            FY2019  157290.15      30.0   
 41  Mrs. Teresa Thompson DDS            FY2020   40868.99      18.0   
 
       Bin Rank Sum  
 39  Less Expensive  
 40         Average  
 41  Less Expensive  ,
       Supplier Name Paid Date FY Year         Sum  Rank Sum    Bin Rank Sum
 42  Patrick Herrera            FY2018   661283.77      39.0  More Expensive
 43  Patrick Herrera            FY2019  2236597.92      47.0  Most Expensive
 44  Patrick Herrera            FY2020  1190359.61      44.0  Most Expensive,
    Supplier Name Paid Date FY Year        Sum  Rank Sum    Bin Rank Sum
 45   Rachel Odom            FY2018  824392.65      41.0  More Expensive
 46   Rachel Odom            FY2019  984224.09      43.0  Most Expensive
 47   Rachel Odom            FY2020  520067.29      36.0  More Expensive,
       Supplier Name Paid Date FY Year         Sum  Rank Sum    Bin Rank Sum
 48  Vincent Coleman            FY2018  1464104.11      46.0  Most Expensive
 49  Vincent Coleman            FY2019  3737305.47      51.0  Most Expensive
 50  Vincent Coleman            FY2020  2775609.73      48.0  Most Expensive]

推荐阅读