首页 > 解决方案 > 嵌套for循环以基于pandas中的2级多索引创建多个数据透视表

问题描述

开始对这个感到困惑。我有一个大的事实发票抬头表。我采用了原始数据框,使用 groupby 根据一列将 df 拆分。输出是一个数据框列表:

list_of_dfs = []
for _, g in df.groupby(df['Project State Name']):
    list_of_dfs.append(g)
list_of_dfs

然后我使用另一个 for 循环遍历数据框列表并执行一个数据透视表聚合。

for each_state_df in list_of_dfs:
    columns_to_index_by = ['Project Issue', 'Project Secondary Issue', 'Project Client Name']
# Aggregating to the Project Level
    table_for_pivots = pd.pivot_table(df, index=['FY Year', 'Project Issue'], values=["Project Key", 'Total Net Amount', "Project Total Resolution Amount", 'Project Budgeted Amount'],
                             aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,  "Project Total Resolution Amount": np.mean,
                                       'Project Budgeted Amount': np.mean},
                             fill_value=np.mean)
print(table_for_pivots)

我的问题是,如何使用另一个 for 循环将数据透视表索引中的第二个元素替换为变量 columns_to_index_by 中的每个值?输出将是 3 个数据透视表,其中 index=['FY Year', 'Project Issue'], index=['FY Year', 'Project Secondary Issue', and index=['FY Year', 'Project Client Name' ]。谢谢大家!

下载示例 df 数据的链接在这里:

https://ufile.io/iufv9nma

标签: pythonpandasdataframefor-looppivot-table

解决方案


使用列表理解并遍历zip要为每个组设置的索引:

from pandas import Timestamp
from numpy import nan
d = {'Total Net Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Project Total Resolution Amount': {2: 35000, 41: 27000, 17: 40000, 32: 27000, 26: 27000, 11: 40000, 23: 27000, 35: 27000, 29: 27000, 59: 27000, 38: 27000, 0: 30000}, 'Invoice Header Key': {2: 1229422, 41: 984803, 17: 1270731, 32: 938069, 26: 911535, 11: 1247443, 23: 902150, 35: 943737, 29: 918888, 59: 1071541, 38: 965091, 0: 1279581}, 'Project Key': {2: 259661, 41: 194517, 17: 259188, 32: 194517, 26: 194517, 11: 259188, 23: 194517, 35: 194517, 29: 194517, 59: 194517, 38: 194517, 0: 263736}, 'Project Secondary Issue': {2: 2, 41: 4, 17: 0, 32: 3, 26: 3, 11: 0, 23: 4, 35: 4, 29: 4, 59: 4, 38: 3, 0: 4}, 'Organization Key': {2: 16029, 41: 22638, 17: 24230, 32: 22638, 26: 22638, 11: 24230, 23: 22638, 35: 22638, 29: 22638, 59: 22638, 38: 22638, 0: 4532}, 'Project Budgeted Amount': {2: 42735.0, 41: 32500.0, 17: 26000.0, 32: 32500.0, 26: 32500.0, 11: 26000.0, 23: 32500.0, 35: 32500.0, 29: 32500.0, 59: 32500.0, 38: 32500.0, 0: nan}, 'Project State Name': {2: 0, 41: 1, 17: 2, 32: 1, 26: 1, 11: 2, 23: 1, 35: 1, 29: 1, 59: 1, 38: 1, 0: 1}, 'Project Issue': {2: 0, 41: 2, 17: 1, 32: 2, 26: 2, 11: 1, 23: 2, 35: 2, 29: 2, 59: 2, 38: 2, 0: 1}, 'Project Number': {2: 2, 41: 0, 17: 1, 32: 0, 26: 0, 11: 1, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 3}, 'Project Client Name': {2: 1, 41: 0, 17: 0, 32: 0, 26: 0, 11: 0, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 1}, 'Paid Date Year Month': {2: 13, 41: 7, 17: 15, 32: 4, 26: 2, 11: 14, 23: 1, 35: 5, 29: 3, 59: 12, 38: 6, 0: 16}, 'FY Year': {2: 2, 41: 0, 17: 2, 32: 0, 26: 0, 11: 2, 23: 0, 35: 0, 29: 0, 59: 1, 38: 0, 0: 2}, 'Invoice Paid Date': {2: Timestamp('2019-09-10 00:00:00'), 41: Timestamp('2017-12-20 00:00:00'), 17: Timestamp('2019-11-25 00:00:00'), 32: Timestamp('2017-08-31 00:00:00'), 26: Timestamp('2017-06-14 00:00:00'), 11: Timestamp('2019-10-08 00:00:00'), 23: Timestamp('2017-05-30 00:00:00'), 35: Timestamp('2017-09-07 00:00:00'), 29: Timestamp('2017-07-10 00:00:00'), 59: Timestamp('2018-10-03 00:00:00'), 38: Timestamp('2017-11-03 00:00:00'), 0: Timestamp('2019-12-12 00:00:00')}, 'Invoice Paid Date Key': {2: 20190910, 41: 20171220, 17: 20191125, 32: 20170831, 26: 20170614, 11: 20191008, 23: 20170530, 35: 20170907, 29: 20170710, 59: 20181003, 38: 20171103, 0: 20191212}, 'Count Project Secondary Issue': {2: 3, 41: 3, 17: 3, 32: 3, 26: 3, 11: 3, 23: 3, 35: 3, 29: 3, 59: 3, 38: 3, 0: 2}, 'Total Net Amount By Count Project Secondary Issue': {2: 224.0, 41: 425.9666666666667, 17: 90.0, 32: 281.7666666666667, 26: 276.2066666666666, 11: 244.5, 23: 580.6, 35: 84.71551666666666, 29: 81.66666666666667, 59: 71.66666666666667, 38: 205.8, 0: 530.75}, 'Total Net Invoice Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Total Project Invoice Amount': {2: 7176.52, 41: 10110.98655, 17: 1678.5, 32: 10110.98655, 26: 10110.98655, 11: 1678.5, 23: 10110.98655, 35: 10110.98655, 29: 10110.98655, 59: 10110.98655, 38: 10110.98655, 0: 1061.5}, 'Invoice Dollar Percent of Project': {2: 0.09363869953682286, 41: 0.1263872712796755, 17: 0.160857908847185, 32: 0.08360212881501655, 26: 0.08195243816242638, 11: 0.4369973190348526, 23: 0.1722680562758735, 35: 0.02513568272919916, 29: 0.02423106773888449, 59: 0.02126399821983741, 38: 0.06106229070198891, 0: 1.0}}
df = pd.DataFrame(d)

# list comprehension with groupby
group = [g for _, g in df.groupby('Project State Name')]

#create a list of indices you want to use in pivot
idx = [['FY Year', 'Project Issue'],
       ['FY Year', 'Project Secondary Issue'],
       ['FY Year', 'Project Client Name']]

# create a list of columns to add to the value param in pivot
values = ["Project Key", 'Total Net Amount',
          "Project Total Resolution Amount", 'Project Budgeted Amount']

# use your current pivot and iterate through zip(idx, group)
dfs = [pd.pivot_table(df, index=i, values=values,
                      aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
                                "Project Total Resolution Amount": np.mean,
                                'Project Budgeted Amount': np.mean},
                                 fill_value=np.mean) for i,df in zip(idx, group)]

听写理解

我不知道您希望密钥是什么,所以我只是从 idx 中选择了第二个值。您将从字典中调用每个数据帧dfs['Project Issue']

dfs = {i[1]: pd.pivot_table(df, index=i, values=values,
                      aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
                                "Project Total Resolution Amount": np.mean,
                                'Project Budgeted Amount': np.mean},
                                 fill_value=np.mean) for i,df in zip(idx, group)}

推荐阅读