Koalas DataFrame pivot_table with row and column totals

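Problem description

I previously used the following code on a pandas DataFrame and successfully produced a pivot table with column and row totals, just like in Excel: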

df_check=df.pivot_table(index=['month_yr'], columns=['product'], margins=True, aggfunc=np.sum, values='weight', margins_name='Total')

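I am now working with the same data as a Koalas DataFrame in Databricks. However, this code does not run there: I cannot use margins=True or margins_name='Total', because both raise errors.

I modified the code above so that it runs, but it no longer produces the column/row totals: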

kdf_check=kdf.pivot_table(index=['month_yr'], columns='product', aggfunc='sum',values='weight')

How can this code be modified to add row and column totals?

From kdf.head().to_dict():

{'city_nm': {0: 'HAMILTON',
  1: 'MADAWASKA',
  2: 'PALMER RAPIDS',
  3: 'OXFORD MILLS',
  4: 'MADAWASKA'},
 'prov': {0: 'ON', 1: 'ON', 2: 'ON', 3: 'ON', 4: 'ON'},
 'dest_country': {0: 'United States',
  1: 'United States',
  2: 'United States',
  3: 'United States',
  4: 'United States'},
 'commodity_descrp': {0: 'BIOLOGICAL SUBSTANCE',
  1: 'VME CARD CAGE/HEAD. SENSOR SENT',
  2: 'DENTAL IMPRESSION TRAY',
  3: 'LAPTOP',
  4: 'VME CARD CAGE/HEAD. SENSOR SENT'},
 'harmonized_code': {0: '3002.90.5010', 1: '', 2: '', 3: '', 4: ''},
 'acct_nbr': {0: '', 1: '', 2: '', 3: '', 4: ''},
 'bus_nm': {0: None, 1: None, 2: None, 3: None, 4: None},
 'naic_4_cd': {0: '', 1: '', 2: '', 3: '', 4: ''},
 'product': {0: 'FAST', 1: 'FAST', 2: 'SLOW', 3: 'FAST', 4: 'FAST'},
 'weight': {0: 11.0, 1: 85.0, 2: 0.7, 3: 7.0, 4: 85.0},
 'dest_indicator': {0: 'US', 1: 'US', 2: 'US', 3: 'US', 4: 'US'},
 'mth_yr': {0: 'July2020',
  1: 'May2020',
  2: 'June2020',
  3: 'December2020',
  4: 'May2020'},
 'hc_dig': {0: '30', 1: '', 2: '', 3: '', 4: ''},
 'hc_dig_clean': {0: '30', 1: '', 2: '30', 3: '', 4: ''}}

Tags: pandas, pivot

Solution


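Since koalas.pivot_table() does not support margins, I would add the totals manually: first append a row holding the column sums, then add a column holding the row sums.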

kdf_check = kdf_check.append(kdf_check.sum().rename('Total'))
kdf_check['Total'] = kdf_check.sum(1)

# product        FAST  SLOW  Total
# mth_yr                          
# December2020    7.0   NaN    7.0
# July2020       11.0   NaN   11.0
# June2020        NaN   0.7    0.7
# May2020       170.0   NaN  170.0
# Total         188.0   0.7  188.7
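
As a cross-check, the same manual-totals idea can be sketched in plain pandas, where the result can be compared against the built-in margins=True. This is only a sketch under a few assumptions: it uses pandas rather than Koalas (whose behavior may differ), it rebuilds just the three relevant columns from the kdf.head() output above (with the column name mth_yr as it appears there), it uses pd.concat in place of DataFrame.append because recent pandas versions no longer provide append, and add_totals is a hypothetical helper name, not part of either library.

```python
import pandas as pd

# The five sample rows from kdf.head(), reduced to the columns the pivot uses.
df = pd.DataFrame({
    "mth_yr": ["July2020", "May2020", "June2020", "December2020", "May2020"],
    "product": ["FAST", "FAST", "SLOW", "FAST", "FAST"],
    "weight": [11.0, 85.0, 0.7, 7.0, 85.0],
})


def add_totals(pivot, name="Total"):
    """Hypothetical helper: return a copy of `pivot` with a totals row and column."""
    out = pivot.copy()
    # Column sums become one extra row; sum() skips NaN cells by default.
    total_row = out.sum().rename(name).to_frame().T
    out = pd.concat([out, total_row])
    out.index.name = pivot.index.name  # concat drops the index name, restore it
    # Row sums become one extra column. Doing this after the row was added
    # means the bottom-right cell ends up holding the grand total.
    out[name] = out.sum(axis=1)
    return out


pivot = df.pivot_table(index="mth_yr", columns="product",
                       values="weight", aggfunc="sum")
manual = add_totals(pivot)

# Reference result from pandas' built-in margins support.
builtin = df.pivot_table(index="mth_yr", columns="product", values="weight",
                         aggfunc="sum", margins=True, margins_name="Total")

print(manual)
```

On this sample the manual table and the built-in one should agree cell for cell, up to floating-point rounding in the grand total. The order of the two steps matters: adding the totals column before the totals row would also work, but then the new row has to be summed over the already-extended frame to fill the corner cell.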
