Formatting dates in the columns of a Pandas pivot table

Problem description

I am building a pivot table with Pandas from JSON data. I want to format the column names before passing the result to to_string().

import numpy as np
import pandas as pd

json_data = [
 {"year":2019,"month":"2019-11-01","sub_category":"Van fit out","category":"Vehicle","notes":"Heavy duty hooks","gross_amount":8.96}, 
 {"year":2019,"month":"2019-11-01","sub_category":"Fuel & oil","category":"Vehicle","notes":"Fuel","gross_amount":20.00},  
# more data  [...]
 {"year":2020,"month":"2020-02-01","sub_category":"Gutter Vac","category":"WC Equipment + H&S","notes":"Tape Measure + Bungi Cord + Plastic Membrane + Extension reel + Microfibre cloths + Waterproof Jacket","gross_amount":97.94}, 
 {"year":2020,"month":"2020-02-01","sub_category":"Trad equipment","category":"WC Materials","notes":"Spray Bottle + Microfibres","gross_amount":4.47}, 
 ]            

data = pd.DataFrame(json_data)

# Pivot the data:
pivot = pd.pivot_table(
            data, values=['gross_amount'], index=['category', 'sub_category'],
                    columns=['year', 'month'], aggfunc=np.sum, fill_value=0, dropna=True, margins=True)
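# Add total rows for index level 0
# (DataFrame.append() was removed in pandas 2.0, so pd.concat() is used instead):
pivot = pd.concat([
        pd.concat([d, d.sum(skipna=True).rename((k, 'Total')).to_frame().T])
        for k, d in pivot.groupby(level=0)
        ]).rename_axis(index=['category', 'sub_category'])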

# Render to string:
string = pivot.to_string()

print(string)

The result is:

                                  gross_amount
year                                      2019       2020     All
month                               2019-11-01 2020-02-01
category           sub_category
All                                      28.96     102.41  131.37
                   Total                 28.96     102.41  131.37
Vehicle            Fuel & oil            20.00       0.00   20.00
                   Van fit out            8.96       0.00    8.96
                   Total                 28.96       0.00   28.96
WC Equipment + H&S Gutter Vac             0.00      97.94   97.94
                   Total                  0.00      97.94   97.94
WC Materials       Trad equipment         0.00       4.47    4.47
                   Total                  0.00       4.47    4.47

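How can I format the months differently (in my case, I need month names)? I tried converting the months to strings in the DataFrame before pivoting, but then I lost the correct ordering.

Thanks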

Tags: python, pandas, dataframe, pivot-table

Solution

It does not seem to be possible with pivot_table(), but I managed it with groupby(), which accepts a sort parameter:

pivot = data.groupby(['category', 'sub_category', 'year', 'monthname'], sort=False)['gross_amount'].sum().unstack(['year', 'monthname'])

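sort=False stops groupby() from sorting the keys alphabetically and keeps the months in the order in which they first appear, so the DataFrame must be sorted chronologically before grouping. Here monthname is assumed to be a column holding the month name derived from month.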
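
For illustration, here is a minimal end-to-end sketch of that approach. It rests on a few assumptions that are not in the original post: the monthname column is derived with dt.month_name(), the frame is sorted by the parsed month before grouping, and one extra row for January 2020 is hypothetical, added only so that the difference between chronological and alphabetical order is visible.

```python
import pandas as pd

json_data = [
    {"year": 2019, "month": "2019-11-01", "sub_category": "Van fit out",
     "category": "Vehicle", "gross_amount": 8.96},
    {"year": 2019, "month": "2019-11-01", "sub_category": "Fuel & oil",
     "category": "Vehicle", "gross_amount": 20.00},
    # Hypothetical row (not in the original data), used only to show ordering:
    {"year": 2020, "month": "2020-01-01", "sub_category": "Fuel & oil",
     "category": "Vehicle", "gross_amount": 10.00},
    {"year": 2020, "month": "2020-02-01", "sub_category": "Gutter Vac",
     "category": "WC Equipment + H&S", "gross_amount": 97.94},
    {"year": 2020, "month": "2020-02-01", "sub_category": "Trad equipment",
     "category": "WC Materials", "gross_amount": 4.47},
]
data = pd.DataFrame(json_data)

# Parse the month strings so that sorting is chronological, not lexical.
data["month"] = pd.to_datetime(data["month"])
data = data.sort_values("month", kind="stable")

# Assumed way of building the monthname column used in the answer.
data["monthname"] = data["month"].dt.month_name()

# sort=False keeps group keys in order of first appearance, so the
# month names stay in the chronological order established above.
pivot = (
    data.groupby(["category", "sub_category", "year", "monthname"], sort=False)
    ["gross_amount"]
    .sum()
    .unstack(["year", "monthname"], fill_value=0)
)

print(pivot.to_string())
```

With the frame sorted this way, the 2020 columns should come out as January followed by February, whereas an alphabetical sort would place February first.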

