Python: pivot table based on the month in a date

Problem description

I want to pivot my table based on the month in the date.

Expected example result:

    jul,revenue,aug,revenue,sept,revenue,okt,revenue,nov,revenue,dec,revenue,jan,revenue,feb,revenue,mar,revenue,apr,revenue,may,revenue,jun,revenue
Jul-2017,1000,Aug-2017,1000,Sep-2017,1000,Oct-2017,1000,Nov-2017,1000,Dec-2017,1000,Jan-2018,1000,Feb-2018,1000,Mar-2018,1000,Apr-2018,1000,May-2018,1000,Jun-2018,1000
Jul-2018,1000,Aug-2018,1000,Sep-2018,1000,Oct-2018,1000,Nov-2018,1000,Dec-2018,1000,Jan-2019,1000,Feb-2019,1000,Mar-2019,1000,Apr-2019,1000,May-2019,1000,Jun-2019,1000
Jul-2019,1000,Aug-2019,1000,Sep-2019,1000,Oct-2019,1000,Nov-2019,1000,Dec-2019,1000,Jan-2020,1000,Feb-2020,1000,Mar-2020,1000,Apr-2020,0,May-2020,0,Jun-2020,0

I tried this code:

import pandas as pd
from datetime import date
df = pd.read_excel (r'examplefile')
ndf = df[['Saleprice', 'Date' , 'Season', 'Area', 'Place', 'Bookingsnumber']].reset_index(drop=True)
ndf['Date'] = pd.to_datetime(ndf['Date'])

#Revenue per Year/Month
ndf['my'] = ndf['Date'].map(lambda x: x.strftime('%m/%Y'))
grouped_ndf = ndf.groupby('my').sum().reset_index()
print(grouped_ndf)

The console gave me this result:

            my  Saleprice
0  01/2019       2000
1  01/2020       3000
2  02/2019       4000
3  02/2020       6000

Example file:

Saleprice,Date,Season,Area,Place,Bookingsnumber,Buy
1000,1-1-2019,winter 2019/2020,Zillertal,Mayrhofen,C20015243,800
1000,3-2-2019,winter 2019/2020,Zillertal,Königsleiten,B20015245,800
1000,5-2-2019,winter 2019/2020,Zell am See / Kaprun,Zell am See,C20015246,800
1000,1-1-2020,winter 2019/2020,Zell am See / Kaprun,Zell am See,D20015248,800
1000,3-2-2020,winter 2019/2020,Les Trois Vallées,Val Thorens,C20015249,800
1000,5-2-2020,winter 2019/2020,Zillertal,Königsleiten,C20015251,800
1000,1-1-2019,winter 2019/2020,Zillertal,Königsleiten,C20015252,800
1000,3-2-2019,winter 2019/2020,Espace Killy - Tignes,Tignes-Les-Brévières,C20015250,800
1000,5-2-2019,winter 2019/2020,Zillertal,Gerlos,B20015253,800
1000,1-1-2020,winter 2020/2021,Les Trois Vallées,Les Menuires,C20015254,800
1000,3-2-2020,winter 2020/2021,Zell am See / Kaprun,Zell am See,B20015255,800
1000,5-2-2020,winter 2020/2021,Zell am See / Kaprun,Zell am See,D20015256,800
1000,1-1-2020,winter 2020/2021,Les Trois Vallées,Les Menuires,E20015254,800
1000,3-2-2020,winter 2020/2021,Zell am See / Kaprun,Zell am See,B20015255,800
1000,5-2-2020,winter 2020/2021,Zell am See / Kaprun,Zell am See,B20015256,800

What do I have to change to get my example result? Looking forward to your reply.

Tags: python, pandas, datetime, pivot, pandas-groupby

Solution


This code does the job. It is not the most Pythonic code, but it is all I could come up with:

df['Date'] = df['Date'].apply(lambda x: pd.to_datetime(x))
months = df['Date'].apply(lambda x: x.strftime('%b')).unique()
df['my'] = df['Date'].map(lambda x: x.strftime('%b/%Y'))
dfgrouped = df.groupby('my')['Saleprice'].sum().reset_index()
month_dfs = [*map(lambda x: dfgrouped[dfgrouped['my'].str[:3]==x].reset_index(drop=True).rename(columns={'my': x, 'Saleprice': 'revenue'}), months)]
new_df = pd.concat(month_dfs, axis=1)

Note that pd.to_datetime treats the first number in the Date column as the month. If you want the second number to be the month, you have to do the following:

from datetime import datetime
df['Date'] = df['Date'].apply(lambda x: datetime.strptime(x, '%d-%m-%Y'))
months = df['Date'].apply(lambda x: x.strftime('%b')).unique()
df['my'] = df['Date'].map(lambda x: x.strftime('%b/%Y'))
dfgrouped = df.groupby('my')['Saleprice'].sum().reset_index()
month_dfs = [*map(lambda x: dfgrouped[dfgrouped['my'].str[:3]==x].reset_index(drop=True).rename(columns={'my': x, 'Saleprice': 'revenue'}), months)]
new_df = pd.concat(month_dfs, axis=1)
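The difference between the two readings can be checked on a couple of values before running the full pipeline. The following is a minimal sketch, assuming the dates arrive as strings like those in the example file; the variable names `raw`, `month_first` and `day_first` are illustrative and not part of the original answer.

```python
import pandas as pd

# Two date strings in the same shape as the example file
raw = pd.Series(["3-2-2019", "5-2-2019"])

# Explicit formats remove the ambiguity between day-first and month-first
month_first = pd.to_datetime(raw, format="%m-%d-%Y")
day_first = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first.dt.month.tolist())  # [3, 5]
print(day_first.dt.month.tolist())    # [2, 2]
```

The asker's console output only contains months 01 and 02, which suggests that the day-first reading is the one that matches the data.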

If you want the months sorted:

from datetime import datetime
# '%m-%d-%Y' reads the first number as the month; use '%d-%m-%Y' if the day comes first
df['Date'] = df['Date'].apply(lambda x: datetime.strptime(x, '%m-%d-%Y'))
months_in_df = df['Date'].apply(lambda x: x.strftime('%b')).unique()
# Walk the calendar from January to December and keep only the months present in the data
months = []
for i in range(1, 13):
    abbr = datetime(2020, i, 1).strftime('%b')
    if abbr in months_in_df:
        months.append(abbr)

df['my'] = df['Date'].map(lambda x: x.strftime('%b/%Y'))
dfgrouped = df.groupby('my')['Saleprice'].sum().reset_index()
# One two-column frame (month label, revenue) per month, placed side by side
month_dfs = [*map(lambda x: dfgrouped[dfgrouped['my'].str[:3]==x].reset_index(drop=True).rename(columns={'my': x, 'Saleprice': 'revenue'}), months)]
new_df = pd.concat(month_dfs, axis=1)
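As an alternative to building one frame per month, the same totals can be obtained by grouping on year and month and unstacking the month level, which also keeps the months in calendar order. This is a hedged sketch and not the method used in the answer above: it assumes day-first date strings, uses a six-row stand-in for the example file, and the names `revenue`, `year` and `month` are illustrative.

```python
import pandas as pd

# Small stand-in for the example file (assumed day-first dates)
df = pd.DataFrame({
    "Saleprice": [1000, 1000, 1000, 1000, 1000, 1000],
    "Date": ["1-1-2019", "3-2-2019", "5-2-2019",
             "1-1-2020", "3-2-2020", "5-2-2020"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Sum revenue per (year, month), then move the month level into columns
revenue = (
    df.groupby([df["Date"].dt.year.rename("year"),
                df["Date"].dt.month.rename("month")])["Saleprice"]
      .sum()
      .unstack("month", fill_value=0)
)

print(revenue)
```

Here each row is a calendar year and each column a month number, so the result differs in layout from the month/revenue column pairs in the expected output. Grouping by season (July to June) would need an extra key that is not shown here.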

