Computing the same-month average at time n-1 across multiple DataFrames with unequal index sizes

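Problem description

I have been stuck on this problem for too long and cannot find a solution. I have a dataset, and I want to compute a monthly forecast per item from the average of the same month in previous years.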

import pandas as pd

data_2017 = pd.DataFrame({'Id' : ['001', '001','002', '003', '003'], 'Date' : ['2017-01-01','2017-02-01', '2017-02-01','2017-01-01', '2017-02-01'], 'Quantity': [2,2,3,4,4]})

data_2018 = pd.DataFrame({'Id' : ['001', '001','002','002' ,'003', '003'], 'Date' : ['2018-01-01','2018-02-01', '2018-01-01','2018-02-01','2018-01-01', '2018-02-01'], 'Quantity': [3,3,5,5,3,5]})

My code looks like this:

data_2017['Date'] =pd.to_datetime(data_2017['Date'])
datal = data_2017.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum().reset_index().sort_values('Date')

data_2018['Date'] =pd.to_datetime(data_2018['Date'])
datam = data_2018.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum().reset_index().sort_values('Date')

demand_january_2017= data_2017[data_2017['Date'].dt.month == 1]
demand_february_2017= data_2017[data_2017['Date'].dt.month == 2]

demand_january_2018= data_2018[data_2018['Date'].dt.month == 1]
demand_february_2018= data_2018[data_2018['Date'].dt.month == 2]

pred_demand_2019_january = 0.3*demand_january_2017['Quantity'] + 0.7*demand_january_2018['Quantity']
pred_demand_2019_february = 0.3*demand_february_2017['Quantity'] + 0.7*demand_february_2018['Quantity']

The code runs, but the output is wrong, because I don't know how to handle the fact that the DataFrames have different indices.

0    2.7
2    NaN
3    NaN
4    NaN
Name: Quantity, dtype: float64
1    2.7
2    NaN
3    NaN
4    NaN
5    NaN
Name: Quantity, dtype: float64

Any help at this point would be welcome!

Tags: pandas

Solution
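
The NaN values in the question come from index alignment: pandas adds two Series label by label, and the boolean filters keep the original row labels. The sketch below reproduces the January case with hand-built Series (the variable names are illustrative, not from the question) and contrasts it with the same data keyed by Id.

```python
import pandas as pd

# Same values and row labels as the January filters in the question.
jan_2017 = pd.Series([2, 4], index=[0, 3], name='Quantity')
jan_2018 = pd.Series([3, 5, 3], index=[0, 2, 4], name='Quantity')

# Arithmetic aligns on index labels; only label 0 exists on both sides,
# so every other label in the result is NaN.
by_label = 0.3 * jan_2017 + 0.7 * jan_2018

# Keyed by Id instead, the rows that belong together line up.
jan_2017_by_id = pd.Series([2, 4], index=['001', '003'], name='Quantity')
jan_2018_by_id = pd.Series([3, 5, 3], index=['001', '002', '003'], name='Quantity')
by_id = 0.3 * jan_2017_by_id + 0.7 * jan_2018_by_id

print(by_label)
print(by_id)
```

Id 002 still yields NaN in the second result because it has no January 2017 value at all; the other items now combine correctly.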


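For this to work correctly, both filtered objects must be aligned on the same year, so add one year to data_2017['Date'], and drop the reset_index call so that each result stays a MultiIndex Series: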

data_2017['Date'] =pd.to_datetime(data_2017['Date']) + pd.offsets.DateOffset(years=1)
datal = data_2017.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum()

data_2018['Date'] =pd.to_datetime(data_2018['Date'])
datam = data_2018.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum() 
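
As a quick check of the one-year shift used above, the following sketch (with illustrative variable names) shows that adding `pd.offsets.DateOffset(years=1)` moves the 2017 dates onto the matching 2018 dates, which is what lets the two grouped Series share index labels.

```python
import pandas as pd

# A single timestamp shifted by one calendar year.
shifted = pd.Timestamp('2017-01-01') + pd.offsets.DateOffset(years=1)

# The same shift applied to a whole column of dates.
dates_2017 = pd.to_datetime(pd.Series(['2017-01-01', '2017-02-01']))
dates_2018 = pd.to_datetime(pd.Series(['2018-01-01', '2018-02-01']))
shifted_2017 = dates_2017 + pd.offsets.DateOffset(years=1)

# After the shift, the 2017 dates carry the same labels as the 2018 dates.
same = shifted_2017.equals(dates_2018)
print(shifted, same)
```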

Filter by the Date level:

demand_january_2017= datal[datal.index.get_level_values('Date').month == 1]
demand_february_2017= datal[datal.index.get_level_values('Date').month == 2]

demand_january_2018= datam[datam.index.get_level_values('Date').month == 1]
demand_february_2018= datam[datam.index.get_level_values('Date').month == 2]

Compute the result on the MultiIndex Series, then convert it to a DataFrame at the end:

pred_demand_2019_january = (0.3*demand_january_2017 + 0.7*demand_january_2018).reset_index().sort_values('Date')
pred_demand_2019_february = (0.3*demand_february_2017 + 0.7*demand_february_2018).reset_index().sort_values('Date')
print (pred_demand_2019_january)
    Id       Date  Quantity
0  001 2018-01-31       2.7
1  002 2018-01-31       NaN
2  003 2018-01-31       3.3

print (pred_demand_2019_february)
    Id       Date  Quantity
0  001 2018-02-28       2.7
1  002 2018-02-28       4.4
2  003 2018-02-28       4.7

If you want all months together:

# Start from the original data_2017 so the one-year shift is applied only once
data_2017['Date'] =pd.to_datetime(data_2017['Date']) + pd.offsets.DateOffset(years=1)
datal = data_2017.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum()

data_2018['Date'] =pd.to_datetime(data_2018['Date'])
datam = data_2018.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum() 

pred_demand_2019 = (0.3*datal + 0.7*datam).reset_index().sort_values('Date')
print (pred_demand_2019)
    Id       Date  Quantity
0  001 2018-01-31       2.7
2  002 2018-01-31       NaN
4  003 2018-01-31       3.3
1  001 2018-02-28       2.7
3  002 2018-02-28       4.4
5  003 2018-02-28       4.7
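
The NaN for Id 002 in January remains because that item has no January 2017 row. The answer does not say how to treat such gaps; one possible policy, purely an assumption here, is to fall back to whichever year is available. The sketch below groups on Date directly, since the sample dates are already month starts, so the labels are month starts rather than month ends.

```python
import pandas as pd

data_2017 = pd.DataFrame({'Id': ['001', '001', '002', '003', '003'],
                          'Date': ['2017-01-01', '2017-02-01', '2017-02-01', '2017-01-01', '2017-02-01'],
                          'Quantity': [2, 2, 3, 4, 4]})
data_2018 = pd.DataFrame({'Id': ['001', '001', '002', '002', '003', '003'],
                          'Date': ['2018-01-01', '2018-02-01', '2018-01-01', '2018-02-01', '2018-01-01', '2018-02-01'],
                          'Quantity': [3, 3, 5, 5, 3, 5]})

# Shift 2017 forward one year so both years share the same (Id, Date) keys.
data_2017['Date'] = pd.to_datetime(data_2017['Date']) + pd.offsets.DateOffset(years=1)
data_2018['Date'] = pd.to_datetime(data_2018['Date'])

datal = data_2017.groupby(['Id', 'Date'])['Quantity'].sum()
datam = data_2018.groupby(['Id', 'Date'])['Quantity'].sum()

# Weighted combination; NaN wherever either year is missing.
weighted = 0.3 * datal + 0.7 * datam

# Assumed fallback policy: prefer the 2018 value, else the shifted 2017 value.
fallback = datam.combine_first(datal)
pred_demand_2019 = weighted.fillna(fallback).reset_index()
print(pred_demand_2019)
```

With this policy, Id 002 in January takes its 2018 quantity unweighted; whether that is appropriate depends on the forecasting goal.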

Another idea is to use only the month numbers:

data_2017['m'] =pd.to_datetime(data_2017['Date']).dt.month
datal = data_2017.groupby(['Id','m'])['Quantity'].sum()

data_2018['m'] =pd.to_datetime(data_2018['Date']).dt.month
datam = data_2018.groupby(['Id','m'])['Quantity'].sum() 

demand_january_2017= datal[datal.index.get_level_values('m') == 1]
demand_february_2017= datal[datal.index.get_level_values('m') == 2]

demand_january_2018= datam[datam.index.get_level_values('m') == 1]
demand_february_2018= datam[datam.index.get_level_values('m') == 2]


pred_demand_2019_january = (0.3*demand_january_2017 + 0.7*demand_january_2018).reset_index()
pred_demand_2019_february = (0.3*demand_february_2017 + 0.7*demand_february_2018).reset_index()
print (pred_demand_2019_january)
    Id  m  Quantity
0  001  1       2.7
1  002  1       NaN
2  003  1       3.3

print (pred_demand_2019_february)
    Id  m  Quantity
0  001  2       2.7
1  002  2       4.4
2  003  2       4.7

Or all months together:

data_2017['m'] =pd.to_datetime(data_2017['Date']).dt.month
datal = data_2017.groupby(['Id','m'])['Quantity'].sum()

data_2018['m'] =pd.to_datetime(data_2018['Date']).dt.month
datam = data_2018.groupby(['Id','m'])['Quantity'].sum() 


pred_demand_2019 = (0.3*datal + 0.7*datam).reset_index()
print (pred_demand_2019)
    Id  m  Quantity
0  001  1       2.7
1  001  2       2.7
2  002  1       NaN
3  002  2       4.4
4  003  1       3.3
5  003  2       4.7
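
The month-number approach can be extended to more than two years. The following is a minimal sketch, not part of the original answer, that stacks the yearly frames, pivots to one column per year, and applies a weight per year. The `weights` dictionary and the `year` column are hypothetical names introduced for illustration.

```python
import pandas as pd

data_2017 = pd.DataFrame({'Id': ['001', '001', '002', '003', '003'],
                          'Date': ['2017-01-01', '2017-02-01', '2017-02-01', '2017-01-01', '2017-02-01'],
                          'Quantity': [2, 2, 3, 4, 4]})
data_2018 = pd.DataFrame({'Id': ['001', '001', '002', '002', '003', '003'],
                          'Date': ['2018-01-01', '2018-02-01', '2018-01-01', '2018-02-01', '2018-01-01', '2018-02-01'],
                          'Quantity': [3, 3, 5, 5, 3, 5]})

# Hypothetical per-year weights, matching the 0.3 / 0.7 split used above.
weights = {2017: 0.3, 2018: 0.7}

data = pd.concat([data_2017, data_2018], ignore_index=True)
dates = pd.to_datetime(data['Date'])
data['year'] = dates.dt.year
data['m'] = dates.dt.month

# One row per (Id, month), one column per year; missing combinations become NaN.
wide = data.pivot_table(index=['Id', 'm'], columns='year', values='Quantity', aggfunc='sum')

# Scale each year column by its weight, then sum across years.
# skipna=False keeps NaN when any year is missing, as in the output above.
pred_demand_2019 = (wide.mul(pd.Series(weights))
                        .sum(axis=1, skipna=False)
                        .rename('Quantity')
                        .reset_index())
print(pred_demand_2019)
```

Adding another year then only requires one more frame in the `concat` call and one more entry in `weights`.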
