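pandas - Calculating the average of the same month at time n-1 across multiple DataFrames with unequal index sizes
Question
I have been stuck on this problem for too long and cannot find a solution. I have a dataset, and I want to compute a monthly forecast per item by averaging the same month from previous years.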
import pandas as pd
data_2017 = pd.DataFrame({'Id' : ['001', '001','002', '003', '003'], 'Date' : ['2017-01-01','2017-02-01', '2017-02-01','2017-01-01', '2017-02-01'], 'Quantity': [2,2,3,4,4]})
data_2018 = pd.DataFrame({'Id' : ['001', '001','002','002' ,'003', '003'], 'Date' : ['2018-01-01','2018-02-01', '2018-01-01','2018-02-01','2018-01-01', '2018-02-01'], 'Quantity': [3,3,5,5,3,5]})
My code looks like this:
data_2017['Date'] =pd.to_datetime(data_2017['Date'])
datal = data_2017.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum().reset_index().sort_values('Date')
data_2018['Date'] =pd.to_datetime(data_2018['Date'])
datam = data_2018.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum().reset_index().sort_values('Date')
demand_january_2017= data_2017[data_2017['Date'].dt.month == 1]
demand_february_2017= data_2017[data_2017['Date'].dt.month == 2]
demand_january_2018= data_2018[data_2018['Date'].dt.month == 1]
demand_february_2018= data_2018[data_2018['Date'].dt.month == 2]
pred_demand_2019_january = 0.3*demand_january_2017['Quantity'] + 0.7*demand_january_2018['Quantity']
pred_demand_2019_february = 0.3*demand_february_2017['Quantity'] + 0.7*demand_february_2018['Quantity']
The code runs, but the output is off, because I don't know how to handle the fact that the indexes of the DataFrames differ.
0 2.7
2 NaN
3 NaN
4 NaN
Name: Quantity, dtype: float64
1 2.7
2 NaN
3 NaN
4 NaN
5 NaN
Name: Quantity, dtype: float64
At this point, any help would be welcome!
Solution
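The NaN values in the question come from index alignment: arithmetic between two Series matches rows by index label, not by position. A minimal sketch of the effect, using the January quantities and the row labels they keep after filtering (the variable names here are illustrative only):

```python
import pandas as pd

# January rows keep the row labels of their original DataFrames:
# labels 0 and 3 in 2017, labels 0, 2 and 4 in 2018.
jan_2017 = pd.Series([2, 4], index=[0, 3], name="Quantity")
jan_2018 = pd.Series([3, 5, 3], index=[0, 2, 4], name="Quantity")

# Addition aligns on index labels. Only label 0 exists on both sides,
# so every other label in the union of the indexes becomes NaN.
misaligned = 0.3 * jan_2017 + 0.7 * jan_2018
print(misaligned)
```

Only label 0 gets a value; the fix is therefore to give both sides an index that identifies the same item and month.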
For the two filtered DataFrames to align correctly, both need to fall in the same year, so add one year to data_2017['Date'] and then drop reset_index so that each result is a MultiIndex Series:
data_2017['Date'] =pd.to_datetime(data_2017['Date']) + pd.offsets.DateOffset(years=1)
datal = data_2017.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum()
data_2018['Date'] =pd.to_datetime(data_2018['Date'])
datam = data_2018.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum()
Filter by the Date level:
demand_january_2017= datal[datal.index.get_level_values('Date').month == 1]
demand_february_2017= datal[datal.index.get_level_values('Date').month == 2]
demand_january_2018= datam[datam.index.get_level_values('Date').month == 1]
demand_february_2018= datam[datam.index.get_level_values('Date').month == 2]
Compute the output as a MultiIndex Series and finally convert it to a DataFrame:
pred_demand_2019_january = (0.3*demand_january_2017 + 0.7*demand_january_2018).reset_index().sort_values('Date')
pred_demand_2019_february = (0.3*demand_february_2017 + 0.7*demand_february_2018).reset_index().sort_values('Date')
print (pred_demand_2019_january)
Id Date Quantity
0 001 2018-01-31 2.7
1 002 2018-01-31 NaN
2 003 2018-01-31 3.3
print (pred_demand_2019_february)
Id Date Quantity
0 001 2018-02-28 2.7
1 002 2018-02-28 4.4
2 003 2018-02-28 4.7
If you want all months together:
data_2017['Date'] =pd.to_datetime(data_2017['Date']) + pd.offsets.DateOffset(years=1)
datal = data_2017.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum()
data_2018['Date'] =pd.to_datetime(data_2018['Date'])
datam = data_2018.groupby(['Id',pd.Grouper(key='Date', freq='M')])['Quantity'].sum()
pred_demand_2019 = (0.3*datal + 0.7*datam).reset_index().sort_values('Date')
print (pred_demand_2019)
Id Date Quantity
0 001 2018-01-31 2.7
2 002 2018-01-31 NaN
4 003 2018-01-31 3.3
1 001 2018-02-28 2.7
3 002 2018-02-28 4.4
5 003 2018-02-28 4.7
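Newer pandas releases (2.2 onward, as far as I know) deprecate the 'M' month-end alias used in pd.Grouper above in favor of 'ME'. A sketch of the same idea that sidesteps the alias by grouping on monthly periods instead; the names dates_2017, month_2017 and Month are introduced here for illustration and are not part of the original answer:

```python
import pandas as pd

data_2017 = pd.DataFrame({
    "Id": ["001", "001", "002", "003", "003"],
    "Date": ["2017-01-01", "2017-02-01", "2017-02-01", "2017-01-01", "2017-02-01"],
    "Quantity": [2, 2, 3, 4, 4],
})
data_2018 = pd.DataFrame({
    "Id": ["001", "001", "002", "002", "003", "003"],
    "Date": ["2018-01-01", "2018-02-01", "2018-01-01", "2018-02-01", "2018-01-01", "2018-02-01"],
    "Quantity": [3, 3, 5, 5, 3, 5],
})

# Shift 2017 forward by one year so both years share the same month keys.
dates_2017 = pd.to_datetime(data_2017["Date"]) + pd.DateOffset(years=1)
dates_2018 = pd.to_datetime(data_2018["Date"])

# Monthly periods (e.g. 2018-01) serve as the second grouping key.
month_2017 = dates_2017.dt.to_period("M").rename("Month")
month_2018 = dates_2018.dt.to_period("M").rename("Month")

datal = data_2017.groupby(["Id", month_2017])["Quantity"].sum()
datam = data_2018.groupby(["Id", month_2018])["Quantity"].sum()

# Both Series are indexed by (Id, Month), so the weighted sum aligns per item and month.
pred_demand_2019 = (0.3 * datal + 0.7 * datam).reset_index()
print(pred_demand_2019)
```

The result has one row per Id and month, with NaN only where one of the two years has no data for that pair.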
Another idea is to group by the month number only:
data_2017['m'] =pd.to_datetime(data_2017['Date']).dt.month
datal = data_2017.groupby(['Id','m'])['Quantity'].sum()
data_2018['m'] =pd.to_datetime(data_2018['Date']).dt.month
datam = data_2018.groupby(['Id','m'])['Quantity'].sum()
demand_january_2017= datal[datal.index.get_level_values('m') == 1]
demand_february_2017= datal[datal.index.get_level_values('m') == 2]
demand_january_2018= datam[datam.index.get_level_values('m') == 1]
demand_february_2018= datam[datam.index.get_level_values('m') == 2]
pred_demand_2019_january = (0.3*demand_january_2017 + 0.7*demand_january_2018).reset_index()
pred_demand_2019_february = (0.3*demand_february_2017 + 0.7*demand_february_2018).reset_index()
print (pred_demand_2019_january)
Id m Quantity
0 001 1 2.7
1 002 1 NaN
2 003 1 3.3
print (pred_demand_2019_february)
Id m Quantity
0 001 2 2.7
1 002 2 4.4
2 003 2 4.7
Or all months together:
data_2017['m'] =pd.to_datetime(data_2017['Date']).dt.month
datal = data_2017.groupby(['Id','m'])['Quantity'].sum()
data_2018['m'] =pd.to_datetime(data_2018['Date']).dt.month
datam = data_2018.groupby(['Id','m'])['Quantity'].sum()
pred_demand_2019 = (0.3*datal + 0.7*datam).reset_index()
print (pred_demand_2019)
Id m Quantity
0 001 1 2.7
1 001 2 2.7
2 002 1 NaN
3 002 2 4.4
4 003 1 3.3
5 003 2 4.7
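The NaN for Id 002 in month 1 appears because that item has no January row in 2017. If falling back to the single available year is acceptable for your forecast (an assumption, not something the question specifies), one possible sketch uses combine_first; monthly_totals is a hypothetical helper written for this example:

```python
import pandas as pd

data_2017 = pd.DataFrame({
    "Id": ["001", "001", "002", "003", "003"],
    "Date": ["2017-01-01", "2017-02-01", "2017-02-01", "2017-01-01", "2017-02-01"],
    "Quantity": [2, 2, 3, 4, 4],
})
data_2018 = pd.DataFrame({
    "Id": ["001", "001", "002", "002", "003", "003"],
    "Date": ["2018-01-01", "2018-02-01", "2018-01-01", "2018-02-01", "2018-01-01", "2018-02-01"],
    "Quantity": [3, 3, 5, 5, 3, 5],
})

def monthly_totals(df):
    """Sum Quantity per (Id, month number); hypothetical helper."""
    month = pd.to_datetime(df["Date"]).dt.month.rename("m")
    return df.groupby(["Id", month])["Quantity"].sum()

totals_2017 = monthly_totals(data_2017)
totals_2018 = monthly_totals(data_2018)

# Weighted average where both years exist; NaN where either year is missing.
weighted = 0.3 * totals_2017 + 0.7 * totals_2018

# Assumed fallback: use the 2018 total if present, otherwise the 2017 total.
forecast = weighted.combine_first(totals_2018).combine_first(totals_2017)
print(forecast.reset_index())
```

With this fallback, Id 002 in month 1 takes its 2018 value instead of NaN; whether that is a sensible forecast depends on the data.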