python - Weighted average grouping by date (in index) in pandas DataFrame
问题描述
I need some help trying to obtain a weighted average for a column, based on datetime-index and another 'mass' column. For example:
d = {'date': ['2021-08-01 12:00:00', '2021-08-01 13:00:00', '2021-08-01 14:00:00', '2021-08-02 15:00:00', '2021-08-02 16:00:00', '2021-08-02 17:00:00'],
'mass': [23, 40, 10, 12, 15, 11],
'%': [0.4, 0.7, 0.9, 0.1, 0.2, 0.8]
}
df = pd.DataFrame(data=d)
df.set_index('date')
What I need is to obtain the weighted average of '%' every 2 hours, being 'mass' the column of weights.
I need the output to be like this:
date(index) | mass | %
2021-08-01 13:00:00 | 43 | 0.865
2021-08-02 15:00:00 | 22 | 0.464
2021-08-02 17:00:00 | 26 | 0.454
Being '%' calculated as weighted average:
0.865 = ((23 x 0.4) + (40 x 0.7))/ (40 + 23)
Please note that I may not have continuous datetimes in index, for example, I could skip several hours because of lack of data, or even days. In that case, I need the output to be NaN if there is no data to calculate with. In case there is only one row of data, assume that row to be the weighted average for the period...
Periods may vary as well, from 2 hours to 12 months...
On a previous question, someone already helped me with this:
out = df.assign(k=df['mass'].mul(df['%']))[['mass','k']].sum(level=0)
out['%'] = out.pop('k').div(out['mass'])
Though, this works only when I have repeated indexes and this time this is not the case.
解决方案
Try grouping every 2 hours and you will get closer -
d = {'date': ['2021-08-01 12:00:00', '2021-08-01 13:00:00', '2021-08-01 14:00:00', '2021-08-02 15:00:00', '2021-08-02 16:00:00', '2021-08-02 17:00:00'],
'mass': [23, 40, 10, 12, 15, 11],
'%': [0.4, 0.7, 0.9, 0.1, 0.2, 0.8]
}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df['mass_wt'] = df['mass'] * df['%']
op = df.groupby(pd.Grouper(freq='2H')).agg({'mass': 'sum', 'mass_wt': 'sum'}).query('mass > 0')
op['op'] = op['mass_wt'] / op['mass']
推荐阅读
- reactjs - 反应
- r - 测试数据中具有膨胀趋势的单变量时间序列数据
- android - Firestore - 添加对象数组
- python - 如何使表中的 2 个属性唯一(Django)
- makefile - 使用 Makefile 编辑文件
- snowflake-cloud-data-platform - 使用特殊转义字符导入数据
- macos - 当 Visual Studio 代码聚焦时切换功能键
- c++ - 将 output_iterator 概念指定为概念内的函数参数
- r - RStudio Connect - 从雪花数据库下载时出现间歇性 C 堆栈使用错误
- firebase - Vue/Nuxt - 使用 v-for 遍历 Firestore 文档集合