python - 如何在熊猫中执行条件分组计算并前向填充零值
问题描述
我有以下数据框:
import pandas as pd
#Create DF
d = {
'Date': ['1/01/2021','2/01/2021','3/01/2021','4/01/2021','5/01/2021','6/01/2021','7/01/2021','8/01/2021','9/01/2021','10/01/2021','11/01/2021','12/01/2021','13/01/2021',
'14/01/2021','15/01/2021','16/01/2021'],
'Name': ['Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','John','John','John','John','John','John','John'],
'Criteria':[30,35,2.5,3,6,15,20,1.5,4,25,50,75,2,4,9,2],
'Sum' : [-1,-1,1.5,-1,5,-1,-1,-1,3,-1,-1,-1,-1,3,-1,-1],
'Sum Divided by Count where Criteria <5' : [0.000,0.000,1.500,0.250,0.000,0.000,0.000,-0.167,0.625,0.000,0.000,0.000,-1.000,1.000,0.000,0.333]
}
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df.Date,format='%d/%m/%Y')
df
最后一列计算小于的列的总和除以小于Sum
的行数。并按 分组。红色突出显示符合条件的行。Criteria
5
Criteria
5
Name
我还想做的是向所有行添加前向填充,但将初始行保持为零。
以上所有在熊猫中都可能吗?
我的最终结果应该是:
任何帮助将不胜感激!
解决方案
将下面的值替换为 in中5
的缺失值,然后每组要求将总和除以计数,如果值不匹配,则将它们替换为in :Sum
DataFrame.assign
Expanding.mean
0
Series.where
m = df['Criteria'] < 5
df['new'] = (df.assign(Sum = df['Sum'].where(m))
.groupby('Name')['Sum']
.expanding()
.mean()
.reset_index(level=0, drop=True)
.where(m)
.groupby(df['Name'])
.ffill()
.fillna(0))
print (df)
Date Name Criteria Sum Sum Divided by Count where Criteria <5 \
0 2021-01-01 Joe 30.0 -1.0 0.000
1 2021-01-02 Joe 35.0 -1.0 0.000
2 2021-01-03 Joe 2.5 1.5 1.500
3 2021-01-04 Joe 3.0 -1.0 0.250
4 2021-01-05 Joe 6.0 5.0 0.000
5 2021-01-06 Joe 15.0 -1.0 0.000
6 2021-01-07 Joe 20.0 -1.0 0.000
7 2021-01-08 Joe 1.5 -1.0 -0.167
8 2021-01-09 Joe 4.0 3.0 0.625
9 2021-01-10 John 25.0 -1.0 0.000
10 2021-01-11 John 50.0 -1.0 0.000
11 2021-01-12 John 75.0 -1.0 0.000
12 2021-01-13 John 2.0 -1.0 -1.000
13 2021-01-14 John 4.0 3.0 1.000
14 2021-01-15 John 9.0 -1.0 0.000
15 2021-01-16 John 2.0 -1.0 0.333
new
0 0.000000
1 0.000000
2 1.500000
3 0.250000
4 0.250000
5 0.250000
6 0.250000
7 -0.166667
8 0.625000
9 0.000000
10 0.000000
11 0.000000
12 -1.000000
13 1.000000
14 1.000000
15 0.333333
推荐阅读
- perl - 使用 XML::LibXML 向元素添加新类
- php - 如何修改 $session 数组变量中的值对象
- django - 安装 boto 时没有名为“storages.backends.s3boto”的模块
- c++ - 如果一起指定多个优化标志,gcc(和 clang)会做什么?
- javascript - Webpack:收到此错误:构建失败(来自 ./node_modules/css-loader/dist/cjs.js):
- java - heroku 错误:无法找到或加载主类
- c# - 有没有办法将 NAudio.Lame 与 Xamarin (Android/iOS) 应用程序一起使用?
- python - 创建带有图像的 .csv
- c++ - 如何在 Arduino 上使用 char 数组实现函数
- django - 序列化后如何将模型字段添加到对象