python - Group by + New Column + Grab value of previous row based on condition
Problem description
I have this DataFrame:
import pandas as pd
df = pd.DataFrame({'user':[1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4],
'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-15','1995-12-18','1995-12-19','1995-12-20','1995-12-23','1995-12-26','1995-12-27'],
'dc':['1995-09-02','1995-09-02','1995-10-02','1995-10-05','1995-10-05','1995-11-05','1995-11-05','1995-11-10','1995-11-10','1995-11-10','1995-12-10','1995-12-23','1995-12-23','1995-12-23','1995-12-23','1995-12-23'],
'tp':['s','c','f','s','c','c','f','s','c','s','f','s','s','c','s','f'],
'vt':['0','1','0','0','1','0','0','0','1','0','0','0','0','1','0','0'],
'c1':['1','5','0','2','3','9','3','2','0','5','5','6','4','0','6','0'],
'c2':['3','4','0','2','5','3','8','4','0','6','2','7','0','0','8','0'],
'c3':['5','5','2','5','6','4','2','4','4','6','3','4','3','8','2','7']})
df
This gives:
user date dc tp vt c1 c2 c3
1 1995-09-01 1995-09-02 s 0 1 3 5
1 1995-09-02 1995-09-02 c 1 5 4 5
1 1995-10-03 1995-10-02 f 0 0 0 2
2 1995-10-04 1995-10-05 s 0 2 2 5
2 1995-10-05 1995-10-05 c 1 3 5 6
2 1995-11-07 1995-11-05 c 0 9 3 4
2 1995-11-08 1995-11-05 f 0 3 8 2
3 1995-11-09 1995-11-10 s 0 2 4 4
3 1995-11-10 1995-11-10 c 1 0 0 4
3 1995-11-15 1995-11-10 s 0 5 6 6
3 1995-12-18 1995-12-10 f 0 5 2 3
4 1995-12-19 1995-12-23 s 0 6 7 4
4 1995-12-20 1995-12-23 s 0 4 0 3
4 1995-12-23 1995-12-23 c 1 0 0 8
4 1995-12-26 1995-12-23 s 0 6 8 2
4 1995-12-27 1995-12-23 f 0 0 0 7
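I want to create a new column df['dc2'] where, grouping by user, df['dc2'] = df['dc']. But if the user has a row that meets the condition 'tp'='c' & 'vt'=1 & 'c1'=0 & 'c2'=0, then df['dc2'] should instead take the date of the previous entry (of the same user).
I.e., for user 3, if we look at the entry with 'tp'='c' & 'vt'=1, we can see that it has 'c1'=0 and 'c2'=0, so the value of df['dc2'] will be (for user 3) '1995-11-09' instead of '1995-11-10'.
I.e., for user 4, if we look at the entry with 'tp'='c' & 'vt'=1, we can see that it has 'c1'=0 and 'c2'=0, so in this case df['dc2'] should be (for user 4) '1995-12-20' instead of '1995-12-23'.
This is the desired result: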
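To make the rule concrete, here is a minimal sketch on a trimmed subset of the rows (users 3 and 4 only). The names `sample`, `is_match`, `prev_date` and `wanted` are just illustrative, and the values are kept as strings like in the original frame. It only flags the matching rows and looks up the previous row's date for each user; it does not build the full dc2 column.

```python
import pandas as pd

# Trimmed subset of the sample data (users 3 and 4 only), values kept as strings.
sample = pd.DataFrame({
    'user': [3, 3, 3, 4, 4, 4],
    'date': ['1995-11-09', '1995-11-10', '1995-11-15',
             '1995-12-19', '1995-12-20', '1995-12-23'],
    'tp': ['s', 'c', 's', 's', 's', 'c'],
    'vt': ['0', '1', '0', '0', '0', '1'],
    'c1': ['2', '0', '5', '6', '4', '0'],
    'c2': ['4', '0', '6', '7', '0', '0'],
})

# Rows that satisfy tp == 'c', vt == 1, c1 == 0 and c2 == 0
# (compared against strings because the columns hold strings).
is_match = (
    sample['tp'].eq('c') & sample['vt'].eq('1')
    & sample['c1'].eq('0') & sample['c2'].eq('0')
)

# Date of the previous row within the same user.
prev_date = sample.groupby('user')['date'].shift(1)

# For each matching row, the date that dc2 should take for that user.
wanted = prev_date[is_match]
print(wanted.tolist())  # ['1995-11-09', '1995-12-20']
```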
user date dc dc2 tp vt c1 c2 c3
1 1995-09-01 1995-09-02 1995-09-02 s 0 1 3 5
1 1995-09-02 1995-09-02 1995-09-02 c 1 5 4 5
1 1995-10-03 1995-10-02 1995-10-02 f 0 0 0 2
2 1995-10-04 1995-10-05 1995-10-05 s 0 2 2 5
2 1995-10-05 1995-10-05 1995-10-05 c 1 3 5 6
2 1995-11-07 1995-11-05 1995-11-05 c 0 9 3 4
2 1995-11-08 1995-11-05 1995-11-05 f 0 3 8 2
3 1995-11-09 1995-11-10 1995-11-09 s 0 2 4 4
3 1995-11-10 1995-11-10 1995-11-09 c 1 0 0 4
3 1995-11-15 1995-11-10 1995-11-09 s 0 5 6 6
3 1995-12-18 1995-12-10 1995-12-09 f 0 5 2 3
4 1995-12-19 1995-12-23 1995-12-20 s 0 6 7 4
4 1995-12-20 1995-12-23 1995-12-20 s 0 4 0 3
4 1995-12-23 1995-12-23 1995-12-20 c 1 0 0 8
4 1995-12-26 1995-12-23 1995-12-20 s 0 6 8 2
4 1995-12-27 1995-12-23 1995-12-20 f 0 0 0 7
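Solution
Let's create a boolean mask representing the condition tp=c & vt=1 & c1=0 & c2=0, then group on the column user and apply a custom transformation function f that selects the value of the previous row based on the condition: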
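# Rows where tp == 'c', vt == 1, c1 == 0 and c2 == 0 (the columns hold strings)
m = df['tp'].eq('c') & df['vt'].eq('1')\
    & df['c1'].eq('0') & df['c2'].eq('0')
# Keep only the date of the row just before a match, then spread it over the group
f = lambda s: s.mask(~m.shift(-1, fill_value=False)).ffill().bfill()
# transform keeps the original index; users without a match fall back to dc
df['dc2'] = df.groupby('user')['date'].transform(f).fillna(df['dc'])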
user date dc tp vt c1 c2 c3 dc2
0 1 1995-09-01 1995-09-02 s 0 1 3 5 1995-09-02
1 1 1995-09-02 1995-09-02 c 1 5 4 5 1995-09-02
2 1 1995-10-03 1995-10-02 f 0 0 0 2 1995-10-02
3 2 1995-10-04 1995-10-05 s 0 2 2 5 1995-10-05
4 2 1995-10-05 1995-10-05 c 1 3 5 6 1995-10-05
5 2 1995-11-07 1995-11-05 c 0 9 3 4 1995-11-05
6 2 1995-11-08 1995-11-05 f 0 3 8 2 1995-11-05
7 3 1995-11-09 1995-11-10 s 0 2 4 4 1995-11-09
8 3 1995-11-10 1995-11-10 c 1 0 0 4 1995-11-09
9 3 1995-11-15 1995-11-10 s 0 5 6 6 1995-11-09
10 3 1995-12-18 1995-12-10 f 0 5 2 3 1995-11-09
11 4 1995-12-19 1995-12-23 s 0 6 7 4 1995-12-20
12 4 1995-12-20 1995-12-23 s 0 4 0 3 1995-12-20
13 4 1995-12-23 1995-12-23 c 1 0 0 8 1995-12-20
14 4 1995-12-26 1995-12-23 s 0 6 8 2 1995-12-20
15 4 1995-12-27 1995-12-23 f 0 0 0 7 1995-12-20
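One thing to watch: the mask above is shifted over the whole frame rather than per user, so if the first row of a user happened to match the condition, the last row of the previous user would be picked. That does not occur in the sample data. If it can occur in yours, a group-aware variant might look like the sketch below. The small frame and the names `prev_date` and `picked` are hypothetical, and it assumes that only the first matching row per user matters.

```python
import pandas as pd

# Hypothetical data: the first row of user 2 matches the condition,
# so there is no previous row to take a date from.
df = pd.DataFrame({
    'user': [1, 1, 2, 2, 3, 3],
    'date': ['1995-09-01', '1995-09-02', '1995-10-04',
             '1995-10-05', '1995-11-09', '1995-11-10'],
    'dc':   ['1995-09-02', '1995-09-02', '1995-10-05',
             '1995-10-05', '1995-11-10', '1995-11-10'],
    'tp':   ['s', 'c', 'c', 's', 's', 'c'],
    'vt':   ['0', '1', '1', '0', '0', '1'],
    'c1':   ['1', '5', '0', '2', '2', '0'],
    'c2':   ['3', '4', '0', '2', '4', '0'],
})

# Same condition as above (string comparisons, since the columns hold strings).
m = (df['tp'].eq('c') & df['vt'].eq('1')
     & df['c1'].eq('0') & df['c2'].eq('0'))

# Previous row's date, computed inside each user so it never crosses users.
prev_date = df.groupby('user')['date'].shift(1)

# Keep that date only on the matching rows; everything else becomes missing.
picked = prev_date.where(m)

# Broadcast the first picked date to the whole user, else fall back to dc.
df['dc2'] = picked.groupby(df['user']).transform('first').fillna(df['dc'])
print(df[['user', 'date', 'dc', 'dc2']])
```

Here users 1 and 2 keep their dc values (user 2's matching row has no previous row), while user 3 gets '1995-11-09' on both rows.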