python - 如何根据每个 id 的条件选择行
问题描述
我有以下数据框:
Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition
2400188 February-2018 4597566 1 0
2400188 March-2018 4597566 1 0
2400188 April-2018 4597566 1 0
2400188 May-2018 4597566 1 0
2400188 June-2018 4597566 1 0
2400188 July-2018 4597566 1 0
2400188 August-2018 4597566 1 0
2400188 September-2018 4597566 0 1
2400188 October-2018 4597566 0 0
2400188 November-2018 4597566 0 0
2400188 December-2018 4597566 0 0
2400188 January-2019 4597566 0 0
2400188 February-2019 4597566 0 0
2400188 March-2019 4597566 0 0
2400188 April-2019 4597566 0 0
2400188 May-2019 4597566 0 0
2400614 May-2015 2297544 0 0
2400614 June-2015 2297544 0 0
2400614 July-2015 2297544 0 0
2400614 August-2015 2297544 0 0
2400614 September-2015 2297544 0 0
2400614 October-2015 2297544 0 0
2400614 November-2015 2297544 0 0
2400614 December-2015 2297544 0 0
2400614 January-2016 2297544 1 1
2400614 February-2016 2297544 1 0
2400614 March-2016 2297544 1 0
3400624 May-2016 2597531 0 0
3400624 June-2016 2597531 0 0
3400624 July-2016 2597531 0 0
3400624 August-2016 2597531 1 1
2400133 February-2016 4597531 0 0
2400133 March-2016 4597531 0 0
2400133 April-2016 4597531 0 0
2400133 May-2016 4597531 0 0
2400133 June-2016 4597531 0 0
2400133 July-2016 4597531 0 0
2400133 August-2016 4597531 1 1
2400133 September-2016 4597531 1 0
2400133 October-2016 4597531 1 0
2400133 November-2016 4597531 1 0
2400133 December-2016 4597531 1 0
2400133 January-2017 4597531 1 0
2400133 February-2017 4597531 1 0
2400133 March-2017 4597531 1 0
2400133 April-2017 4597531 1 0
2400133 May-2017 4597531 1 0
当Chef_is_Masterchef列中发生从0 到 1或1 到 0的转换时,此转换在Transition列中指示为1。
实际上,我想创建另一列(命名为“ Var ”),其中的值将按如下所述为原始数据框填充,
预期数据框:
Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition Var
2400188 February-2018 4597566 1 0 -7
2400188 March-2018 4597566 1 0 -6
2400188 April-2018 4597566 1 0 -5
2400188 May-2018 4597566 1 0 -4
2400188 June-2018 4597566 1 0 -3
2400188 July-2018 4597566 1 0 -2
2400188 August-2018 4597566 1 0 -1
2400188 September-2018 4597566 0 1 0
2400188 October-2018 4597566 0 0 1
2400188 November-2018 4597566 0 0 2
2400188 December-2018 4597566 0 0 3
2400188 January-2019 4597566 0 0 4
2400188 February-2019 4597566 0 0 5
2400188 March-2019 4597566 0 0 6
2400188 April-2019 4597566 0 0 7
2400188 May-2019 4597566 0 0 8
2400614 May-2015 2297544 0 0 -8
2400614 June-2015 2297544 0 0 -7
2400614 July-2015 2297544 0 0 -6
2400614 August-2015 2297544 0 0 -5
2400614 September-2015 2297544 0 0 -4
2400614 October-2015 2297544 0 0 -3
2400614 November-2015 2297544 0 0 -2
2400614 December-2015 2297544 0 0 -1
2400614 January-2016 2297544 1 1 0
2400614 February-2016 2297544 1 0 1
2400614 March-2016 2297544 1 0 2
3400624 May-2016 2597531 0 0 -3
3400624 June-2016 2597531 0 0 -2
3400624 July-2016 2597531 0 0 -1
3400624 August-2016 2597531 1 1 0
2400133 February-2016 4597531 0 0 -6
2400133 March-2016 4597531 0 0 -5
2400133 April-2016 4597531 0 0 -4
2400133 May-2016 4597531 0 0 -3
2400133 June-2016 4597531 0 0 -2
2400133 July-2016 4597531 0 0 -1
2400133 August-2016 4597531 1 1 0
2400133 September-2016 4597531 1 0 1
2400133 October-2016 4597531 1 0 2
2400133 November-2016 4597531 1 0 3
2400133 December-2016 4597531 1 0 4
2400133 January-2017 4597531 1 0 5
2400133 February-2017 4597531 1 0 6
2400133 March-2017 4597531 1 0 7
2400133 April-2017 4597531 1 0 8
2400133 May-2017 4597531 1 0 9
如果观察到,在Var列的转换点,我将值设为零,并且对于之前和之后的行,我保持相应的整数值。
但是在使用下面的代码后,我在 Var 列中遇到了问题,
s = df['Chef_is_masterchef'].eq(0).groupby(df['Chef_Id']).transform('sum')
df['var'] = df.groupby('Chef_Id').cumcount().sub(s)
上述代码的输出:
Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition Var
2400188 February-2018 4597566 1 0 -9
2400188 March-2018 4597566 1 0 -8
2400188 April-2018 4597566 1 0 -7
2400188 May-2018 4597566 1 0 -6
2400188 June-2018 4597566 1 0 -5
2400188 July-2018 4597566 1 0 -4
2400188 August-2018 4597566 1 0 -3
2400188 September-2018 4597566 0 1 -2
2400188 October-2018 4597566 0 0 -1
2400188 November-2018 4597566 0 0 0
2400188 December-2018 4597566 0 0 1
2400188 January-2019 4597566 0 0 2
2400188 February-2019 4597566 0 0 3
2400188 March-2019 4597566 0 0 4
2400188 April-2019 4597566 0 0 5
2400188 May-2019 4597566 0 0 6
2400614 May-2015 2297544 0 0 -8
2400614 June-2015 2297544 0 0 -7
2400614 July-2015 2297544 0 0 -6
2400614 August-2015 2297544 0 0 -5
2400614 September-2015 2297544 0 0 -4
2400614 October-2015 2297544 0 0 -3
2400614 November-2015 2297544 0 0 -2
2400614 December-2015 2297544 0 0 -1
2400614 January-2016 2297544 1 1 0
2400614 February-2016 2297544 1 0 1
2400614 March-2016 2297544 1 0 2
3400624 May-2016 2597531 0 0 -3
3400624 June-2016 2597531 0 0 -2
3400624 July-2016 2597531 0 0 -1
3400624 August-2016 2597531 1 1 0
2400133 February-2016 4597531 0 0 -6
2400133 March-2016 4597531 0 0 -5
2400133 April-2016 4597531 0 0 -4
2400133 May-2016 4597531 0 0 -3
2400133 June-2016 4597531 0 0 -2
2400133 July-2016 4597531 0 0 -1
2400133 August-2016 4597531 1 1 0
2400133 September-2016 4597531 1 0 1
2400133 October-2016 4597531 1 0 2
2400133 November-2016 4597531 1 0 3
2400133 December-2016 4597531 1 0 4
2400133 January-2017 4597531 1 0 5
2400133 February-2017 4597531 1 0 6
2400133 March-2017 4597531 1 0 7
2400133 April-2017 4597531 1 0 8
2400133 May-2017 4597531 1 0 9
如果已观察到,对于 Chef_Id = 4597566,您可以在转换点看到 Var 列中的值不同而不是零。
这会产生一个问题,因为在过渡点,我必须为每个 ID 选择最多前 3 个月和最多后 2 个月的行。同样在过渡点,我必须使用以下代码为每个 id 选择最多包括 6 个月之前和 5 个月之后的行:
df1 = df[df['var'].between(-3, 2)]
print (df1)
df2 = df[df['var'].between(-6, 5)]
print (df2)
所以请让我知道解决方案。
提前致谢!
解决方案
用于GroupBy.cumcount
每组的计数器,然后0
通过比较0
和减去值的数量GroupBy.transform
:
s = df['Chef_is_masterchef'].eq(0).groupby(df['Chef_Id']).transform('sum')
df['var'] = df.groupby('Chef_Id').cumcount().sub(s)
print (df)
Hotel_id Month_Year Chef_Id Chef_is_masterchef Transition var
0 2400614 May-2015 2297544 0 0 -8
1 2400614 June-2015 2297544 0 0 -7
2 2400614 July-2015 2297544 0 0 -6
3 2400614 August-2015 2297544 0 0 -5
4 2400614 September-2015 2297544 0 0 -4
5 2400614 October-2015 2297544 0 0 -3
6 2400614 November-2015 2297544 0 0 -2
7 2400614 December-2015 2297544 0 0 -1
8 2400614 January-2016 2297544 1 1 0
9 2400614 February-2016 2297544 1 0 1
10 2400614 March-2016 2297544 1 0 2
11 3400624 May-2016 2597531 0 0 -3
12 3400624 June-2016 2597531 0 0 -2
13 3400624 July-2016 2597531 0 0 -1
14 3400624 August-2016 2597531 1 1 0
15 2400133 February-2016 4597531 0 0 -6
16 2400133 March-2016 4597531 0 0 -5
17 2400133 April-2016 4597531 0 0 -4
18 2400133 May-2016 4597531 0 0 -3
19 2400133 June-2016 4597531 0 0 -2
20 2400133 July-2016 4597531 0 0 -1
21 2400133 August-2016 4597531 1 1 0
22 2400133 September-2016 4597531 1 0 1
23 2400133 October-2016 4597531 1 0 2
24 2400133 November-2016 4597531 1 0 3
25 2400133 December-2016 4597531 1 0 4
26 2400133 January-2017 4597531 1 0 5
27 2400133 February-2017 4597531 1 0 6
28 2400133 March-2017 4597531 1 0 7
29 2400133 April-2017 4597531 1 0 8
30 2400133 May-2017 4597531 1 0 9
最后过滤器Series.between
:
df1 = df[df['var'].between(-3, 2)]
print (df1)
df2 = df[df['var'].between(-6, 5)]
print (df2)
推荐阅读
- python - 如何使用 CodeHS 8.4.13: Owls, Part 2 的枚举函数?
- bash - 逐行应用字符串替换
- angular - NGRX 过滤器导致 ExpressionChangedAfterItHasBeenCheckedError
- javascript - javascript中未填写字段时如何显示错误消息?
- ruby - 总是在“捆绑执行”之前“捆绑安装”?
- php - 如何使所有html标签与php在同一行中关闭
- python - 如何简化具有索引比较的嵌套循环?
- arduino - 如何知道我的 esp8266 的 ssid 和密码?AT 命令不起作用。我如何知道 CWJAP 用户名和密码?
- c++ - 将抽象类添加到向量时出错
- java - BukkitRunnable 的返回值