pandas - grouping pandas rows above and below based on specified value
Problem description
Is there a simple way to group the rows of a pandas DataFrame around a given value? In the code below the value is 1.5. I want to find the last row whose value is greater than or equal to 1.5 and add a new column 'Group' such that that row and all the rows above it are in one group and the rows below it are in another group.
import pandas as pd
import numpy as np
fnd_val = 1.5
A = [1,2,3,4,5,6,7,8,9]
B = [-1.1306,-0.5694,-0.7241,1.8211,1.5555,0.0416,1.9236,0.1944,-0.0204]
df = pd.DataFrame({'Class':A, 'Value':B})
df_srtd = df.sort_values(by = "Class", ascending=True)
>>>df_srtd
Class Value
0 1 -1.1306
1 2 -0.5694
2 3 -0.7241
3 4 1.8211
4 5 1.5555
5 6 0.0416
6 7 1.9236
7 8 0.1944
8 9 -0.0204
#Desired output
Class Value Group
0 1 -1.1306 2
1 2 -0.5694 2
2 3 -0.7241 2
3 4 1.8211 2
4 5 1.5555 2
5 6 0.0416 2
6 7 1.9236 2
7 8 0.1944 1
8 9 -0.0204 1
#or if sorting is reversed like below
df_srtd = df.sort_values(by = "Class", ascending=False)
>>>df_srtd
Class Value
8 9 -0.0204
7 8 0.1944
6 7 1.9236
5 6 0.0416
4 5 1.5555
3 4 1.8211
2 3 -0.7241
1 2 -0.5694
0 1 -1.1306
#Desired output
Class Value Group
8 9 -0.0204 2
7 8 0.1944 2
6 7 1.9236 2
5 6 0.0416 2
4 5 1.5555 2
3 4 1.8211 2
2 3 -0.7241 1
1 2 -0.5694 1
0 1 -1.1306 1
My approach (using the reverse-sorted data):
import pandas as pd
import numpy as np
A = [1,2,3,4,5,6,7,8,9]
B = [-1.1306,-0.5694,-0.7241,1.8211,1.5555,0.0416,1.9236,0.1944,-0.0204]
df = pd.DataFrame({'Class':A, 'Value':B})
df_srtd = df.sort_values(by = "Class", ascending=False)
df_srtd['val_nxt'] = df_srtd['Value'].shift(-1)
fnd_val = 1.5
conditions = [
    (df_srtd['Value'] >= fnd_val),
    (df_srtd['Value'] < fnd_val)
    & (df_srtd['Value'] < df_srtd['val_nxt']),
    (df_srtd['Value'] < fnd_val)
]
choices = ['2', '2', '1']
df_srtd['Group'] = np.select(conditions, choices, default='-99')
print(df_srtd)
Result obtained:
Class Value val_nxt Group
8 9 -0.0204 0.19440 2
7 8 0.19440 1.9236 2
6 7 1.9236 0.0416 2
5 6 0.0416 1.5555 2
4 5 1.5555 1.8211 2
3 4 1.8211 -0.7241 2 #All after this should be grouped 1
2 3 -0.7241 -0.5694 2 #This one should have been 1 but is grouped 2
1 2 -0.5694 -1.1306 1
0 1 -1.1306 NaN 1
As the result above shows, the row with Class 3 is put in group 2 instead of group 1. I tried adding more conditions, but nothing worked.
Solution
Try this:
df_srtd['Group'] = df_srtd.Value.ge(fnd_val)[::-1].cummax() + 1
Out[321]:
Class Value Group
0 1 -1.1306 2
1 2 -0.5694 2
2 3 -0.7241 2
3 4 1.8211 2
4 5 1.5555 2
5 6 0.0416 2
6 7 1.9236 2
7 8 0.1944 1
8 9 -0.0204 1
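To make the one-liner easier to follow, here is a rough step-by-step sketch of what it appears to do, using the data from the question. The intermediate names (`hits`, `hits_rev`, `seen`) are only illustrative and are not part of the original answer.

```python
import pandas as pd

fnd_val = 1.5
df = pd.DataFrame({
    "Class": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Value": [-1.1306, -0.5694, -0.7241, 1.8211, 1.5555,
              0.0416, 1.9236, 0.1944, -0.0204],
})
df_srtd = df.sort_values(by="Class", ascending=True)

# Step 1: boolean mask, True where Value >= fnd_val
hits = df_srtd["Value"].ge(fnd_val)

# Step 2: reverse the mask so that it is read from the last row upward
hits_rev = hits[::-1]

# Step 3: running maximum; once a True has been seen, everything after it
# (that is, every row above it in the original order) stays True
seen = hits_rev.cummax()

# Step 4: True -> 2, False -> 1. The assignment aligns on index labels,
# so the reversed order of `seen` does not matter here.
df_srtd["Group"] = seen.astype(int) + 1

print(df_srtd)
```

Note that step 4 relies on the index labels being unique, which is the case for the default index used in the question.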
The same works when Class is sorted in reverse.
Sample `df_srtd_rev`
Class Value
8 9 -0.0204
7 8 0.1944
6 7 1.9236
5 6 0.0416
4 5 1.5555
3 4 1.8211
2 3 -0.7241
1 2 -0.5694
0 1 -1.1306
df_srtd_rev['Group'] = df_srtd_rev.Value.ge(fnd_val)[::-1].cummax() + 1
Out[326]:
Class Value Group
8 9 -0.0204 2
7 8 0.1944 2
6 7 1.9236 2
5 6 0.0416 2
4 5 1.5555 2
3 4 1.8211 2
2 3 -0.7241 1
1 2 -0.5694 1
0 1 -1.1306 1
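If you would rather not depend on index alignment, a possible alternative is to work with row positions instead. The sketch below is not from the original answer: `add_group` is a hypothetical helper name, and it assumes that when no value reaches the threshold every row should fall into group 1 (which is also what the `cummax` one-liner produces).

```python
import numpy as np
import pandas as pd


def add_group(frame, threshold, col="Value"):
    """Hypothetical helper: rows up to and including the last row where
    col >= threshold get Group 2, all rows after it get Group 1."""
    out = frame.copy()
    mask = out[col].to_numpy() >= threshold
    positions = np.flatnonzero(mask)          # positions of all matching rows
    # Assumption: no match at all means every row goes to group 1
    last_pos = positions[-1] if positions.size else -1
    out["Group"] = np.where(np.arange(len(out)) <= last_pos, 2, 1)
    return out


df = pd.DataFrame({
    "Class": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Value": [-1.1306, -0.5694, -0.7241, 1.8211, 1.5555,
              0.0416, 1.9236, 0.1944, -0.0204],
})

res_asc = add_group(df.sort_values(by="Class", ascending=True), 1.5)
res_desc = add_group(df.sort_values(by="Class", ascending=False), 1.5)
print(res_asc)
print(res_desc)
```

Because the grouping is computed from positions rather than labels, this version should behave the same way regardless of how the index looks after sorting.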