Grouping pandas rows above and below based on specified value

Problem description

Is there a simple way to group a pandas DataFrame according to a given value? In the code below, the value is 1.5. I want to find the last row with a value greater than or equal to 1.5 and add a new column 'Group' such that that row and all the rows above it are in one group and the rows below it are in another group.

import pandas as pd
import numpy as np
fnd_val = 1.5
A = [1,2,3,4,5,6,7,8,9]
B = [-1.1306,-0.5694,-0.7241,1.8211,1.5555,0.0416,1.9236,0.1944,-0.0204]
df = pd.DataFrame({'Class':A, 'Value':B})
df_srtd = df.sort_values(by = "Class", ascending=True)

>>>df_srtd
        Class   Value     
    0      1    -1.1306   
    1      2    -0.5694     
    2      3    -0.7241   
    3      4    1.8211    
    4      5    1.5555    
    5      6    0.0416    
    6      7    1.9236    
    7      8    0.19440    
    8      9    -0.0204   

#Desired output


        Class   Value     Group
    0      1    -1.1306   2
    1      2    -0.5694   2 
    2      3    -0.7241   2
    3      4    1.8211    2
    4      5    1.5555    2
    5      6    0.0416    2
    6      7    1.9236    2
    7      8    0.19440   1
    8      9    -0.0204   1

#or if sorting is reversed like below
df_srtd = df.sort_values(by = "Class", ascending=False)


>>>df_srtd
        Class   Value     
    8      9    -0.0204    
    7      8     0.19440    
    6      7     1.9236    
    5      6     0.0416    
    4      5     1.5555    
    3      4     1.8211    
    2      3    -0.7241    
    1      2    -0.5694    
    0      1    -1.1306    



#Desired output
    Class   Value     Group
8      9    -0.0204    2
7      8     0.19440   2
6      7     1.9236    2
5      6     0.0416    2
4      5     1.5555    2
3      4     1.8211    2
2      3    -0.7241    1
1      2    -0.5694    1
0      1    -1.1306    1

My approach (using the reverse-sorted data):

import pandas as pd
import numpy as np

A = [1,2,3,4,5,6,7,8,9]
B = [-1.1306,-0.5694,-0.7241,1.8211,1.5555,0.0416,1.9236,0.1944,-0.0204]
df = pd.DataFrame({'Class':A, 'Value':B})
df_srtd = df.sort_values(by = "Class", ascending=False)
df_srtd['val_nxt'] = df_srtd['Value'].shift(-1) 

fnd_val = 1.5

conditions = [
        (df_srtd['Value'] >= fnd_val),
        (df_srtd['Value'] < fnd_val) 
        & (df_srtd['Value'] < df_srtd['val_nxt']),
        (df_srtd['Value'] < fnd_val) 
        ]

choices = [ '2', '2', '1']
df_srtd['Group'] = np.select(conditions, choices, default='-99')
print(df_srtd)

Result obtained:

    Class   Value     val_nxt   Group
8      9    -0.0204   0.19440    2
7      8     0.19440  1.9236     2
6      7     1.9236   0.0416     2
5      6     0.0416   1.5555     2
4      5     1.5555   1.8211     2
3      4     1.8211  -0.7241     2 #All after this should be grouped 1
2      3    -0.7241  -0.5694     2 #This one should have been 1 but is grouped 2
1      2    -0.5694  -1.1306     1
0      1    -1.1306      NaN     1

As seen in the result above, the row with Class 3 is put in group 2 instead of group 1. I tried adding more conditions, but nothing worked.

Tags: pandas

Solution


Try this:

df_srtd['Group'] = df_srtd.Value.ge(fnd_val)[::-1].cummax() + 1

Out[321]:
   Class   Value  Group
0      1 -1.1306      2
1      2 -0.5694      2
2      3 -0.7241      2
3      4  1.8211      2
4      5  1.5555      2
5      6  0.0416      2
6      7  1.9236      2
7      8  0.1944      1
8      9 -0.0204      1
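To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps, rebuilding the sample data from the question. The intermediate names `hit` and `seen_from_bottom` are illustrative assumptions, not part of the original answer, and the explicit `astype(int)` is added only to make the boolean-to-integer conversion visible.

```python
import pandas as pd

fnd_val = 1.5
df_srtd = pd.DataFrame({
    "Class": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Value": [-1.1306, -0.5694, -0.7241, 1.8211, 1.5555,
              0.0416, 1.9236, 0.1944, -0.0204],
})

# Step 1: True where the value reaches the threshold.
hit = df_srtd["Value"].ge(fnd_val)

# Step 2: walk the rows bottom-up. Once a True has been seen, the running
# maximum stays True, so the last qualifying row and every row above it
# end up marked True.
seen_from_bottom = hit[::-1].cummax()

# Step 3: map True -> 2 and False -> 1. Column assignment aligns on the
# index, so the reversed order of `seen_from_bottom` does not matter here.
df_srtd["Group"] = seen_from_bottom.astype(int) + 1
print(df_srtd)
```

Because the result is aligned back by index label, this form assumes the index labels are unique.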

It works the same way when Class is sorted in reverse:

Sample `df_srtd_rev`

   Class   Value
8      9 -0.0204
7      8  0.1944
6      7  1.9236
5      6  0.0416
4      5  1.5555
3      4  1.8211
2      3 -0.7241
1      2 -0.5694
0      1 -1.1306

df_srtd_rev['Group'] = df_srtd_rev.Value.ge(fnd_val)[::-1].cummax() + 1

Out[326]:
   Class   Value  Group
8      9 -0.0204      2
7      8  0.1944      2
6      7  1.9236      2
5      6  0.0416      2
4      5  1.5555      2
3      4  1.8211      2
2      3 -0.7241      1
1      2 -0.5694      1
0      1 -1.1306      1
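If the grouping is needed for both sort orders, the same idea can be wrapped in a small helper. The sketch below is a variant under stated assumptions rather than the answer's own method: the function name `add_group` and its parameters are hypothetical, and it replaces the index-aligned pandas expression with a positional NumPy running OR (`np.logical_or.accumulate`), so it does not depend on index alignment.

```python
import numpy as np
import pandas as pd

def add_group(frame, threshold, value_col="Value"):
    """Hypothetical helper: rows up to and including the last row whose
    value is >= threshold get Group 2, the remaining rows get Group 1."""
    out = frame.copy()
    hit = out[value_col].to_numpy() >= threshold
    # Reverse, take a running OR, then reverse back: position i is True
    # when any row at position i or below reaches the threshold.
    at_or_above_last_hit = np.logical_or.accumulate(hit[::-1])[::-1]
    out["Group"] = np.where(at_or_above_last_hit, 2, 1)
    return out

df = pd.DataFrame({
    "Class": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Value": [-1.1306, -0.5694, -0.7241, 1.8211, 1.5555,
              0.0416, 1.9236, 0.1944, -0.0204],
})

asc = add_group(df.sort_values(by="Class", ascending=True), 1.5)
desc = add_group(df.sort_values(by="Class", ascending=False), 1.5)
none_hit = add_group(df, 10.0)  # no value reaches the threshold
print(asc, desc, sep="\n\n")
```

When no value reaches the threshold, as in `none_hit`, every row falls into group 1. Whether that is the desired behaviour is a choice the question does not settle.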
