Cumulative count based on values in another column

Question

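I am trying to return a cumulative count based on other columns. For the df below, I want to return a count using Outcome and the columns A, B, C, D. Specifically, whenever X or Y appears in Outcome, I want to find which of A, B, C, D was the most recent to increase, and increment the counter for that column and that outcome.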

I tried the following:

import pandas as pd

d = ({
    'Outcome' : ['','','X','','','X','','Y','','Y'],
    'A' : [0,0,0,1,1,1,2,2,2,2],
    'B' : [0,0,0,1,1,1,1,1,2,2],
    'C' : [0,0,0,1,2,3,3,3,3,3],
    'D' : [0,1,2,2,2,2,2,2,2,2],                          
    })

df = pd.DataFrame(data = d)

m = pd.get_dummies(
      df.where(df.Outcome.ne(df.Outcome.shift()) & df.Outcome.str.len().astype(bool)
      ), prefix='Count').cumsum()

df = pd.concat([
     m.where(m.ne(m.shift())).fillna('', downcast='infer'), df], axis=1)

But this isn't quite right.

My expected output is:

  Outcome  A  B  C  D  A_X  A_Y  B_X  B_Y  C_X  C_Y  D_X  D_Y
0          0  0  0  0    0    0    0    0    0    0    0    0
1          0  0  0  1    0    0    0    0    0    0    0    0
2       X  0  0  0  2    0    0    0    0    0    0    1    0
3          1  1  1  2    0    0    0    0    0    0    1    0
4          1  1  2  2    0    0    0    0    0    0    1    0
5       X  1  1  3  2    0    0    0    0    1    0    1    0
6          2  1  3  2    0    0    0    0    1    0    1    0
7       Y  2  1  3  2    0    1    0    0    1    0    1    0
8          2  2  3  2    0    1    0    0    1    0    1    0
9       Y  2  2  3  2    0    1    0    1    1    0    1    0

Tags: python, pandas, count

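Solution

Here are two snippets:

  1. As per the description, which also captures the additional increases (for example in column A) between the 1st and 2nd X
  2. As per the example, which captures only the most recent increase across all 4 columns

1) As per the description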

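# df is the DataFrame built from d in the question (before the concat step).
# Start every counter column at zero.
for col in 'ABCD':
    df[col + '_X'] = 0
    df[col + '_Y'] = 0

# Checkpoints: the first row plus every row whose Outcome is X or Y.
marks = df[df.Outcome.isin(['X', 'Y']) | (df.index == 0)].index

# Compare each checkpoint (i2) with the one before it (i1).
for i1, i2 in zip(marks, marks[1:]):
    for col in 'ABCD':
        if df.loc[i2, col] > df.loc[i1, col]:
            # The column rose since the last checkpoint: bump its counter from i2 onward.
            target = col + '_' + df.loc[i2, 'Outcome']
            df.loc[i2:, target] = df.loc[i2 - 1, target] + 1
print(df)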

  Outcome  A  B  C  D  A_X  A_Y  B_X  B_Y  C_X  C_Y  D_X  D_Y
0          0  0  0  0    0    0    0    0    0    0    0    0
1          0  0  0  1    0    0    0    0    0    0    0    0
2       X  0  0  0  2    0    0    0    0    0    0    1    0
3          1  1  1  2    0    0    0    0    0    0    1    0
4          1  1  2  2    0    0    0    0    0    0    1    0
5       X  1  1  3  2    1    0    1    0    1    0    1    0
6          2  1  3  2    1    0    1    0    1    0    1    0
7       Y  2  1  3  2    1    1    1    0    1    0    1    0
8          2  2  3  2    1    1    1    0    1    0    1    0
9       Y  2  2  3  2    1    1    1    1    1    0    1    0
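The same "as per the description" logic can also be sketched without the row-by-row assignment. The following is only an illustration, not part of the original answer: the function name `count_increases` and its parameters are hypothetical, and it assumes a unique index. It takes the checkpoint rows (the first row plus every X/Y row), uses `diff()` to see which columns rose between consecutive checkpoints, and turns those hits into running totals with `cumsum()`.

```python
import pandas as pd


def count_increases(df, value_cols=("A", "B", "C", "D"), outcomes=("X", "Y")):
    """Hypothetical helper: count, per outcome, how often each column rose
    between consecutive checkpoint rows (first row + rows with an outcome)."""
    cols = list(value_cols)

    # Checkpoints: every row with one of the outcomes, plus the first row.
    is_checkpoint = df["Outcome"].isin(outcomes)
    is_checkpoint.iloc[0] = True

    # True where a column is larger than at the previous checkpoint.
    increased = df.loc[is_checkpoint, cols].diff().gt(0)

    counts = {}
    for outcome in outcomes:
        rows = df.index[df["Outcome"].eq(outcome)]
        # Keep only this outcome's checkpoints, then spread over all rows.
        hits = increased.loc[rows].reindex(df.index, fill_value=False)
        counts[outcome] = hits.astype(int).cumsum()

    out = df.copy()
    for col in cols:
        for outcome in outcomes:
            out[f"{col}_{outcome}"] = counts[outcome][col]
    return out


df = pd.DataFrame({
    "Outcome": ["", "", "X", "", "", "X", "", "Y", "", "Y"],
    "A": [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    "B": [0, 0, 0, 1, 1, 1, 1, 1, 2, 2],
    "C": [0, 0, 0, 1, 2, 3, 3, 3, 3, 3],
    "D": [0, 1, 2, 2, 2, 2, 2, 2, 2, 2],
})
result = count_increases(df)
print(result)
```

On the sample data this should give the same counter columns as the loop above, while leaving the input `df` unmodified.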

2) As per the example

# Reset every counter column to zero.
for col in 'ABCD':
    df[col + '_X'] = 0
    df[col + '_Y'] = 0

# Checkpoints: the first row plus every row whose Outcome is X or Y.
marks = df[df.Outcome.isin(['X', 'Y']) | (df.index == 0)].index

for i1, i2 in zip(marks, marks[1:]):
    change_col = ''
    change_pos = -1
    for col in 'ABCD':
        if df.loc[i2, col] > df.loc[i1, col]:
            # Last row still holding the previous value, i.e. the row just before the latest increase.
            prev_rows = df.index[df[col] == df.loc[i2, col] - 1]
            if len(prev_rows) > 0 and prev_rows[-1] > change_pos:
                change_col = col
                change_pos = prev_rows[-1]
    # Only the column that changed most recently gets its counter bumped.
    if change_pos > -1:
        target = change_col + '_' + df.loc[i2, 'Outcome']
        df.loc[i2:, target] = df.loc[i2 - 1, target] + 1
print(df)

  Outcome  A  B  C  D  A_X  A_Y  B_X  B_Y  C_X  C_Y  D_X  D_Y
0          0  0  0  0    0    0    0    0    0    0    0    0
1          0  0  0  1    0    0    0    0    0    0    0    0
2       X  0  0  0  2    0    0    0    0    0    0    1    0
3          1  1  1  2    0    0    0    0    0    0    1    0
4          1  1  2  2    0    0    0    0    0    0    1    0
5       X  1  1  3  2    0    0    0    0    1    0    1    0
6          2  1  3  2    0    0    0    0    1    0    1    0
7       Y  2  1  3  2    0    1    0    0    1    0    1    0
8          2  2  3  2    0    1    0    0    1    0    1    0
9       Y  2  2  3  2    0    1    0    1    1    0    1    0
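As a rough alternative sketch of the "as per the example" variant (not the original answer's code), one can precompute, for every row, the position of each column's most recent increase and then pick the column with the latest one at each X/Y row. The function name `count_latest_increase` is hypothetical, and the sketch assumes the value columns never decrease and that the Outcome column contains only the listed outcomes or empty strings.

```python
import numpy as np
import pandas as pd


def count_latest_increase(df, value_cols=("A", "B", "C", "D"), outcomes=("X", "Y")):
    """Hypothetical helper: at each outcome row, credit only the column
    whose increase happened most recently since the previous checkpoint."""
    cols = list(value_cols)
    positions = np.arange(len(df))

    # Row position of each column's most recent increase (-1 if none yet).
    rose = df[cols].diff().gt(0).to_numpy()
    last_rise = pd.DataFrame(
        np.where(rose, positions[:, None], -1), index=df.index, columns=cols
    ).cummax()

    names = [f"{c}_{o}" for c in cols for o in outcomes]
    hits = pd.DataFrame(0, index=df.index, columns=names)

    prev = 0  # position of the previous checkpoint (the first row to start)
    marked = df["Outcome"].isin(outcomes).to_numpy()
    for p in np.flatnonzero(marked):
        latest = last_rise.iloc[p]
        col = latest.idxmax()  # ties resolve to the first column listed
        if latest[col] > prev:  # the increase came after the last checkpoint
            name = f"{col}_{df['Outcome'].iloc[p]}"
            hits.iloc[p, hits.columns.get_loc(name)] = 1
        prev = p

    # Running totals turn the one-off hits into cumulative counters.
    return pd.concat([df, hits.cumsum()], axis=1)


df = pd.DataFrame({
    "Outcome": ["", "", "X", "", "", "X", "", "Y", "", "Y"],
    "A": [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    "B": [0, 0, 0, 1, 1, 1, 1, 1, 2, 2],
    "C": [0, 0, 0, 1, 2, 3, 3, 3, 3, 3],
    "D": [0, 1, 2, 2, 2, 2, 2, 2, 2, 2],
})
result = count_latest_increase(df)
print(result)
```

The loop here only visits the X/Y rows, so the per-row work stays small; on the sample data the counters should match the expected output in the question.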
