pandas groupby: flag records and apply the result back to the original DataFrame

Problem description

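Below is a DataFrame with a group column and a column called input. I want to create a second column that flags, within each group, the first row where input is 1, and sets every other row in that group to 0. A group with no 1 at all (such as b) gets 0 everywhere. Here is an example: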

import pandas as pd

original_df = pd.DataFrame({'group': ['a','a','a','b','b','c','c','c','c','c','c','c','c'], 
                            'input': [0,1,1,0,0,0,0,0,0,1,1,1,1]})

   group  input
0      a      0
1      a      1
2      a      1
3      b      0
4      b      0
5      c      0
6      c      0
7      c      0
8      c      0
9      c      1
10     c      1
11     c      1
12     c      1

desired_df = pd.DataFrame({'group': ['a','a','a','b','b','c','c','c','c','c','c','c','c'], 
                            'input': [0,1,1,0,0,0,0,0,0,1,1,1,1],  
                            'desired_input': [0,1,0,0,0,0,0,0,0,1,0,0,0]})

   group  input  desired_input
0      a      0              0
1      a      1              1
2      a      1              0
3      b      0              0
4      b      0              0
5      c      0              0
6      c      0              0
7      c      0              0
8      c      0              0
9      c      1              1
10     c      1              0
11     c      1              0
12     c      1              0
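
As an alternative sketch (not taken from the question or the answer), desired_input can also be computed with a per-group running count of 1s: the first 1 in a group is the only row that is itself a 1 and where the group's running count of 1s equals 1. This assumes input only ever contains 0 and 1, as in the example.

```python
import pandas as pd

original_df = pd.DataFrame({
    'group': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c'],
    'input': [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Boolean mask of the rows where input is 1
is_one = original_df['input'].eq(1)

# Running count of 1s within each group; it reaches 1 at the group's first 1
ones_so_far = is_one.astype(int).groupby(original_df['group']).cumsum()

# Flag only rows that are a 1 AND are the first 1 in their group
original_df['desired_input'] = (is_one & ones_so_far.eq(1)).astype(int)

print(original_df)
```

Rows after the first 1 that are 0 also have a running count of 1, which is why the mask is combined with is_one rather than using the count alone.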

Tags: pandas, pandas-groupby

Solution


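Try drop_duplicates. Filter to the rows where input is 1, keep only the first such row in each group, and assign the result back to df; rows that were not kept come back as NaN and are filled with 0. (In this example, group b's inputs were changed to 1 to show a group that starts with a 1.)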

import pandas as pd

df = pd.DataFrame(
    {'group': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c'],
     'input': [0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]})

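# Rows with input == 1, keeping only the first such row per group;
# assigning this Series back aligns on the index, so all other rows get NaN
df['flag'] = df[df['input'].eq(1)] \
    .drop_duplicates(['group'], keep='first')['input']
# Replace the NaNs with 0 and restore an integer dtype
df['flag'] = df['flag'].fillna(0).astype(int)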

print(df)

   group  input  flag
0      a      0     0
1      a      1     1
2      a      1     0
3      b      1     1
4      b      1     0
5      c      0     0
6      c      0     0
7      c      0     0
8      c      0     0
9      c      1     1
10     c      1     0
11     c      1     0
12     c      1     0
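
A shorter variant, offered here as a swapped-in technique rather than the answer's method, avoids the NaN round trip by building a boolean mask directly. DataFrame.duplicated on ['group', 'input'] is False only for the first row of each (group, input) pair, so combining it with input == 1 marks exactly the first 1 per group. This sketch reuses the answer's data, where group b has been changed to 1s.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c'],
    'input': [0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

# False for the first row of each (group, input) pair, True for repeats
first_of_pair = ~df.duplicated(['group', 'input'], keep='first')

# Keep only those first occurrences where input is 1
df['flag'] = (df['input'].eq(1) & first_of_pair).astype(int)

print(df)
```

Because the mask is computed for every row, no index alignment or fillna step is needed, and the result is already an integer column after astype(int).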
