
Problem description

I want to group the values by a particular column (id) and replace all values with the maximum datetime associated with the given id.

This is the code I wrote (it does not work):

file.groupby('data__id')['data__answered_at'].apply(lambda x: x['data__answered_at'] == x['data__answered_at'].max())

Here is a sample of my DataFrame:

data__id     data__answered_at
1              2019-01-10
1                  Na 
2              2019-01-12
2                  Na
3                  Na
4                  Na
4                  Na
5                  Na
5              2019-01-15   

Tags: python, pandas

Solution


Use to_datetime with errors='coerce' to convert non-datetime values to NaT, then get the maximum of each group with GroupBy.transform, so that the missing values can be replaced with Series.fillna:

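import pandas as pd

# df is the DataFrame from the question (named file there)
# values that cannot be parsed as dates, such as 'Na', become NaT
df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')
# per-group maximum, aligned to the original rows
s = df.groupby('data__id')['data__answered_at'].transform('max')
df['data__answered_at'] = df['data__answered_at'].fillna(s)
print(df)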
   data__id data__answered_at
0         1        2019-01-10
1         1        2019-01-10
2         2        2019-01-12
3         2        2019-01-12
4         3               NaT
5         4               NaT
6         4               NaT
7         5        2019-01-15
8         5        2019-01-15
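
For anyone who wants to reproduce the steps above end to end, here is a minimal sketch. It rebuilds the sample DataFrame from the question by hand, assuming the missing entries are the literal string 'Na'; the names group_max and filled are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question; 'Na' is assumed to be a plain string
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": [
        "2019-01-10", "Na", "2019-01-12", "Na", "Na",
        "Na", "Na", "Na", "2019-01-15",
    ],
})

# Unparseable values become NaT instead of raising an error
df["data__answered_at"] = pd.to_datetime(df["data__answered_at"], errors="coerce")

# transform('max') returns one value per original row, so it lines up with df
group_max = df.groupby("data__id")["data__answered_at"].transform("max")

# Fill only the missing rows with their group's maximum
filled = df["data__answered_at"].fillna(group_max)
```

Groups 3 and 4 contain no valid date at all, so their maximum is itself NaT and those rows stay missing. If the goal is to overwrite every row with the group maximum, not just the missing ones, assigning group_max directly to the column should do that.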

Your own solution should be changed to use a lambda function with fillna:

f = lambda x: x.fillna(x.max())
# group_keys=False keeps the original row index, so the result aligns with df
df['data__answered_at'] = df.groupby('data__id', group_keys=False)['data__answered_at'].apply(f)
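
As a possible alternative to apply, the same lambda can be passed to transform, which returns a result indexed like the original rows. The sketch below is only an illustration on a small hand-made frame (not the question's full data); by_lambda and by_name are hypothetical names, and the column is assumed to be already converted to datetime.

```python
import pandas as pd

# Small illustrative frame; the dates are already datetime64 with NaT for gaps
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3],
    "data__answered_at": pd.to_datetime(
        ["2019-01-10", None, None, "2019-01-12", None]
    ),
})

grouped = df.groupby("data__id")["data__answered_at"]

# Lambda version: fill each group's gaps with that group's maximum
by_lambda = grouped.transform(lambda x: x.fillna(x.max()))

# Built-in version: compute the per-row group maximum, then fill
by_name = df["data__answered_at"].fillna(grouped.transform("max"))
```

Both variants should fill the same rows; the string form 'max' lets pandas use its built-in aggregation, which is usually the faster choice on large frames.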
