python - Conditional groupby in Pandas
Problem description
I have a DataFrame:
|phone_number|call_date|answered| attempt|
|123 | 13thJune| 1 | 1 |
|234 | 15thJune| 0 | 1 |
|234 | 15thJune| 0 | 2 |
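I want to group by phone_number and take the maximum call date among the answered calls. If none of a number's calls was answered (answered is 0), its maximum date should be blank.
df.groupby(['phone_number'])['call_date'].max().reset_index()
This groupby should apply only when answered is > 0; otherwise it should give me a blank.
How do I achieve this?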
Expected df:
phone_number | max_call_date
123 | 13th June
234 | NaN
Solution
The first idea is to filter out the rows where answered is 0, aggregate with max, and then add back the phone_number values that were filtered out, as NaNs, with Series.reindex:
df1 = (df[df['answered'].ne(0)]
         .groupby(['phone_number'])['call_date']
         .max()
         .reindex(df['phone_number'].unique())
         .reset_index(name='max_call_date'))
print (df1)
   phone_number max_call_date
0           123      13thJune
1           234           NaN
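As a self-contained sketch of the filter-then-reindex idea, the snippet below rebuilds the sample table from the question and prints the intermediate result, which makes it visible that 234 disappears after filtering and only comes back through reindex. The name answered_max and the rename_axis call are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the table in the question
df = pd.DataFrame({
    'phone_number': [123, 234, 234],
    'call_date': ['13thJune', '15thJune', '15thJune'],
    'answered': [1, 0, 0],
    'attempt': [1, 1, 2],
})

# Max call_date per phone, computed over answered rows only
answered_max = df[df['answered'].ne(0)].groupby('phone_number')['call_date'].max()

# 234 has no answered rows, so it is absent from the index at this point
print(answered_max.index.tolist())

# reindex restores every phone_number; the ones with no answered call get NaN
df1 = (answered_max
       .reindex(df['phone_number'].unique())
       .rename_axis('phone_number')  # keep the index name explicit (illustrative)
       .reset_index(name='max_call_date'))
print(df1)
```

Without the reindex step, the result would contain only the numbers that have at least one answered call.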
Or replace call_date with missing values where answered=0, and then aggregate with max:
df1 = (df.assign(call_date = df['call_date'].mask(df['answered'].eq(0)))
         .groupby(['phone_number'])['call_date'].max()
         .reset_index(name='max_call_date'))
print (df1)
   phone_number max_call_date
0           123      13thJune
1           234           NaN
The last idea applies if you need NaN whenever at least one row for a phone number has answered=0, that is, whenever the group's minimum of answered is 0:
df1 = df.groupby('phone_number', as_index=False).agg({'call_date':'max', 'answered':'min'})
df1['max_call_date'] = df1.pop('call_date').mask(df1.pop('answered').eq(0))
print (df1)
   phone_number max_call_date
0           123      13thJune
1           234           NaN
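The mask approach and the min approach agree on the sample data but differ once a number has both answered and unanswered calls. The sketch below uses hypothetical data (the dates, including the year 2024, and the mixed group for 234 are assumptions, not from the question) and real datetimes so that max is well defined when missing values are mixed in.

```python
import pandas as pd

# Hypothetical data: 234 has one unanswered call and one answered call
df = pd.DataFrame({
    'phone_number': [123, 234, 234],
    'call_date': pd.to_datetime(['2024-06-13', '2024-06-15', '2024-06-16']),
    'answered': [1, 0, 1],
})

# Mask approach: unanswered rows are ignored, answered rows still count
by_mask = (df.assign(call_date=df['call_date'].mask(df['answered'].eq(0)))
             .groupby('phone_number')['call_date'].max()
             .reset_index(name='max_call_date'))

# Min approach: a single unanswered row blanks the whole group
by_min = df.groupby('phone_number', as_index=False).agg({'call_date': 'max', 'answered': 'min'})
by_min['max_call_date'] = by_min.pop('call_date').mask(by_min.pop('answered').eq(0))

print(by_mask)  # 234 keeps the date of its answered call
print(by_min)   # 234 becomes NaT because one of its calls was unanswered
```

Which of the two is right depends on whether "blank" should mean "never answered" or "at least one unanswered call".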
EDIT: To get the correct maximum from the date strings, the column must first be converted to datetimes:
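# Strip the ordinal suffix only when it follows a digit, so month names such as "August" stay intact
df['call_date'] = pd.to_datetime(df['call_date'].str.replace(r'(?<=\d)(?:st|nd|rd|th)', ' ', regex=True),
                                 format='%d %B')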
df1 = (df[df['answered'].ne(0)]
         .groupby(['phone_number'])['call_date']
         .max()
         .reindex(df['phone_number'].unique())
         .reset_index(name='max_call_date'))
print (df1)
   phone_number max_call_date
0           123    1900-06-13
1           234           NaT
df1 = (df.assign(call_date = df['call_date'].mask(df['answered'].eq(0)))
         .groupby(['phone_number'])['call_date'].max()
         .reset_index(name='max_call_date'))
print (df1)
   phone_number max_call_date
0           123    1900-06-13
1           234           NaT
df1 = df.groupby('phone_number', as_index=False).agg({'call_date':'max', 'answered':'min'})
df1['max_call_date'] = df1.pop('call_date').mask(df1.pop('answered').eq(0))
print (df1)
   phone_number max_call_date
0           123    1900-06-13
1           234           NaT
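To see why the datetime conversion matters, here is a small sketch with hypothetical dates (the values and the appended year 2024 are assumptions for illustration). As strings, max compares character by character, so the wrong date wins. Appending an explicit year also avoids the default year 1900 seen in the outputs above.

```python
import pandas as pd

# Hypothetical date strings in the same style as the question's data
dates = pd.Series(['9thJune', '13thJune'])

# As strings, '9...' sorts after '1...', so the earlier date is returned
print(dates.max())   # 9thJune

# Strip the ordinal suffix after the day number, then parse with an assumed year
cleaned = dates.str.replace(r'(?<=\d)(?:st|nd|rd|th)', ' ', regex=True) + ' 2024'
parsed = pd.to_datetime(cleaned, format='%d %B %Y')
print(parsed.max())  # 2024-06-13 00:00:00
```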