How do I calculate the number of days between stages in this Python pandas DataFrame?

Problem description

import pandas as pd

df = pd.DataFrame({'Campaign ID': [48464, 48464, 48464, 48464, 26380, 26380, 22676, 39529, 39529, 46029, 46029, 46029, 17030, 46724, 46724, 39379, 39379, 39379],
'Campaign stage': ["Lost", "Developing", "Discussing", "Starting", "Discussing", "Starting", "Developing", "Discussing", "Starting", "Developing", "Discussing", "Starting", "Developing", "Developing", "Discussing", "Lost", "Developing", "Discussing"],
'Stage Number': [-1, 3, 2, 1, 2, 1, 3, 2, 1, 3, 2, 1, 3, 3, 2, -1, 3, 2],
'Campaign Date': ["2/8/2019", "1/9/2019", "1/3/2019", "3/3/2018", "2/14/2019", "12/5/2018", "7/25/2018", "6/8/2018", "3/4/2018", "12/8/2018", "9/9/2018", "5/31/2018", "6/7/2018", "3/27/2018", "1/6/2018", "2/15/2019", "12/15/2018", "9/4/2018"]})

pvt = pd.pivot_table(df,values=['Campaign stage'],index=['Campaign ID','Campaign stage','Stage Number','Campaign Date'],aggfunc='count')
pvt.sort_values(['Campaign ID','Campaign Date'],ascending=[True,False])

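Hi all, I have the DataFrame above. For each campaign, I want to calculate the number of days between the campaign stages "Starting" and "Discussing", and then compute the average.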

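Because of data quality issues, the campaign stages are not consistent. For campaigns that do not have both a "Starting" and a "Discussing" stage, I want the value set to 0.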

I created a pivot table view of the data and sorted the campaign dates in descending order, but I don't know what to do next.
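One possible next step after the pivot (a sketch, not the accepted answer's method below): reshape so there is one row per campaign and one column per stage, then subtract the two date columns. This assumes each (Campaign ID, Campaign stage) pair appears at most once; if it does not, `.first()` simply keeps the first date it finds. The names `wide` and `days` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Campaign ID': [48464, 48464, 48464, 48464, 26380, 26380, 22676, 39529, 39529, 46029, 46029, 46029, 17030, 46724, 46724, 39379, 39379, 39379],
'Campaign stage': ["Lost", "Developing", "Discussing", "Starting", "Discussing", "Starting", "Developing", "Discussing", "Starting", "Developing", "Discussing", "Starting", "Developing", "Developing", "Discussing", "Lost", "Developing", "Discussing"],
'Stage Number': [-1, 3, 2, 1, 2, 1, 3, 2, 1, 3, 2, 1, 3, 3, 2, -1, 3, 2],
'Campaign Date': ["2/8/2019", "1/9/2019", "1/3/2019", "3/3/2018", "2/14/2019", "12/5/2018", "7/25/2018", "6/8/2018", "3/4/2018", "12/8/2018", "9/9/2018", "5/31/2018", "6/7/2018", "3/27/2018", "1/6/2018", "2/15/2019", "12/15/2018", "9/4/2018"]})

# Real datetimes, so subtraction yields Timedeltas
df['Campaign Date'] = pd.to_datetime(df['Campaign Date'], format='%m/%d/%Y')

# One row per campaign, one column per stage (NaT where a stage is absent)
wide = df.groupby(['Campaign ID', 'Campaign stage'])['Campaign Date'].first().unstack()

# Days from "Starting" to "Discussing"; NaN wherever either stage is missing
days = (wide['Discussing'] - wide['Starting']).dt.days

# Per the question, campaigns lacking either stage get 0
days = days.fillna(0).astype(int)
print(days)
```

Because the subtraction is done on whole columns, there is no Python-level loop over campaigns.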

Thanks in advance for your help.

Tags: python, pandas, dataframe, pandas-groupby

Solution


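import pandas as pd

# Parse the date strings so that subtracting them gives Timedeltas
df['Campaign Date'] = pd.to_datetime(df['Campaign Date'], format='%m/%d/%Y')

compare = {}
for ids, gp in df.groupby('Campaign ID'):
    discussing = gp.loc[gp['Campaign stage'] == 'Discussing', 'Campaign Date']
    starting = gp.loc[gp['Campaign stage'] == 'Starting', 'Campaign Date']
    if discussing.empty or starting.empty:
        compare[ids] = 0  # one of the two stages is missing for this campaign
    else:
        # Whole days from "Starting" to "Discussing"
        compare[ids] = (discussing.iloc[0] - starting.iloc[0]).days

# Copy each campaign's day count onto all of its rows
df['new_col'] = df['Campaign ID'].map(compare)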
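The question also asks for the average. Since `new_col` repeats each campaign's value on every one of its rows, averaging that column would weight campaigns by their row count. A minimal sketch that averages once per campaign, assuming the day counts are computed as in the answer, but storing NaN (instead of 0) for missing stages so both kinds of average can be taken. The names `per_campaign`, `mean_all` and `mean_valid` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Campaign ID': [48464, 48464, 48464, 48464, 26380, 26380, 22676, 39529, 39529, 46029, 46029, 46029, 17030, 46724, 46724, 39379, 39379, 39379],
'Campaign stage': ["Lost", "Developing", "Discussing", "Starting", "Discussing", "Starting", "Developing", "Discussing", "Starting", "Developing", "Discussing", "Starting", "Developing", "Developing", "Discussing", "Lost", "Developing", "Discussing"],
'Campaign Date': ["2/8/2019", "1/9/2019", "1/3/2019", "3/3/2018", "2/14/2019", "12/5/2018", "7/25/2018", "6/8/2018", "3/4/2018", "12/8/2018", "9/9/2018", "5/31/2018", "6/7/2018", "3/27/2018", "1/6/2018", "2/15/2019", "12/15/2018", "9/4/2018"]})
df['Campaign Date'] = pd.to_datetime(df['Campaign Date'], format='%m/%d/%Y')

compare = {}
for ids, gp in df.groupby('Campaign ID'):
    discussing = gp.loc[gp['Campaign stage'] == 'Discussing', 'Campaign Date']
    starting = gp.loc[gp['Campaign stage'] == 'Starting', 'Campaign Date']
    if discussing.empty or starting.empty:
        compare[ids] = float('nan')  # NaN marks a missing stage
    else:
        compare[ids] = (discussing.iloc[0] - starting.iloc[0]).days

# One value per campaign, not per row
per_campaign = pd.Series(compare, dtype=float)

# Mean over campaigns that have both stages (NaN is skipped by default)
mean_valid = per_campaign.mean()
# Mean over all campaigns, counting missing-stage campaigns as 0
mean_all = per_campaign.fillna(0).mean()
print(mean_valid, mean_all)
```

Which of the two averages is appropriate depends on whether campaigns with incomplete stage data should pull the mean toward 0.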
