python - How do I calculate the number of days between stages in this pandas DataFrame?
Problem description
import pandas as pd

df = pd.DataFrame({'Campaign ID':[48464,48464,48464,48464,26380,26380,22676,39529,39529,46029,46029,46029,17030,46724,46724,39379,39379,39379],
                   'Campaign stage':["Lost","Developing","Discussing","Starting","Discussing", "Starting","Developing", "Discussing","Starting","Developing", "Discussing","Starting","Developing", "Developing","Discussing","Lost", "Developing","Discussing"],
                   'Stage Number':[-1, 3, 2, 1, 2, 1, 3, 2, 1, 3, 2, 1, 3, 3, 2, -1, 3, 2],
                   'Campaign Date':["2/8/2019","1/9/2019","1/3/2019","3/3/2018","2/14/2019","12/5/2018","7/25/2018","6/8/2018","3/4/2018","12/8/2018","9/9/2018","5/31/2018","6/7/2018","3/27/2018","1/6/2018","2/15/2019","12/15/2018","9/4/2018"]})
pvt = pd.pivot_table(df,values=['Campaign stage'],index=['Campaign ID','Campaign stage','Stage Number','Campaign Date'],aggfunc='count')
pvt.sort_values(['Campaign ID','Campaign Date'],ascending=[True,False])
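Hi all, I have the DataFrame above. For each campaign, I want to calculate the number of days between the campaign stages "Starting" and "Discussing", and then compute the average.
Because of data quality issues, the campaign stages are not consistent. For campaigns that do not have both the "Starting" and "Discussing" stages, I want to set the value to 0.
I created a pivot table view of the data and sorted the campaign dates in descending order, but I don't know what to do next.
Thanks in advance for your help.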
Solution
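df['Campaign Date'] = pd.to_datetime(df['Campaign Date'], format='%m/%d/%Y')
compare = {}
for ids, gp in df.groupby('Campaign ID'):
    try:
        # First "Discussing" date and first "Starting" date of this campaign
        discussing = gp.loc[gp['Campaign stage'] == 'Discussing', 'Campaign Date'].iloc[0]
        starting = gp.loc[gp['Campaign stage'] == 'Starting', 'Campaign Date'].iloc[0]
        # Store whole days so every value is an int, like the 0 fallback
        compare[ids] = (discussing - starting).days
    except IndexError:
        # .iloc[0] raises IndexError when the campaign lacks one of the two stages
        compare[ids] = 0
df['new_col'] = df['Campaign ID'].map(compare)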
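The loop above gives the per-campaign gap but stops short of the average the question asks for. A possible vectorized variant, as a minimal sketch rather than the answerer's method: it uses a subset of the question's rows, takes the earliest date when a stage repeats (an assumption; the loop takes the first row instead), and the names `stage_dates`, `days_between`, `average_all` and `average_complete` are chosen here for illustration.

```python
import pandas as pd

# Subset of the question's data: two complete campaigns, two missing "Starting".
df = pd.DataFrame({
    "Campaign ID": [48464, 48464, 48464, 48464, 26380, 26380, 22676, 39379, 39379, 39379],
    "Campaign stage": ["Lost", "Developing", "Discussing", "Starting", "Discussing",
                       "Starting", "Developing", "Lost", "Developing", "Discussing"],
    "Campaign Date": ["2/8/2019", "1/9/2019", "1/3/2019", "3/3/2018", "2/14/2019",
                      "12/5/2018", "7/25/2018", "2/15/2019", "12/15/2018", "9/4/2018"],
})
df["Campaign Date"] = pd.to_datetime(df["Campaign Date"], format="%m/%d/%Y")

# One row per campaign, one column per stage; missing stages become NaT.
# Assumption: if a stage appears more than once, its earliest date is used.
stage_dates = (
    df.groupby(["Campaign ID", "Campaign stage"])["Campaign Date"]
      .min()
      .unstack()
      .reindex(columns=["Starting", "Discussing"])
)

# Days from "Starting" to "Discussing"; NaN where either stage is missing.
raw_days = (stage_dates["Discussing"] - stage_dates["Starting"]).dt.days

# The question asks for 0 when a campaign lacks one of the two stages.
days_between = raw_days.fillna(0).astype(int)

# Two possible readings of "the average": with or without the 0 placeholders.
average_all = days_between.mean()
average_complete = raw_days.mean()

# Attach the per-campaign value to every row, like new_col above.
df["days_between"] = df["Campaign ID"].map(days_between)
```

Which average is wanted depends on whether the 0 placeholders should count. Including them pulls the mean down, so `average_complete` may be the more meaningful figure if the zeros only mark missing data.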