首页 > 解决方案 > 按时间箱/间隔计算并发用户会话

问题描述

我的数据由具有开始和结束时间戳的会话组成。我的任务是按公司和应用程序版本计算每个时间间隔内“活动”的会话数。我从 2 分钟的间隔开始。因此,如果一家公司的会话从下午 2:00 持续到下午 2:07……该公司将计入 4 个箱/间隔(2:00、2:02、2:04、2:06)中的每一个。我将如何用熊猫解决这个问题?

这是我的样本数据:

data = pd.DataFrame({
    'Customer': ["CompanyA", "CompanyB", "CompanyA", "CompanyC", "CompanyB", "CompanyB", "CompanyC", "CompanyA", "CompanyC", "CompanyB"],   
    'AppVer': ["zi1","zi3","zi2","zi2","zi1","zi3","zi3","zi2","zi1","zi1"], 
    'start_timestamp': pd.date_range(start='2020-06-01 22:19:00', freq='3min', periods=10),
    'end_timestamp': pd.date_range(start='2020-06-01 22:23:00', freq='4min', periods=10)
}).sample(frac=1)#randomize rows

print(data)

我能够根据 2 分钟的间隔数数

session_starts = (x - pd.Timedelta(seconds=x.second) for x in data['start_timestamp'])
session_ends = (x - pd.Timedelta(seconds=x.second) for x in data['end_timestamp'])
sesszip = zip(session_starts,session_ends)
ranges_gen = (pd.date_range(x.round("2T"),y.round("2T"),freq='2T') for x,y in sesszip)
ranges_ser = pd.Series(chain.from_iterable(ranges_gen),name="time")
output_df = ranges_ser.value_counts(sort=False)\
.sort_index()\
.to_frame()\
.reset_index()\
.rename(columns={'index':'time' , 'time':'counts'})

output_df

我现在无法弄清楚如何按客户或应用程序版本进行分组。任何关于不同方法的帮助或想法将不胜感激。

标签: pythonpandasintervals

解决方案


好的,我们可以output_df使用以下方法重新创建您的:

data['timerange'] = data.apply(lambda x: 
                              pd.date_range(x['start_timestamp'].floor('T').round('2T'), 
                                           x['end_timestamp'].floor('T').round('2T'), 
                                           freq='2T'), 
                               axis=1)

data.explode('timerange').groupby(['timerange'])['Customer'].count()

输出:

timerange
2020-06-01 22:20:00    1
2020-06-01 22:22:00    2
2020-06-01 22:24:00    3
2020-06-01 22:26:00    2
2020-06-01 22:28:00    3
2020-06-01 22:30:00    2
2020-06-01 22:32:00    3
2020-06-01 22:34:00    3
2020-06-01 22:36:00    4
2020-06-01 22:38:00    3
2020-06-01 22:40:00    4
2020-06-01 22:42:00    3
2020-06-01 22:44:00    4
2020-06-01 22:46:00    4
2020-06-01 22:48:00    4
2020-06-01 22:50:00    3
2020-06-01 22:52:00    3
2020-06-01 22:54:00    2
2020-06-01 22:56:00    2
2020-06-01 22:58:00    1
2020-06-01 23:00:00    1
Name: Customer, dtype: int64

按客户:

data.explode('timerange').groupby(['Customer','timerange'])['Customer'].count().unstack(0, fill_value=0)

输出:

Customer             CompanyA  CompanyB  CompanyC
timerange                                        
2020-06-01 22:20:00         1         0         0
2020-06-01 22:22:00         1         1         0
2020-06-01 22:24:00         2         1         0
2020-06-01 22:26:00         1         1         0
2020-06-01 22:28:00         1         1         1
2020-06-01 22:30:00         1         0         1
2020-06-01 22:32:00         1         1         1
2020-06-01 22:34:00         0         2         1
2020-06-01 22:36:00         0         2         2
2020-06-01 22:38:00         0         2         1
2020-06-01 22:40:00         1         2         1
2020-06-01 22:42:00         1         1         1
2020-06-01 22:44:00         1         1         2
2020-06-01 22:46:00         1         1         2
2020-06-01 22:48:00         1         1         2
2020-06-01 22:50:00         1         1         1
2020-06-01 22:52:00         1         1         1
2020-06-01 22:54:00         0         1         1
2020-06-01 22:56:00         0         1         1
2020-06-01 22:58:00         0         1         0
2020-06-01 23:00:00         0         1         0

通过 AppVer:

data.explode('timerange').groupby(['AppVer','timerange'])['AppVer'].count().unstack(0, fill_value=0)

输出:

AppVer               zi1  zi2  zi3
timerange                         
2020-06-01 22:20:00    1    0    0
2020-06-01 22:22:00    1    0    1
2020-06-01 22:24:00    1    1    1
2020-06-01 22:26:00    0    1    1
2020-06-01 22:28:00    0    2    1
2020-06-01 22:30:00    0    2    0
2020-06-01 22:32:00    1    2    0
2020-06-01 22:34:00    1    1    1
2020-06-01 22:36:00    1    1    2
2020-06-01 22:38:00    1    0    2
2020-06-01 22:40:00    1    1    2
2020-06-01 22:42:00    0    1    2
2020-06-01 22:44:00    1    1    2
2020-06-01 22:46:00    2    1    1
2020-06-01 22:48:00    2    1    1
2020-06-01 22:50:00    2    1    0
2020-06-01 22:52:00    2    1    0
2020-06-01 22:54:00    2    0    0
2020-06-01 22:56:00    2    0    0
2020-06-01 22:58:00    1    0    0
2020-06-01 23:00:00    1    0    0

推荐阅读