python - How do I find the total play time per week from given dates in Python?
Problem description
I have a data frame that looks like the following:
import pandas as pd

k = {'user_id': [1,1,1,1,1,2,2,2,3,3,3,3,3,4,4,4,5,5],
     'created': ['2/09/2021','2/10/2021','2/16/2021','2/17/2021','3/09/2021','3/10/2021','3/18/2021','3/19/2021',
                 '2/19/2021','2/20/2021','2/26/2021','2/27/2021','3/09/2021','2/10/2021','2/18/2021','3/19/2021',
                 '3/24/2021','3/30/2021'],
     'stop_time': [11,12,13,14,15,25,26,27,6,7,8,9,10,11,12,13,25,26],
     'play_time': [10,11,12,13,14,24,25,26,5,6,7,8,9,10,11,13,24,25]}
df = pd.DataFrame(data=k)
# Parse the month/day/year strings into real datetimes
df['created'] = pd.to_datetime(df['created'], format='%m/%d/%Y')
# Play time of each row
df['total_play_time'] = df['stop_time'] - df['play_time']
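Now we need to use the first date of each user_id as the start date of that user's first week. For example, '2/9/2021' is the first-week start date for user_id 1 and '3/10/2021' is the first-week start date for user_id 2.
We need to sum the total play time per week for each user_id, and keep producing a sum for every week up to the current date (for example, if the report is run today, it must give weekly sums up to today), giving a result like the following: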
ID  week1  week2  week3  week4  week5  week6  week7  week8  week9  week10  week11  week12
1   3      2      0      0      0      0      0      0      0      0       0       0
2   1      2      0      0      0      0      0
Solution
# Start date of each user, repeated on every row of that user
start = df.groupby("user_id")["created"].transform("min")
# Subtract the start date to get a common baseline for all users,
# expressed as whole days since that user's first date
df["t"] = (df["created"] - start).dt.days
# Number of weeks needed to cover the longest span of any user
max_weeks = int(df["t"].max() // 7 + 1)
# Make the bin edges (in days) and the corresponding readable labels
bins = [7 * wk for wk in range(max_weeks + 1)]
labels = ["week " + str(wk + 1) for wk in range(max_weeks)]
# Bin the data; right=False makes each bin [start, end), so day 0 falls
# in week 1 and day 7 falls in week 2
df["bin"] = pd.cut(df["t"], bins, labels=labels, right=False)
# Aggregate; observed=False keeps weeks without any rows, with a sum of 0
df.groupby(["user_id", "bin"], observed=False)["total_play_time"].sum()

user_id  bin
1        week 1    2
         week 2    2
         week 3    0
         week 4    0
         week 5    1
         week 6    0
2        week 1    1
         week 2    2
         week 3    0
         week 4    0
         week 5    0
         week 6    0
3        week 1    2
         week 2    2
         week 3    1
         week 4    0
         week 5    0
         week 6    0
4        week 1    1
         week 2    1
         week 3    0
         week 4    0
         week 5    0
         week 6    0
5        week 1    2
         week 2    0
         week 3    0
         week 4    0
         week 5    0
         week 6    0
Name: total_play_time, dtype: int64
Then, if you really need to, you can reshape the data frame into wide format.
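One possible way to do that reshape is sketched below. It is not part of the original answer: the five-row sample, the `report_date` cutoff, and the names `first_day`, `n_weeks`, and `wide` are assumptions made for illustration. The sketch numbers weeks from each user's first date, pivots the weekly sums into one column per week with `unstack`, and leaves weeks after the cutoff empty for users who started later, which is roughly the shape of the table in the question.

```python
import numpy as np
import pandas as pd

# Small made-up sample in the same shape as the question's data.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "created": pd.to_datetime(
        ["2/09/2021", "2/10/2021", "2/16/2021", "3/10/2021", "3/18/2021"],
        format="%m/%d/%Y",
    ),
    "total_play_time": [1, 1, 1, 1, 1],
})

# Assumed cutoff; a live report could use pd.Timestamp.today().normalize().
report_date = pd.Timestamp("2021-03-31")
df = df[df["created"] <= report_date].copy()

# First date per user, and the 1-based week number of every row.
first_day = df.groupby("user_id")["created"].min()
df["week"] = (df["created"] - df["user_id"].map(first_day)).dt.days // 7 + 1

# Weeks (including the current partial one) each user has had up to the cutoff.
n_weeks = (report_date - first_day).dt.days // 7 + 1

# Long to wide: one row per user, one column per week, 0 where nothing was played.
wide = (
    df.groupby(["user_id", "week"])["total_play_time"].sum()
    .unstack("week", fill_value=0)
    .reindex(columns=range(1, int(n_weeks.max()) + 1), fill_value=0)
)

# Blank out the weeks that lie after the cutoff for users who started later.
week_numbers = np.asarray(wide.columns)[None, :]
weeks_available = n_weeks.reindex(wide.index).to_numpy()[:, None]
wide = wide.astype("Int64").mask(week_numbers > weeks_available)

wide.columns = [f"week{w}" for w in wide.columns]
print(wide)
```

With this sample, user 1 has eight weekly columns filled in, while user 2, who started on 3/10/2021, has values for four weeks and missing values after that. If a plain 0 is preferred over a missing value, the masking step can simply be dropped.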