How to merge DataFrames based on Datetime in Python

Problem description

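I wrote the following code, which creates two data frames, nq and cmnt.
nq contains UserId and the time at which the badge was attained, date.
cmnt contains OwnerUserId and the time at which the user posted a comment, CreationDate.
I want to count all comments each user made in the week before and the week after obtaining the badge, and store the counts in a data frame, as shown in the expected output below.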

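The following code does this, but it raises an error on part of the data while working fine on another part. Please suggest another way to accomplish this task.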

nq

 UserId |   date 
     1      2009-10-17 17:38:32.590
     2      2009-10-19 00:37:23.067
     3      2009-10-20 08:37:14.143
     4      2009-10-21 18:07:51.247
     5      2009-10-22 21:25:24.483

cmnt

OwnerUserId | CreationDate
1             2009-10-16 17:38:32.590
1             2009-10-18 17:38:32.590
2             2009-10-18 00:37:23.067
2             2009-10-17 00:37:23.067
2             2009-10-20 00:37:23.067
3             2009-10-19 08:37:14.143
4             2009-10-20 18:07:51.247
5             2009-10-21 21:25:24.483

Code

import itertools

import pandas as pd

t = pd.merge(nq, cmnt, left_on="UserId", right_on = "OwnerUserId")
t["days_diff"] = (t["CreationDate"] - t["date"]).dt.days
t["count"] = t.groupby(["UserId", "days_diff"]).OwnerUserId.transform("count")

all_days = pd.DataFrame(itertools.product(t.UserId.unique(), range(-7, 8)), )
all_days.columns = ["UserId", "day"]

t = pd.merge(t, all_days, left_on=["UserId", "days_diff"], right_on=["UserId", "day"], how = "right")
t = pd.pivot_table(t, index="UserId", columns="day", values="count", dropna=False)

res = pd.merge(nq, t, left_on="UserId", right_index=True)

print(res)

Expected output

UserId     |   date                 |-7|-6|-5|-4|-3|-2|-1|0 |1 |2 |3 |4 |5 |6 |7
     1      2009-10-17 17:38:32.590 |0 |0 |0 |0 |0 |0 |1 |0 |1 |0 |0 |0 |0 |0 |0  
     2      2009-10-19 00:37:23.067 |0 |0 |0 |0 |0 |1 |1 |0 |1 |0 |0 |0 |0 |0 |0    
     3      2009-10-20 08:37:14.143 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 
     4      2009-10-21 18:07:51.247 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 
     5      2009-10-22 21:25:24.483 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 

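Here, column -1 holds the number of comments made 1 day before obtaining the badge, column 1 holds the number of comments made 1 day after obtaining the badge, and so on.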
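To make the column meaning concrete, here is a small sketch of how days_diff comes out for user 2 in the sample data. It only uses the values from the tables above; the variable names badge_time, comments and half_day_before are made up for illustration. Note that .dt.days returns the floored whole-day component, so a comment made 12 hours before the badge also lands in column -1.

```python
import pandas as pd

# User 2's badge time (`date` in nq) and comment times (`CreationDate` in cmnt)
badge_time = pd.Timestamp("2009-10-19 00:37:23.067")
comments = pd.Series(pd.to_datetime([
    "2009-10-18 00:37:23.067",
    "2009-10-17 00:37:23.067",
    "2009-10-20 00:37:23.067",
]))

# Whole-day offset of each comment relative to the badge
days_diff = (comments - badge_time).dt.days
print(days_diff.tolist())  # [-1, -2, 1]

# .dt.days floors toward negative infinity: -12 hours is stored as
# "-1 days +12:00:00", so its day component is -1, not 0
half_day_before = pd.Series([badge_time - pd.Timedelta(hours=12)])
half_day_diff = (half_day_before - badge_time).dt.days
print(half_day_diff.tolist())  # [-1]
```

So column 0 covers the first 24 hours after the badge, and column -1 covers the 24 hours before it.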

Error

ValueError: Length mismatch: Expected axis has 0 elements, new values have 2 elements

Note: the error is raised by this line of code:
all_days.columns = ["UserId", "day"]
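The message says the frame had 0 columns at that point, which suggests that the pandas version in use did not consume the itertools.product iterator and built an empty frame instead. Assuming that is the cause (it is not confirmed by the question), one possible fix is to materialize the iterator with list(...) and pass the column names directly. The user_ids list below is a stand-in for t.UserId.unique().

```python
import itertools

import pandas as pd

# Stand-in for t.UserId.unique() from the question's code
user_ids = [1, 2, 3, 4, 5]

# list(...) turns the lazy product iterator into concrete (UserId, day) tuples,
# so the DataFrame constructor always sees real rows
all_days = pd.DataFrame(
    list(itertools.product(user_ids, range(-7, 8))),
    columns=["UserId", "day"],
)

print(all_days.shape)  # (75, 2): 5 users x 15 day offsets
```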

Tags: python, pandas, dataframe, time-series, python-datetime

Solution
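One possible alternative, offered as a sketch rather than a definitive answer: skip itertools and pivot_table entirely, count comments per (UserId, days_diff) pair with pd.crosstab, and use reindex to force every user and all 15 day columns to appear, filling gaps with 0. The nq and cmnt frames below are rebuilt from the sample tables in the question; the names merged and counts are made up for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question
nq = pd.DataFrame({
    "UserId": [1, 2, 3, 4, 5],
    "date": pd.to_datetime([
        "2009-10-17 17:38:32.590", "2009-10-19 00:37:23.067",
        "2009-10-20 08:37:14.143", "2009-10-21 18:07:51.247",
        "2009-10-22 21:25:24.483",
    ]),
})
cmnt = pd.DataFrame({
    "OwnerUserId": [1, 1, 2, 2, 2, 3, 4, 5],
    "CreationDate": pd.to_datetime([
        "2009-10-16 17:38:32.590", "2009-10-18 17:38:32.590",
        "2009-10-18 00:37:23.067", "2009-10-17 00:37:23.067",
        "2009-10-20 00:37:23.067", "2009-10-19 08:37:14.143",
        "2009-10-20 18:07:51.247", "2009-10-21 21:25:24.483",
    ]),
})

# Pair every comment with its author's badge time
merged = nq.merge(cmnt, left_on="UserId", right_on="OwnerUserId")
merged["days_diff"] = (merged["CreationDate"] - merged["date"]).dt.days

# Keep only comments within one week before or after the badge
merged = merged[merged["days_diff"].between(-7, 7)]

# Count comments per user and day offset, then force the full -7..7 grid
counts = pd.crosstab(merged["UserId"], merged["days_diff"]).reindex(
    index=nq["UserId"], columns=range(-7, 8), fill_value=0
)

# Attach the counts to the badge table
res = nq.merge(counts, left_on="UserId", right_index=True)
print(res)
```

Because reindex is driven by nq["UserId"], users with no comments inside the window still get a row of zeros instead of being dropped.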

