Identify consecutive date periods per group

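Problem description

I would like to know how to identify periods of consecutive dates, with the period counter resetting for each group.

Here is my attempt. It works across the whole DataFrame, but I can't work out how to do this per group.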

import pandas as pd

data = {
    "peoples": ["jimbob","jimbob","jimbob", "jimbob","jimbob","jimbob", "sonnyjim","sonnyjim","sonnyjim","sonnyjim"],
    "dates": ["2020-11-01","2020-11-02","2020-11-03","2020-11-06","2020-11-09","2020-11-10", "2020-11-12","2020-11-13","2020-11-20","2020-11-22"]
}

df = pd.DataFrame(data)
df["dates"] = pd.to_datetime(df["dates"])

df["period"] = df["dates"].diff().dt.days.ne(1).cumsum()

print(df)

I would like to end up with something like this:

    peoples      dates  period
0    jimbob 2020-11-01       1
1    jimbob 2020-11-02       1
2    jimbob 2020-11-03       1
3    jimbob 2020-11-06       2
4    jimbob 2020-11-09       3
5    jimbob 2020-11-10       3
6  sonnyjim 2020-11-12       1
7  sonnyjim 2020-11-13       1
8  sonnyjim 2020-11-20       2
9  sonnyjim 2020-11-22       3

Tags: python, pandas

Solution


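You can group the dataframe by peoples and apply a custom lambda function to the dates column to number the blocks of consecutive dates. This assumes the rows are already sorted by date within each person:

# f numbers the runs of consecutive days within one person's dates
f = lambda s: s.diff().dt.days.ne(1).cumsum()
# transform keeps df's row index, so the result aligns on assignment
df['period'] = df.groupby('peoples')['dates'].transform(f)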

    peoples      dates  period
0    jimbob 2020-11-01       1
1    jimbob 2020-11-02       1
2    jimbob 2020-11-03       1
3    jimbob 2020-11-06       2
4    jimbob 2020-11-09       3
5    jimbob 2020-11-10       3
6  sonnyjim 2020-11-12       1
7  sonnyjim 2020-11-13       1
8  sonnyjim 2020-11-20       2
9  sonnyjim 2020-11-22       3
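
To make the logic easier to follow, here is a minimal sketch of the same idea split into separate steps, without a lambda. It is an alternative formulation rather than the answer's original code. The intermediate names gap_days and starts_period are made up for illustration, and the sketch assumes the rows are already sorted by date within each person.

```python
import pandas as pd

df = pd.DataFrame({
    "peoples": ["jimbob"] * 6 + ["sonnyjim"] * 4,
    "dates": pd.to_datetime([
        "2020-11-01", "2020-11-02", "2020-11-03", "2020-11-06", "2020-11-09",
        "2020-11-10", "2020-11-12", "2020-11-13", "2020-11-20", "2020-11-22",
    ]),
})

# Step 1: gap in days to the previous row of the same person.
# The first row of each person has no predecessor, so its gap is NaN.
gap_days = df.groupby("peoples")["dates"].diff().dt.days

# Step 2: a row starts a new period when the gap is not exactly one day.
# NaN compares as "not equal to 1", so each person's first row starts a period.
starts_period = gap_days.ne(1)

# Step 3: running count of period starts, restarted for each person.
# The flags are cast to int so the cumulative sum is an integer counter.
df["period"] = starts_period.astype(int).groupby(df["peoples"]).cumsum()

print(df)
```

If the data is not guaranteed to be ordered, sorting with df.sort_values(["peoples", "dates"]) before step 1 would be a reasonable precaution.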
