Unable to find dates in pandas

Problem description

I have a dataset in this form:

    company_name    date
0   global_infotech 2019-06-15
1   global_infotech 2020-03-22
2   global_infotech 2020-08-30
3   global_infotech 2018-06-19
4   global_infotech 2018-06-15
5   global_infotech 2018-02-15
6   global_infotech 2018-11-22
7   global_infotech 2019-01-15
8   global_infotech 2018-12-15
9   global_infotech 2019-06-15
10  global_infotech 2018-12-19
11  global_infotech 2019-12-31
12  global_infotech 2019-02-18
13  global_infotech 2018-06-16
14  global_infotech 2019-02-10
15  global_infotech 2019-03-15
16  Qualcom         2019-07-11
17  Qualcom         2018-01-11
18  Qualcom         2018-05-29
19  Qualcom         2018-10-06
20  Qualcom         2018-11-11
21  Qualcom         2019-08-17
22  Qualcom         2019-02-22
23  Qualcom         2019-10-16
24  Qualcom         2018-06-22
25  Qualcom         2018-06-14
26  Qualcom         2018-06-16
27  Syscin          2018-02-10
28  Syscin          2019-02-16
29  Syscin          2018-04-12
30  Syscin          2018-08-22
31  Syscin          2018-09-16
32  Syscin          2019-04-20
33  Syscin          2018-02-28
34  Syscin          2018-01-19

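Taking today to be 1 January 2020, I want to write code that finds how many times each company name appears in the last 3 months. For example, suppose global_infotech appears 5 times between 1 October 2019 and 1 January 2020; then 5 should appear next to every global_infotech row, like this: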

   company_name    date         appearance_count_last_3_months
0   global_infotech 2019-06-15       5
1   global_infotech 2020-03-22       5
2   global_infotech 2020-08-30       5
3   global_infotech 2018-06-19       5
4   global_infotech 2018-06-15       5
5   global_infotech 2018-02-15       5
6   global_infotech 2018-11-22       5
7   global_infotech 2019-01-15       5
8   global_infotech 2018-12-15       5
9   global_infotech 2019-06-15       5
10  global_infotech 2018-12-19       5
11  global_infotech 2019-12-31       5
12  global_infotech 2019-02-18       5
13  global_infotech 2018-06-16       5
14  global_infotech 2019-02-10       5
15  global_infotech 2019-03-15       5
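A minimal sketch of the count the question describes, assuming the window is the closed interval from 1 October 2019 to 1 January 2020 and that rows dated after "today" (such as 2020-03-22) should not be counted. The function name count_last_n_months and its parameters are hypothetical. Each row is flagged with Series.between, and groupby(...).transform("sum") repeats each company's total on all of its rows.

```python
import pandas as pd

def count_last_n_months(df, today="2020-01-01", months=3):
    """Hypothetical helper: for every row, count how many rows of the same
    company fall in [today - months, today] (both ends inclusive)."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d")
    end = pd.Timestamp(today)
    start = end - pd.DateOffset(months=months)
    # 1 if this row's date lies inside the window, else 0
    in_window = out["date"].between(start, end).astype(int)
    # Sum the flags per company and broadcast the total back onto each row
    out[f"appearance_count_last_{months}_months"] = (
        in_window.groupby(out["company_name"]).transform("sum")
    )
    return out
```

On the sample data this assigns 1 to every global_infotech row (only 2019-12-31 is in range), 1 to every Qualcom row (2019-10-16) and 0 to every Syscin row. Whether the boundaries should be inclusive is a judgment call the question leaves open.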

Tags: pandas

Solution


International University League:

You can create a custom function:

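import pandas as pd

# Note: df must already exist when this def runs (the default is bound at definition time)
def getcount(company, month=3, df=df):
    # Keep only this company's rows; copy so the caller's df is untouched
    df = df[df['company_name'].eq(company)].copy()
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
    # Rows per calendar-aligned `month`-month bin, keeping the busiest bin
    val = df.groupby(pd.Grouper(key='date', freq=f'{month}MS'))['company_name'].count().max()
    df['appearance_count_last_3_months'] = val
    return df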

getcount('global_infotech')
#OR
getcount('global_infotech',3)
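As I read getcount, it never uses the reference date: pd.Grouper(key='date', freq=...) splits a company's dates into consecutive calendar-aligned 3-month bins starting at its earliest month, and .max() keeps the busiest bin. The sketch below, built from four global_infotech dates in the question, illustrates the difference; freq="3MS" (3-month bins labelled by month start) is used here as an explicitly month-based alias, an assumption about what the answer's "3m" intends.

```python
import pandas as pd

# Four global_infotech dates taken from the question's data
df = pd.DataFrame({
    "company_name": "global_infotech",
    "date": pd.to_datetime(["2018-06-15", "2018-06-16", "2018-06-19", "2019-12-31"]),
})

# Rows per calendar-aligned 3-month bin, the same grouping getcount performs
per_bin = df.groupby(pd.Grouper(key="date", freq="3MS"))["company_name"].count()
print(per_bin)

# getcount reports the busiest bin (the three June 2018 dates) ...
print("busiest bin:", per_bin.max())

# ... while only 2019-12-31 lies in the three months before 2020-01-01
recent = df["date"].between(pd.Timestamp("2019-10-01"), pd.Timestamp("2020-01-01")).sum()
print("last 3 months:", recent)
```

So for this company the function returns 3 even though the last-three-months count is 1.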

Update:

Since you have 92 different companies, you can use a for loop:

lst=[]
for x in df['company_name'].unique():
    lst.append(getcount(x))
out=pd.concat(lst)
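The loop builds one DataFrame per company and concatenates them, which reorders rows by company. A sketch of the same busiest-bin idea in a single pass, assuming getcount's semantics are what you want: groupby("company_name")["date"].transform(...) runs a small function on each company's dates and broadcasts its scalar result back onto that company's rows, keeping the original row order. add_busiest_bin_count and busiest_bin are hypothetical names, and Series.resample with "3MS" stands in for getcount's pd.Grouper.

```python
import pandas as pd

def add_busiest_bin_count(df, months=3):
    """Hypothetical single-pass variant of the per-company getcount loop."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d")

    def busiest_bin(dates):
        # One entry per appearance, indexed by its date, summed per
        # calendar-aligned `months`-month bin anchored on this company's
        # earliest month (as getcount does for a single company)
        per_bin = pd.Series(1, index=pd.DatetimeIndex(dates)).resample(f"{months}MS").sum()
        return per_bin.max()

    # transform broadcasts each company's scalar back onto all of its rows
    out[f"appearance_count_last_{months}_months"] = (
        out.groupby("company_name")["date"].transform(busiest_bin)
    )
    return out
```

The function is still called once per company, but no intermediate DataFrames are created, and the result lines up with the input rows.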

Printing the resulting out DataFrame gives the desired output.

