首页 > 解决方案 > 计算一个值随时间出现的次数

问题描述

我有一个看起来像这样的数据框:

        Date        Daily Risk Score    Name
25032   2020-07-07  13.0                Buffalo County, NE, US (Corn)
25033   2020-07-08  8.0                 Buffalo County, NE, US (Corn)
25034   2020-07-09  8.0                 Buffalo County, NE, US (Corn)
25035   2020-07-10  8.0                 Buffalo County, NE, US (Corn)
25036   2020-07-11  8.0                 Buffalo County, NE, US (Corn)
25037   2020-07-12  8.0                 Buffalo County, NE, US (Corn)
25038   2020-07-13  10.0                Buffalo County, NE, US (Corn)
25039   2020-07-14  7.0                 Buffalo County, NE, US (Corn)
25040   2020-07-15  7.0                 Buffalo County, NE, US (Corn)
25041   2020-07-16  10.0                Buffalo County, NE, US (Corn)
25042   2020-07-17  13.0                Buffalo County, NE, US (Corn)
25043   2020-07-18  13.0                Buffalo County, NE, US (Corn)
25044   2020-07-19  13.0                Buffalo County, NE, US (Corn)
25045   2020-07-20  13.0                Buffalo County, NE, US (Corn)
25046   2020-07-07  8.0                 Morgan County, IL, US (Corn)
25047   2020-07-08  8.0                 Morgan County, IL, US (Corn)
25048   2020-07-09  8.0                 Morgan County, IL, US (Corn)
25049   2020-07-10  8.0                 Morgan County, IL, US (Corn)
25050   2020-07-11  8.0                 Morgan County, IL, US (Corn)
25051   2020-07-12  8.0                 Morgan County, IL, US (Corn)
25052   2020-07-13  8.0                 Morgan County, IL, US (Corn)
25053   2020-07-14  8.0                 Morgan County, IL, US (Corn)
25054   2020-07-15  8.0                 Morgan County, IL, US (Corn)
25055   2020-07-16  9.0                 Morgan County, IL, US (Corn)
25056   2020-07-17  10.0                Morgan County, IL, US (Corn)
25057   2020-07-18  10.0                Morgan County, IL, US (Corn)
25058   2020-07-19  8.0                 Morgan County, IL, US (Corn)
25059   2020-07-20  8.0                 Morgan County, IL, US (Corn)
25060   2020-07-07  9.0                 Gray County, KS, US (Corn)
25061   2020-07-08  24.0                Gray County, KS, US (Corn)
25062   2020-07-09  25.0                Gray County, KS, US (Corn)
25063   2020-07-10  13.0                Gray County, KS, US (Corn)
25064   2020-07-11  24.0                Gray County, KS, US (Corn)
25065   2020-07-12  23.0                Gray County, KS, US (Corn)
25066   2020-07-13  25.0                Gray County, KS, US (Corn)
25067   2020-07-14  25.0                Gray County, KS, US (Corn)
25068   2020-07-15  25.0                Gray County, KS, US (Corn)
25069   2020-07-16  25.0                Gray County, KS, US (Corn)
25070   2020-07-17  25.0                Gray County, KS, US (Corn)
25071   2020-07-18  24.0                Gray County, KS, US (Corn)
25072   2020-07-19  21.0                Gray County, KS, US (Corn)
25073   2020-07-20  20.0                Gray County, KS, US (Corn)

我想创建一个新的数据框,列出日期和每天的次数,其值从 0<x<9、9<x<17 和 >=17 开始。新的数据框是这样的:

Date   Low count   Mid count   High count
2020-07-07   x   y   z
2020-07-08   a   b   c
2020-07-09   d   e   f
...

我应该使用 groupby 来实现这一点吗?

标签: pythonpandasdataframedatetime

解决方案


您可以使用groupby.count日期列和pd.cut分数列上的系列来标记每个值的低、中或高。然后unstack获取每个类别的列。

df_ = (df.groupby([df['Date'], 
                   pd.cut(df['score'], bins=[0, 9, 17, np.inf], 
                          labels=['low','mid','high'])])
         ['score'].count()
         .unstack()
      )
print (df_)
score       low  mid  high
Date                      
2020-07-07    2    1     0
2020-07-08    2    0     1
2020-07-09    2    0     1
2020-07-10    2    1     0
2020-07-11    2    0     1
2020-07-12    2    0     1
2020-07-13    1    1     1
2020-07-14    2    0     1
2020-07-15    2    0     1
2020-07-16    1    1     1
2020-07-17    0    2     1
2020-07-18    0    2     1
2020-07-19    1    1     1
2020-07-20    1    1     1

推荐阅读