首页 > 解决方案 > 如何计算季度差异并在python中添加缺失的季度与计数

问题描述

我有一个像这样的数据框我必须丢失季度值并在它们之间进行计数与季度缺失计数相同并填充数据框是

year    Data    Id       number
    2019Q4   57170   A     1
    2019Q3   55150   A     1
    2019Q2   51109   A     1
    2019Q1   51109   A     1
    2018Q1   57170   B     5
    2018Q4   55150   B     5
    2017Q4   51109   C     7
    2017Q2   51109   C     7
    2017Q1   51109   C     7 

Id 开始年年末计数

  B  2018Q2    2018Q3        2  5
  B  2017Q3    2018Q3        1  5

如何使用 python panda 实现这一点

标签: pythonpandas

解决方案


利用:

#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
         .groupby('Id')['Id']
         .resample('Q')
         .asfreq()
         .rename('val')
         .reset_index())
print (df1)
   Id   year  val
0   A 2019Q1    A
1   A 2019Q2    A
2   A 2019Q3    A
3   A 2019Q4    A
4   B 2018Q1    B
5   B 2018Q2  NaN
6   B 2018Q3  NaN
7   B 2018Q4    B
8   C 2017Q1    C
9   C 2017Q2    C
10  C 2017Q3  NaN
11  C 2017Q4    C

m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()

#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'g'])['year']
                     .agg(['first','last','size'])
                     .reset_index(level=1, drop=True)
                     .reset_index())
print (df2)
  Id  first   last  size
0  B 2018Q2 2018Q3     2
1  C 2017Q3 2017Q3     1

编辑:

对于具有相同值的新列,将其添加到groupby

#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
         .groupby(['Id','number'])['Id'] <- added number columns
         .resample('Q')
         .asfreq()
         .rename('val')
         .reset_index())
print (df1)
   Id  number   year  val
0   A       1 2019Q1    A
1   A       1 2019Q2    A
2   A       1 2019Q3    A
3   A       1 2019Q4    A
4   B       5 2018Q1    B
5   B       5 2018Q2  NaN
6   B       5 2018Q3  NaN
7   B       5 2018Q4    B
8   C       7 2017Q1    C
9   C       7 2017Q2    C
10  C       7 2017Q3  NaN
11  C       7 2017Q4    C

m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()

#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'number', 'g'])['year']
                     .agg(['first','last','size'])  <- added number columns
                     .reset_index(level=1, drop=True)
                     .reset_index())
print (df2)
  Id  g  first   last  size
0  B  5 2018Q2 2018Q3     2
1  C  8 2017Q3 2017Q3     1

推荐阅读