Inserting missing quarterly earnings dates into the index

Problem description

I have this df:

            revenue   pct_yoy   pct_qoq
2020-06-30   99.721  0.479013  0.092833
2020-03-31   91.250  0.478283  0.087216
2019-12-31   83.930  0.676253  0.135094
2019-09-30   73.941       NaN  0.096657
2019-06-30   67.424       NaN  0.092293
2019-03-31   61.727       NaN  0.232814
2018-09-30   50.070       NaN       NaN

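However, look at the last index values. If the index is treated as a continuous quarterly time series, 2018-12-31 seems to be missing: after 2019-03-31 the index jumps straight to 2018-09-30.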
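One way to confirm which quarter ends are absent is to build the full range of quarter-end dates and take its set difference with the existing index. The sketch below rebuilds the question's DataFrame by hand from the printout above. It assumes the index holds real timestamps and uses `pd.offsets.QuarterEnd()` (Mar/Jun/Sep/Dec quarter ends) as the assumed reporting calendar.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question (newest date first, as printed)
df = pd.DataFrame(
    {
        "revenue": [99.721, 91.250, 83.930, 73.941, 67.424, 61.727, 50.070],
        "pct_yoy": [0.479013, 0.478283, 0.676253, np.nan, np.nan, np.nan, np.nan],
        "pct_qoq": [0.092833, 0.087216, 0.135094, 0.096657, 0.092293, 0.232814, np.nan],
    },
    index=pd.to_datetime(
        ["2020-06-30", "2020-03-31", "2019-12-31", "2019-09-30",
         "2019-06-30", "2019-03-31", "2018-09-30"]
    ),
)

# Every calendar quarter end between the oldest and the newest date
expected = pd.date_range(df.index.min(), df.index.max(), freq=pd.offsets.QuarterEnd())

# Quarter ends that should be present but are not in the index
missing = expected.difference(df.index)
print(missing)  # should contain only 2018-12-31
```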

How can I make sure that any missing quarterly dates are inserted, with NaN as the value in each column?

I'm not quite sure how to approach this.

Tags: python, pandas, time-series

Solution


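You need to generate your own list of quarterly dates that includes the missing ones. You can then use .reindex to realign your DataFrame to this new list of dates.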

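import pandas as pd

# Make sure the index holds real timestamps (a no-op if it already does)
df.index = pd.to_datetime(df.index)

# Get the oldest and newest dates, which will be the bounds
#  for our new index
first_date = df.index.min()
last_date = df.index.max()

# Generate a quarter-end date (Mar/Jun/Sep/Dec) for every quarter from
#  first_date up to last_date. This gives the same dates as freq="3M" here,
#  but avoids the "M" alias, which pandas 2.2+ deprecates in favor of "ME".
quarterly = pd.date_range(first_date, last_date, freq=pd.offsets.QuarterEnd())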

# realign our dataframe using our new quarterly date index
#  this will fill NaN for dates that did not exist in the
#  original index
out = df.reindex(quarterly)

# if you want to order this from most recent date to least recent date 
#  do: out.sort_index(ascending=False)
print(out)
            revenue   pct_yoy   pct_qoq
2018-09-30   50.070       NaN       NaN
2018-12-31      NaN       NaN       NaN
2019-03-31   61.727       NaN  0.232814
2019-06-30   67.424       NaN  0.092293
2019-09-30   73.941       NaN  0.096657
2019-12-31   83.930  0.676253  0.135094
2020-03-31   91.250  0.478283  0.087216
2020-06-30   99.721  0.479013  0.092833
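If you prefer not to build the date list by hand, pandas' `DataFrame.asfreq` can perform the same realignment. This is a swapped-in alternative to the reindex approach above, not part of the original answer. It is a minimal sketch that rebuilds the question's data by hand and assumes calendar quarter ends (`pd.offsets.QuarterEnd()`). The frame is sorted oldest-first before calling `asfreq` and then flipped back to the question's newest-first order.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question (newest date first, as printed)
df = pd.DataFrame(
    {
        "revenue": [99.721, 91.250, 83.930, 73.941, 67.424, 61.727, 50.070],
        "pct_yoy": [0.479013, 0.478283, 0.676253, np.nan, np.nan, np.nan, np.nan],
        "pct_qoq": [0.092833, 0.087216, 0.135094, 0.096657, 0.092293, 0.232814, np.nan],
    },
    index=pd.to_datetime(
        ["2020-06-30", "2020-03-31", "2019-12-31", "2019-09-30",
         "2019-06-30", "2019-03-31", "2018-09-30"]
    ),
)

# Sort oldest-first, then conform the frame to a regular quarter-end index;
#  asfreq inserts an all-NaN row for each quarter end missing from the data
filled = df.sort_index().asfreq(pd.offsets.QuarterEnd())

# Restore the question's newest-first ordering
filled = filled.sort_index(ascending=False)
print(filled)
```

Because `asfreq` reindexes without filling by default, the inserted 2018-12-31 row stays NaN, matching the reindex output. Passing `method="ffill"` would instead copy the previous quarter's values forward, which is usually not what you want for revenue figures.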
