Resampling daily data to weekly data

Problem description

There are many posts on this topic. I went through them but could not find an answer to my question:

I am working with a pandas time series DataFrame. The data is on a daily timeframe, and I aggregate it to a weekly timeframe with the pandas resample() function, as shown below.

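import pandas as pd

daily_df = ...  # daily time series DataFrame with a 'date' column

def aggregate(daily_df, frequency):
    # Bin rows into periods of the given frequency and aggregate OHLCV columns
    weekly_df = daily_df.resample(frequency, on='date').agg({
        'open': 'first', 'high': 'max', 'low': 'min',
        'close': 'last', 'volume': 'sum'})
    weekly_df.reset_index(inplace=True)  # turn the 'date' bin labels back into a column
    return weekly_df

weekly_df = aggregate(daily_df, 'W-FRI')  # weeks ending on Friday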

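The problem is that, for some weeks, the time series only contains data from Monday to Thursday. I don't know how to make resample() check for this and, in that case, end the week on Thursday instead of on Friday ('W-FRI').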

Tags: python, pandas, pandas-resample

Solution


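resample() has no option to move a week's end date based on which days are present. Instead, we can find out how many trading days fall into each weekly bin: add a flag column set to 1 for every row and aggregate it with 'count'. Weeks with fewer than 5 days are then easy to spot.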
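A minimal offline sketch of the same day-count idea, so it can be tried without downloading data. It assumes the question's lowercase 'date'/'close' column names; the dates are real calendar days but the prices are made up, and the Friday 2020-11-13 row is deliberately omitted to simulate a Monday-to-Thursday week.

```python
import pandas as pd

# Hypothetical sample data: two trading weeks, the second one has no Friday row
dates = pd.to_datetime([
    "2020-11-02", "2020-11-03", "2020-11-04", "2020-11-05", "2020-11-06",
    "2020-11-09", "2020-11-10", "2020-11-11", "2020-11-12",
])
daily_df = pd.DataFrame({"date": dates, "close": [float(i) for i in range(1, 10)]})

daily_df["days"] = 1  # one flag per row, i.e. per trading day
weekly_df = daily_df.resample("W-FRI", on="date").agg(
    {"close": "last", "days": "count"}  # 'count' gives trading days per week
)
print(weekly_df)
```

Both bins are still labeled with a Friday (2020-11-06 and 2020-11-13); only the days column (5 and 4) reveals that the second week is short.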

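import yfinance as yf

# Assumes download() returns flat column names ('Open', 'High', ...) with a 'Date' index;
# some newer yfinance versions return MultiIndex columns instead, so check your version.
daily_df = yf.download("AAPL", start="2020-11-01", end="2020-12-31")

def aggregate(daily_df, frequency):
    df = daily_df.reset_index()  # copy with 'Date' as a column; the caller's frame is untouched
    df['days'] = 1               # one flag per trading day
    weekly_df = df.resample(frequency, on='Date').agg({
        'Open': 'first', 'High': 'max', 'Low': 'min',
        'Close': 'last', 'Volume': 'sum',
        'days': 'count'})        # number of trading days in each week
    return weekly_df

weekly_df = aggregate(daily_df, 'W-FRI')
print(weekly_df)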

                  Open        High         Low       Close     Volume  days
Date
2020-11-06  109.110001  119.620003  107.320000  118.690002  609571800     5
2020-11-13  120.500000  121.989998  114.129997  119.260002  589577900     5
2020-11-20  118.919998  120.989998  116.809998  117.339996  389493400     5
2020-11-27  117.180000  117.620003  112.589996  116.589996  365024000     4
2020-12-04  116.970001  123.779999  116.809998  122.250000  543809200     5
2020-12-11  122.309998  125.949997  120.150002  122.410004  452278700     5
2020-12-18  122.599998  129.580002  121.540001  126.660004  621866700     5
2020-12-25  125.019997  134.410004  123.449997  131.970001  433310200     4
2021-01-01  133.990005  138.789993  133.399994  133.720001  341985600     3
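The approach above flags short weeks but still labels them with a Friday. If you actually want a short week to end on its last trading day (for example Thursday), one possible extension, which is not part of the original answer, is to keep the 'W-FRI' bins and replace each bin label with the latest date that falls inside it. A sketch under the same assumptions as before (hypothetical prices, lowercase 'date'/'close' columns; the helper column name last_day is illustrative):

```python
import pandas as pd

# Hypothetical sample data: the week of 2020-11-09 has no Friday row
dates = pd.to_datetime([
    "2020-11-02", "2020-11-03", "2020-11-04", "2020-11-05", "2020-11-06",
    "2020-11-09", "2020-11-10", "2020-11-11", "2020-11-12",
])
daily_df = pd.DataFrame({"date": dates, "close": [float(i) for i in range(1, 10)]})

df = daily_df.set_index("date")
df["last_day"] = df.index  # copy of the date so it can be aggregated like a value

weekly_df = df.resample("W-FRI").agg({"close": "last", "last_day": "max"})
weekly_df["days"] = df.resample("W-FRI").size()  # trading days per weekly bin

# Drop bins with no trading days, then label each week by its real last day
weekly_df = weekly_df.dropna(subset=["last_day"]).set_index("last_day")
print(weekly_df)
```

Here the second week is labeled 2020-11-12 (a Thursday) instead of 2020-11-13. Note that the resulting index no longer has a fixed weekly frequency, which can matter if you later rely on regular time steps.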
