How to apply a function to a list of timestamps to create a pandas Series?

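Problem description

OK, the working part of the code first: I have a function that, given a timestamp and a period (minute, hour, month, ...), returns the duration of that period as a timedelta. For minutes, hours, and days, it simply calls the pandas Timedelta function directly. For months it is a bit "smarter": it checks which month the timestamp falls in and returns the number of days in that month.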

import pandas as pd

def as_timedelta(ref_ts: pd.Timestamp = None):
    """
    Return the duration of a time period.
    For a month, obtaining its duration requires a reference timestamp to identify
    how many days have to be accounted for in the month.
    """

    # An input timestamp has to be given.
    # It is assumed given timestamp is at beginning of time period for which a time delta is requested.
    # Because of a possible timezone, the timestamp is max 12 hours before or after
    # beginning of month in UTC.
    # To assume the current month, we check what is the closest month beginning
    # As an example, if 31st of January, 6:00 PM is reference timestamp, duration is given for month of February

    # Get month starts
    current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1)
    next_month = current_month + pd.DateOffset(months=1)
    nex_next_month = current_month + pd.DateOffset(months=2)
    # Get month of interest
    dist_to_next = next_month - ref_ts
    dist_to_prev = ref_ts - current_month
    # Return the timedelta between the beginning of the month of interest and the beginning of the following month
    td_13 = pd.Timedelta(13, 'h')
    if dist_to_next < td_13:
        return nex_next_month - next_month
    elif dist_to_prev < td_13:
        return next_month - current_month

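Given a list of timestamps, I want to apply this function to each of them. But when I try the line of code below, I get an AttributeError. To illustrate the problem, here is an example: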

ts_list_1M = [
          "Thu Feb 01 2019 00:00:00 GMT+0100",
          "Thu Mar 01 2019 00:00:00 GMT+0100",
          "Sun Apr 01 2019 00:00:00 GMT+0200"]
op_list_1M = [7134.0, 7134.34, 7135.03]
GC_1M = pd.DataFrame(list(zip(ts_list_1M, op_list_1M)), columns =['date', 'open'])
GC_1M['date'] = pd.to_datetime(GC_1M['date'], utc=True)
GC_1M.rename(columns={'date': 'Timestamp'}, inplace=True)
GC_1M.set_index('Timestamp', inplace = True, verify_integrity = True)

The line in question:

GC_1M.reset_index().apply(as_timedelta,axis=1).values

The error message I get:

File "<ipython-input-49-ff9556f2ec44>", line 18, in as_timedelta
current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1)

File "C:\Users\pierre.juillard\Documents\Programs\Anaconda\lib\site-packages\pandas\core\generic.py", line 5179, in __getattr__
return object.__getattribute__(self, name)

AttributeError: ("'Series' object has no attribute 'year'", 'occurred at index 0')

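When I test the function on a single value it works, but when I apply it like this it does not. Any suggestions on how to achieve this, please?

Thanks in advance for your help! Best regards,

Tags: python, pandas, timestamp

Solution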


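With axis=1, DataFrame.apply passes each row to the function as a Series, which is why ref_ts.year raises the AttributeError. If you only want to apply the function to the 'date' Series, you can do the following: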

GC_1M['date'].apply(as_timedelta)

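However, this does not work while 'date' still holds strings rather than datetime objects, so you need to convert it first (you can also do this when creating the DataFrame):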

GC_1M['date'] = pd.to_datetime(GC_1M['date'])
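To make the difference between the two calls concrete, here is a minimal sketch that records the type of object each form of apply hands to the function. The two-row frame `df` and the helper `type_name` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show what apply() passes along.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-02-01", "2019-03-01"], utc=True),
    "open": [7134.0, 7134.34],
})


def type_name(obj):
    """Return the class name of whatever apply() hands to the function."""
    return type(obj).__name__


# Row-wise apply on the DataFrame: each call receives a whole row as a Series.
row_types = df.apply(type_name, axis=1).tolist()

# Apply on a single datetime column: each call receives one scalar Timestamp.
cell_types = df["date"].apply(type_name).tolist()

print(row_types)
print(cell_types)
```

In the first case the function sees a Series, which has no `year` attribute; in the second it sees a Timestamp, which is what as_timedelta expects.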

Finally, your as_timedelta function cannot handle timezone-aware input; I have added a comment on the line that needed fixing:

def as_timedelta(ref_ts: pd.Timestamp = None):
    """
    Return the duration of a time period.
    For a month, obtaining its duration requires a reference timestamp to identify
    how many days have to be accounted for in the month.
    """

    # An input timestamp has to be given.
    # It is assumed given timestamp is at beginning of time period for which a time delta is requested.
    # Because of a possible timezone, the timestamp is max 12 hours before or after
    # beginning of month in UTC.
    # To assume the current month, we check what is the closest month beginning
    # As an example, if 31st of January, 6:00 PM is reference timestamp, duration is given for month of February

    # Get month starts
    current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1, tzinfo=ref_ts.tzinfo)  # Make current_month timezone aware
    next_month = current_month + pd.DateOffset(months=1)
    nex_next_month = current_month + pd.DateOffset(months=2)
    # Get month of interest
    dist_to_next = next_month - ref_ts
    dist_to_prev = ref_ts - current_month
    # Return the timedelta between the beginning of the month of interest and the beginning of the following month
    td_13 = pd.Timedelta(13, 'h')
    if dist_to_next < td_13:
        return nex_next_month - next_month
    elif dist_to_prev < td_13:
        return next_month - current_month
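Putting the pieces together, the sketch below applies the timezone-aware function to every timestamp and collects the results in a Series. It repeats a condensed copy of the function so that it runs on its own. Two things are assumptions rather than part of the original post: the timestamps are written as ISO 8601 strings (midnight at the same UTC offsets as in the question), and the timestamps are read back from the index named 'Timestamp', since the question's setup renames the column and moves it to the index.

```python
import pandas as pd


def as_timedelta(ref_ts: pd.Timestamp = None):
    """Condensed copy of the timezone-aware function, same logic as above."""
    # Month starts, built in the same timezone as the reference timestamp.
    current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1,
                                 tzinfo=ref_ts.tzinfo)
    next_month = current_month + pd.DateOffset(months=1)
    nex_next_month = current_month + pd.DateOffset(months=2)
    # Pick the month whose beginning is closest to the reference timestamp.
    td_13 = pd.Timedelta(13, 'h')
    if next_month - ref_ts < td_13:
        return nex_next_month - next_month
    elif ref_ts - current_month < td_13:
        return next_month - current_month


# Assumed ISO 8601 spelling of the question's timestamps, converted to UTC.
timestamps = pd.to_datetime(
    ["2019-02-01 00:00:00+01:00",
     "2019-03-01 00:00:00+01:00",
     "2019-04-01 00:00:00+02:00"],
    utc=True,
)
GC_1M = pd.DataFrame({"open": [7134.0, 7134.34, 7135.03]}, index=timestamps)
GC_1M.index.name = "Timestamp"

# Select the single 'Timestamp' column so that each call gets one Timestamp.
durations = GC_1M.reset_index()["Timestamp"].apply(as_timedelta)
print(durations.tolist())
```

Under these assumptions the three durations come out as 28, 31 and 30 days, the lengths of February, March and April 2019. Note that, like the original, the function returns nothing for a timestamp that is more than 13 hours away from any month start.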
