How do I get the number of days in a given month for a pandas DataFrame column?

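Problem description

I am trying to encode cyclical features for an ML algorithm in which the timestamp features are very important.

I want to convert day_in_month (the 'day' column of cyclic_df) into a cyclical variable, so that the first day of a month comes right after the last day of the previous month. That way 1 February (01.02) is close to 31 January (31.01), and the difference between the two days, looking only at the day column, is 1 rather than 30!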

# Transform the cyclical features
cyclic_df['min_sin'] = np.sin(cyclic_df.minute*(2.*np.pi/59))       # Sine component of minute
cyclic_df['min_cos'] = np.cos(cyclic_df.minute*(2.*np.pi/59))       # Cosine component of minute
cyclic_df['hr_sin'] = np.sin(cyclic_df.hour*(2.*np.pi/23))          # Sine component of hour
cyclic_df['hr_cos'] = np.cos(cyclic_df.hour*(2.*np.pi/23))          # Cosine component of hour

cyclic_df['d_sin'] = np.sin(cyclic_df.day*(2.*np.pi/30))            # !!! Sine component of day !!! Help here
cyclic_df['d_cos'] = np.cos(cyclic_df.day*(2.*np.pi/30))            # !!! Cosine component of day !!! Help here

cyclic_df['mnth_sin'] = np.sin((cyclic_df.month-1)*(2.*np.pi/12))   # Sine component of month
cyclic_df['mnth_cos'] = np.cos((cyclic_df.month-1)*(2.*np.pi/12))   # Cosine component of month

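The problem is the 30 I divide by. Not every month has 30 days; a month can have 30, 31, 28 or 29 days. Each row of cyclic_df has a 'month' column, a 'year' column and a 'day' column, so in theory there should be a way to look up the correct number of days for the given month. How can I replace the 30 (in the d_sin and d_cos lines, the fifth and sixth statements of the code above) with a value that reads the year and month from the other columns and gives the correct number of days instead of always 30?

PS: It would be great if someone could also tell me whether I am handling minute, hour and month correctly in the code above.

Edit (after comments): Yes, I do have a 'year' column. I changed those two lines to: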

cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-575-532a308075e2> in <module>()
     11 #cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/30))            # Cosinus component of day
     12 
---> 13 cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
     14 cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
     15 

~/anaconda/lib/python3.6/calendar.py in monthrange(year, month)
    120     """Return weekday (0-6 ~ Mon-Sun) and number of days (28-31) for
    121        year, month."""
--> 122     if not 1 <= month <= 12:
    123         raise IllegalMonthError(month)
    124     day1 = weekday(year, month, 1)

~/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577 
   1578     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Tags: python, pandas, feature-extraction, feature-selection, feature-engineering

Solution


If your data has the year and month, you can use calendar.monthrange:

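import numpy as np
from calendar import monthrange

# monthrange takes a single integer year and month, not pandas Series
month = 2
year = 2014

# monthrange returns (weekday of the first day, number of days in the month)
_, mr = monthrange(year, month)
cyclic_df['d_cos'] = np.cos(cyclic_df.day*(2.*np.pi/mr))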
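Note that calendar.monthrange works on one scalar year and month at a time. Passing whole columns, as in the edit to the question, makes its `1 <= month <= 12` check evaluate a Series as a boolean, which is what raises the ValueError shown in the traceback. A minimal sketch of one per-row alternative follows, assuming the frame really has integer 'year', 'month' and 'day' columns. The sample data and the `encode_cyclic` helper are made up for illustration and are not part of the original answer. It builds a timestamp per row with `pd.to_datetime` and reads pandas' `dt.days_in_month`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real cyclic_df is assumed to have these columns.
cyclic_df = pd.DataFrame({
    "year":   [2014, 2014, 2016, 2015],
    "month":  [1, 2, 2, 4],
    "day":    [31, 1, 29, 30],
    "hour":   [23, 0, 12, 6],
    "minute": [59, 0, 30, 15],
})

# Build one timestamp per row from the year/month/day columns, then read
# the length of that row's month (28, 29, 30 or 31).
dates = pd.to_datetime(cyclic_df[["year", "month", "day"]])
days_in_month = dates.dt.days_in_month


def encode_cyclic(values, period):
    """Return (sin, cos) of values placed on a circle with the given period."""
    angle = 2.0 * np.pi * values / period
    return np.sin(angle), np.cos(angle)


# Day is 1-based, so shift it to start at 0; the period differs per row.
cyclic_df["d_sin"], cyclic_df["d_cos"] = encode_cyclic(
    cyclic_df["day"] - 1, days_in_month
)

# Minute takes 60 distinct values (0-59) and hour takes 24 (0-23),
# so the periods used here are 60 and 24.
cyclic_df["min_sin"], cyclic_df["min_cos"] = encode_cyclic(cyclic_df["minute"], 60)
cyclic_df["hr_sin"], cyclic_df["hr_cos"] = encode_cyclic(cyclic_df["hour"], 24)

print(days_in_month.tolist())  # [31, 28, 29, 30]
```

With this encoding, 31 January and 1 February land next to each other on the circle, which is the behaviour the question asks for. On the PS: dividing by 59 or 23 maps minute 0 and minute 59 (or hour 0 and hour 23) to the same point, so the sketch uses 60 and 24 instead; the month lines in the question already follow the same pattern with a 0-based shift and a period of 12.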
