How to convert dtype='datetime64[ns]' to float?

Problem description

I am practicing linear regression, using dates as the input x and expecting a float output y.

x = df['Date'].values
x = x.reshape(-1, 1)
y = df['MeanTemp'].values  # MeanTemp column has float values
y = y.reshape(-1, 1)

When I print x, the output is:

array([['1942-07-01T00:00:00.000000000'],
       ['1942-07-02T00:00:00.000000000'],
       ['1942-07-03T00:00:00.000000000'],
       ['1942-07-04T00:00:00.000000000'],
       ['1942-07-05T00:00:00.000000000'],
       ['1942-07-06T00:00:00.000000000'],
       ['1942-07-07T00:00:00.000000000'],
       ['1942-07-08T00:00:00.000000000'],
       ['1942-07-09T00:00:00.000000000'],
       ['1942-07-10T00:00:00.000000000']], dtype='datetime64[ns]')

Now, when I fit a linear regression

linlin = LinearRegression()
linlin.fit(x, y)

it does not raise any error, but when I call

linlin.predict(x)


TypeError: The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.

the TypeError above is raised. How can I convert this dtype to float so that the predict function works?

Tags: python, pandas, scikit-learn, typeerror

Solution


You can use timedelta64 from numpy to express each date as the number of days elapsed since the earliest (min) date, like so:

>>> import numpy as np

>>> df['date_delta'] = (df['Date'] - df['Date'].min())  / np.timedelta64(1,'D')
>>> x = df['date_delta'].values
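
As a rough end-to-end sketch of the approach above: the DataFrame below is a synthetic stand-in for the one in the question (the MeanTemp values and the new date are made up for illustration), and the model is scikit-learn's LinearRegression as used in the question. It assumes pandas, numpy and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the question's DataFrame (values are made up).
df = pd.DataFrame({
    "Date": pd.date_range("1942-07-01", periods=10, freq="D"),
    "MeanTemp": 20.0 + 0.5 * np.arange(10),
})

# Days elapsed since the earliest date, stored as float64.
df["date_delta"] = (df["Date"] - df["Date"].min()) / np.timedelta64(1, "D")

x = df["date_delta"].to_numpy().reshape(-1, 1)
y = df["MeanTemp"].to_numpy().reshape(-1, 1)

model = LinearRegression().fit(x, y)
pred = model.predict(x)  # x is float64 now, so there is no dtype conflict

# To predict for a new date, apply the same offset that was used for training.
new_date = pd.Timestamp("1942-07-15")
new_x = np.array([[(new_date - df["Date"].min()) / np.timedelta64(1, "D")]])
new_pred = model.predict(new_x)
```

Note that the minimum date acts as the origin of the feature, so it should be kept and reused whenever new dates are converted for prediction.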

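Alternatively, you can use the following function to convert the dates to a floating-point (fractional year) representation: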

>>> import numpy as np
>>> import pandas as pd

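>>> def dt64_to_float(dt64):
...     # Truncate each date to the start of its year.
...     year = dt64.astype('M8[Y]')
...     # Whole days elapsed since January 1 of that year.
...     days = (dt64 - year).astype('timedelta64[D]')
...     year_next = year + np.timedelta64(1, 'Y')
...     # Length of that year in days (365 or 366).
...     days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')).astype('timedelta64[D]')
...     # year.astype(float) counts years since 1970, so add 1970 back.
...     dt_float = 1970 + year.astype(float) + days / days_of_year
...     return dt_float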

>>> df['date_float'] = dt64_to_float(df['Date'].to_numpy())
>>> x = df['date_float'].values
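
To see what the fractional-year function produces, here is a small sketch that repeats the function from the answer and applies it to two hand-picked dates (the sample dates are an assumption chosen for illustration, not data from the question).

```python
import numpy as np

def dt64_to_float(dt64):
    # Same logic as the function in the answer above.
    year = dt64.astype('M8[Y]')
    days = (dt64 - year).astype('timedelta64[D]')
    year_next = year + np.timedelta64(1, 'Y')
    days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')).astype('timedelta64[D]')
    return 1970 + year.astype(float) + days / days_of_year

# Illustrative sample dates (not taken from the question's data).
dates = np.array(['1942-01-01', '1942-07-01'], dtype='datetime64[ns]')
as_float = dt64_to_float(dates)

# January 1 maps exactly to the year itself; July 1, 1942 is 181 days
# into a 365-day year, so it maps to 1942 + 181/365.
print(as_float)
```

Unlike the days-since-minimum feature, this representation does not depend on the dataset, so new dates can be converted without keeping track of a reference date.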
