Pandas DataFrame - Combining date column headers with time rows

Problem description

I have the following data, loaded into a DataFrame with read_excel:

                 Time  ...  2020-04-05 00:00:00
0 1900-01-01 00:00:00  ...                    4
1 1900-01-01 00:05:00  ...                    1
2 1900-01-01 00:10:00  ...                    1

I want to combine the dates in the column headers with the times in the rows, so that it looks more like this:

                 Time  ...   value
0 2020-04-05 00:00:00  ...       4
1 2020-04-05 00:05:00  ...       1
2 2020-04-05 00:10:00  ...       1

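I have tried the answers to this question and this question, but they do the opposite of what I need (time columns with date rows), and I think I am going wrong somewhere in adapting the code to my problem. Based on the first question above, I tried the following, swapping the to_timedelta and to_datetime lines, because my columns are the dates and my rows are the times: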

data.Time = pd.to_timedelta(data.Time.astype(str) + ':00', unit='h')
data = data.set_index('Time')
data.columns = pd.to_datetime(data.Time)
data = data.stack()
data.index = data.index.get_level_values(0) + data.index.get_level_values(1)
data = data.reset_index()
data.columns = ['date', 'val']

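I get an error on the first line, ValueError: unit must not be specified if the input contains a str, which confuses me because I do specify a unit type. I feel I am close to the answer and am just missing something I cannot figure out: how do I combine my date columns with my time rows?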

Data types in use: Time = datetime64[ns], 2019-12-02 00:00:00 (etc.) = int64

Edit: I misread the error and thought it said the unit was missing. I removed the unit but got a different error instead, ValueError: only leading negative signs are allowed

Tags: python, pandas, date, datetime

Solution


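I think you are close with your solution. You only need to convert the column names to datetimes and assign them back, and remove unit='h' from to_timedelta while converting the datetimes to HH:MM:SS strings: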

import numpy as np
import pandas as pd

np.random.seed(102)
c = ['Time', '2019-12-02 00:00:00', '2019-12-03 00:00:00', 
             '2019-12-04 00:00:00', '2019-12-05 00:00:00']
t = pd.to_datetime(['1900-01-01 00:00:00', '1900-01-01 00:05:00', '1900-01-01 00:10:00'])

data=pd.DataFrame(np.random.randint(10, size=(len(t), len(c))), columns=c)
data['Time'] = t

print (data)
                 Time  2019-12-02 00:00:00  2019-12-03 00:00:00  \
0 1900-01-01 00:00:00                    3                    2   
1 1900-01-01 00:05:00                    8                    8   
2 1900-01-01 00:10:00                    7                    0   

   2019-12-04 00:00:00  2019-12-05 00:00:00  
0                    2                    2  
1                    9                    7  
2                    6                    2  

print (data.columns)
Index(['Time', '2019-12-02 00:00:00', '2019-12-03 00:00:00',
       '2019-12-04 00:00:00', '2019-12-05 00:00:00'],
      dtype='object')

print (data['Time'])
0   1900-01-01 00:00:00
1   1900-01-01 00:05:00
2   1900-01-01 00:10:00
Name: Time, dtype: datetime64[ns]

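# time of day as a timedelta (the 1900-01-01 date part is dropped)
data.Time = pd.to_timedelta(data.Time.dt.strftime('%H:%M:%S'))

data = data.set_index('Time')
# convert data.columns to datetimes and assign back
data.columns = pd.to_datetime(data.columns)
# stack into a Series indexed by (time offset, date)
data = data.stack()
# add the two index levels: time offset + date = full timestamp
data.index = data.index.get_level_values(0) + data.index.get_level_values(1)
data = data.sort_index().reset_index()
data.columns = ['date', 'val']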

print (data)
                  date  val
0  2019-12-02 00:00:00    3
1  2019-12-02 00:05:00    8
2  2019-12-02 00:10:00    7
3  2019-12-03 00:00:00    2
4  2019-12-03 00:05:00    8
5  2019-12-03 00:10:00    0
6  2019-12-04 00:00:00    2
7  2019-12-04 00:05:00    9
8  2019-12-04 00:10:00    6
9  2019-12-05 00:00:00    2
10 2019-12-05 00:05:00    7
11 2019-12-05 00:10:00    2
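
As a side note on the second error in the question, here is a minimal sketch that is not part of the original answer. It assumes Time holds datetime64 values as shown above; the names s, bad, failed, via_strings and via_normalize are illustrative only. Appending ':00' to the full timestamp string leaves the date inside the string, which to_timedelta cannot parse, whereas formatting only the time part works. Subtracting the normalized (midnight) timestamp is a string-free alternative that should give the same timedeltas.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['1900-01-01 00:00:00',
                              '1900-01-01 00:05:00',
                              '1900-01-01 00:10:00']))

# The attempt in the question builds strings that still contain the date,
# e.g. '1900-01-01 00:05:00:00', which is not a valid timedelta string.
bad = '1900-01-01 00:05:00' + ':00'
try:
    pd.to_timedelta(bad)
    failed = False
except ValueError:
    failed = True

# Keeping only the time of day gives strings that to_timedelta can parse.
via_strings = pd.to_timedelta(s.dt.strftime('%H:%M:%S'))

# String-free alternative: subtract midnight of the same day.
via_normalize = s - s.dt.normalize()

print(failed)
print((via_strings == via_normalize).all())
```

Either form can be used for the Time column before adding it to the dates; the subtraction avoids a round trip through strings.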

Or, starting again from the original data (before the steps above), with melt:

df = data.melt('Time', var_name='Date', value_name='val')
df['Date'] = (pd.to_datetime(df['Date']) +  
                  pd.to_timedelta(df.pop('Time').dt.strftime('%H:%M:%S')))
df = df.sort_values('Date', ignore_index=True)
print (df)
                  Date  val
0  2019-12-02 00:00:00    3
1  2019-12-02 00:05:00    8
2  2019-12-02 00:10:00    7
3  2019-12-03 00:00:00    2
4  2019-12-03 00:05:00    8
5  2019-12-03 00:10:00    0
6  2019-12-04 00:00:00    2
7  2019-12-04 00:05:00    9
8  2019-12-04 00:10:00    6
9  2019-12-05 00:00:00    2
10 2019-12-05 00:05:00    7
11 2019-12-05 00:10:00    2
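
If the reshaping is needed more than once, the melt variant could be wrapped in a small helper. This is only a sketch: combine_date_columns and its time_col parameter are hypothetical names, and it assumes that every column other than the time column has a label that pd.to_datetime can parse and that the time column is already datetime64.

```python
import pandas as pd

def combine_date_columns(wide, time_col='Time'):
    """Reshape a wide frame (one column per date, one row per time of day)
    into a long frame with a single combined timestamp column."""
    tidy = wide.melt(time_col, var_name='date', value_name='val')
    # Time of day as a timedelta, using the strftime approach from the answer.
    offset = pd.to_timedelta(tidy[time_col].dt.strftime('%H:%M:%S'))
    # Date from the former column label plus the time-of-day offset.
    tidy['date'] = pd.to_datetime(tidy['date']) + offset
    return (tidy.drop(columns=time_col)
                .sort_values('date', ignore_index=True))

# Small illustrative input, shaped like the data in the question.
wide = pd.DataFrame({
    'Time': pd.to_datetime(['1900-01-01 00:00:00', '1900-01-01 00:05:00']),
    '2019-12-02 00:00:00': [3, 8],
    '2019-12-03 00:00:00': [2, 8],
})
result = combine_date_columns(wide)
print(result)
```

The helper returns a new frame and leaves the input untouched, so it can be called directly on the output of read_excel.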
    
