首页 > 解决方案 > 数据帧时间序列转置

问题描述

如下数据框,我想通过使用“时间”、“汽车”和“流量”对它们进行重新采样,并将它们转置为列。

在此处输入图像描述

所以我需要:

  1. 添加缺失分钟的行,从 09:00 到 10:00,每分钟
  2. 每 30 分钟重新采样一次

最终结果应如下所示: 在此处输入图像描述

后面有一些步骤,我尽了最大努力,但结果却是:

data = {'time': ["9:17",    "9:17", "9:20", "9:21", "9:22", "9:23", "9:23", "9:25", "9:26", "9:27", "9:36", "9:36"], 
'tempreture': [37.7,    37.12,  37.7,   37.11,  37.9,   37.1,   37.8,   37.2,   37.6,   37.7,   37.1,   37.3],
'cars' : [737,  22, 42, 1,  45, 13, 47, 304,    10, 192,    5,  16],
'flow': ["In","Out","In","Unknown","Out","In","Out","In","Unknown","Out","In","In"]}
df = pd.DataFrame(data)

df = df[['time', 'cars', 'flow']]

idx = pd.date_range("9:00", "10:00", freq="1min")
idx = idx.rename('time')

df.index = pd.to_datetime(df.index)
df = df.reindex(idx, fill_value=0)

df = df.pivot(columns='flow').resample('30T').sum()
df = df.reset_index()

df = df.set_index('time')
df.index = pd.to_datetime(df.index)
df = df.stack().T
df["Total"] = df.sum(axis=1)

print (df)


time 2020-10-21 09:00:00 2020-10-21 09:30:00 2020-10-21 10:00:00 Total
flow                   0                   0                   0      
cars                   0                   0                   0     0

写它的正确方法是什么?谢谢!

标签: pythonpandasdataframe

解决方案


使用DataFrame.pivot_tablewith aggregate sum,然后DataFrame.reindex使用DataFrame.resampleand last 转换为一行 DataFrame by DataFrame.stackSeries.to_frame并使用 flatten 转置MultiIndex

df = df[['time', 'cars', 'flow']]


df = df.pivot_table(index='time',columns='flow', values='cars', aggfunc='sum')
df.index = pd.to_datetime(df.index + ':00')
idx = pd.date_range("9:00", "10:00", freq="1min", name='time')

df = df.reindex(idx, fill_value=0).resample('30T').sum()
df['Total'] = df.sum(axis=1)
df.index = df.index.strftime('%H:%M')
df = df.stack().to_frame().T
df.columns = df.columns.map(lambda x: f'{x[0]} {x[1]}')
print (df)
   09:00 In  09:00 Out  09:00 Unknown  09:00 Total  09:30 In  09:30 Out  \
0    1096.0      306.0           11.0       1413.0      21.0        0.0   

   09:30 Unknown  09:30 Total  10:00 In  10:00 Out  10:00 Unknown  10:00 Total  
0            0.0         21.0       0.0        0.0            0.0          0.0  

但这里似乎reindex没有必要:

df = df.pivot_table(index='time',columns='flow', values='cars', aggfunc='sum')
df.index = pd.to_datetime(df.index + ':00')

df = df.resample('30T').sum()
df['Total'] = df.sum(axis=1)
df.index = df.index.strftime('%H:%M')
df = df.stack().to_frame().T
df.columns = df.columns.map(lambda x: f'{x[0]} {x[1]}')

推荐阅读