Reindexing a Pandas DataFrame

Problem description

I have a pandas DataFrame, and I want to reindex the date column separately for each name.

                 date  value1  value2 name
0  1992-08-27 07:30:00    28.0     NaN    A
1  1992-08-27 08:00:00    28.2    27.0    A
2  1992-08-27 09:00:00    28.8    27.5    A
3  1992-08-27 09:30:00    29.0     NaN    A
4  1992-08-27 10:30:00    29.6     NaN    A
5  1992-08-27 11:00:00    29.8    27.0    A
6  1992-08-27 11:30:00    30.0    27.0    A
7  1992-08-27 08:00:00    29.2    29.0    B
8  1992-08-27 09:30:00    30.0    37.0    B
9  1992-08-27 10:30:00    24.6    37.0    B
10 1992-08-27 11:00:00    24.8    37.0    B

I want to reindex the DataFrame on the date column, per name, so that each name covers the full date range.

This is what I am doing:

import datetime

import pandas as pd

s_date = datetime.datetime(1992, 8, 27, 7)
e_date = datetime.datetime(1992, 8, 27, 12)
# Build the full range of half-hourly timestamps as a one-column frame
df_time = pd.date_range(start=s_date, end=e_date,
                        freq='30min').to_frame(index=False, name='date')
df_time.date = pd.to_datetime(df_time.date)
df = pd.merge(df, df_time, on=['date'], how='outer')

My expected df looks like this:

                 date  value1  value2 name
1992-08-27 07:00:00    NaN      NaN    A
1992-08-27 07:30:00    28.0     NaN    A
1992-08-27 08:00:00    28.2    27.0    A
1992-08-27 08:30:00    28.2    27.0    A
1992-08-27 09:00:00    28.8    27.5    A
1992-08-27 09:30:00    29.0     NaN    A
1992-08-27 10:00:00    29.0     NaN    A
1992-08-27 10:30:00    29.6     NaN    A
1992-08-27 11:00:00    29.8    27.0    A
1992-08-27 11:30:00    30.0    27.0    A
1992-08-27 12:00:00    30.0    27.0    A
1992-08-27 07:00:00    NaN      NaN    B
1992-08-27 07:30:00    28.0     NaN    B
1992-08-27 08:00:00    29.2    29.0    B
1992-08-27 08:30:00    28.2    27.0    B
1992-08-27 09:00:00    28.8    27.5    B
1992-08-27 09:30:00    30.0    37.0    B
1992-08-27 10:00:00    29.6    37.0    B
1992-08-27 10:30:00    24.6    37.0    B
1992-08-27 11:00:00    24.8    37.0    B
1992-08-27 11:30:00    30.0    27.0    B
1992-08-27 12:00:00    30.0    27.0    B

What am I doing wrong?

Tags: python, pandas, datetime

Solution
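
On what goes wrong in the question: merging on date alone only adds timestamps that are missing from the whole frame, not the ones missing for each name, and the added rows carry no name. The sketch below rebuilds the sample data from the question to show this; the variable names (df_time, merged) are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1992-08-27 07:30", "1992-08-27 08:00", "1992-08-27 09:00",
        "1992-08-27 09:30", "1992-08-27 10:30", "1992-08-27 11:00",
        "1992-08-27 11:30", "1992-08-27 08:00", "1992-08-27 09:30",
        "1992-08-27 10:30", "1992-08-27 11:00",
    ]),
    "value1": [28.0, 28.2, 28.8, 29.0, 29.6, 29.8, 30.0, 29.2, 30.0, 24.6, 24.8],
    "value2": [np.nan, 27.0, 27.5, np.nan, np.nan, 27.0, 27.0, 29.0, 37.0, 37.0, 37.0],
    "name": ["A"] * 7 + ["B"] * 4,
})

# Full half-hourly range, as in the question
df_time = pd.date_range("1992-08-27 07:00", "1992-08-27 12:00",
                        freq="30min").to_frame(index=False, name="date")

# Outer merge on date only: the original 11 rows plus one row for each
# timestamp that appears nowhere in df (07:00, 08:30, 10:00, 12:00)
merged = pd.merge(df, df_time, on="date", how="outer")

print(len(merged))                  # 15, not the 22 rows (11 per name) expected
print(merged["name"].isna().sum())  # 4 added rows, none of them assigned to A or B
```

So the range has to be applied within each name, which is what the approaches below do.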


You can use pyjanitor's complete function to expose the missing values:

Create a dictionary containing the full datetime range

new_dates = {"date": lambda df: pd.date_range("1992-08-27 07:00:00",
                                              "1992-08-27 12:00:00",
                                              freq="30min")
             }

Pass the new_dates variable to complete

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import janitor
import pandas as pd


df.complete([new_dates], by = 'name')
 
   name                date  value1  value2
0     A 1992-08-27 07:00:00     NaN     NaN
1     A 1992-08-27 07:30:00    28.0     NaN
2     A 1992-08-27 08:00:00    28.2    27.0
3     A 1992-08-27 08:30:00     NaN     NaN
4     A 1992-08-27 09:00:00    28.8    27.5
5     A 1992-08-27 09:30:00    29.0     NaN
6     A 1992-08-27 10:00:00     NaN     NaN
7     A 1992-08-27 10:30:00    29.6     NaN
8     A 1992-08-27 11:00:00    29.8    27.0
9     A 1992-08-27 11:30:00    30.0    27.0
10    A 1992-08-27 12:00:00     NaN     NaN
11    B 1992-08-27 07:00:00     NaN     NaN
12    B 1992-08-27 07:30:00     NaN     NaN
13    B 1992-08-27 08:00:00    29.2    29.0
14    B 1992-08-27 08:30:00     NaN     NaN
15    B 1992-08-27 09:00:00     NaN     NaN
16    B 1992-08-27 09:30:00    30.0    37.0
17    B 1992-08-27 10:00:00     NaN     NaN
18    B 1992-08-27 10:30:00    24.6    37.0
19    B 1992-08-27 11:00:00    24.8    37.0
20    B 1992-08-27 11:30:00     NaN     NaN
21    B 1992-08-27 12:00:00     NaN     NaN

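complete is just an abstraction over pandas functions that makes this kind of task easier (it also helps with duplicate indices). You can skip it and stick with pandas alone:

Create an index of the complete datetimes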

new_index = pd.date_range("1992-08-27 07:00:00",
                          "1992-08-27 12:00:00",
                          freq="30min")

new_index = new_index.rename("date")

Run a groupby, and use apply to reindex each group.

(df
 .set_index("date")
 .groupby("name")
 .apply(lambda df: df.reindex(new_index))
 # depending on the pandas version, "name" may or may not survive as a column here
 .drop(columns="name", errors="ignore")
 .reset_index()
)

   name                date  value1  value2
0     A 1992-08-27 07:00:00     NaN     NaN
1     A 1992-08-27 07:30:00    28.0     NaN
2     A 1992-08-27 08:00:00    28.2    27.0
3     A 1992-08-27 08:30:00     NaN     NaN
4     A 1992-08-27 09:00:00    28.8    27.5
5     A 1992-08-27 09:30:00    29.0     NaN
6     A 1992-08-27 10:00:00     NaN     NaN
7     A 1992-08-27 10:30:00    29.6     NaN
8     A 1992-08-27 11:00:00    29.8    27.0
9     A 1992-08-27 11:30:00    30.0    27.0
10    A 1992-08-27 12:00:00     NaN     NaN
11    B 1992-08-27 07:00:00     NaN     NaN
12    B 1992-08-27 07:30:00     NaN     NaN
13    B 1992-08-27 08:00:00    29.2    29.0
14    B 1992-08-27 08:30:00     NaN     NaN
15    B 1992-08-27 09:00:00     NaN     NaN
16    B 1992-08-27 09:30:00    30.0    37.0
17    B 1992-08-27 10:00:00     NaN     NaN
18    B 1992-08-27 10:30:00    24.6    37.0
19    B 1992-08-27 11:00:00    24.8    37.0
20    B 1992-08-27 11:30:00     NaN     NaN
21    B 1992-08-27 12:00:00     NaN     NaN
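
If you would rather avoid groupby and apply, one alternative (not part of the answer above, just a sketch) is to reindex against the full product of names and dates. It assumes every (name, date) pair is unique, which holds for the sample data; full_index and completed are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1992-08-27 07:30", "1992-08-27 08:00", "1992-08-27 09:00",
        "1992-08-27 09:30", "1992-08-27 10:30", "1992-08-27 11:00",
        "1992-08-27 11:30", "1992-08-27 08:00", "1992-08-27 09:30",
        "1992-08-27 10:30", "1992-08-27 11:00",
    ]),
    "value1": [28.0, 28.2, 28.8, 29.0, 29.6, 29.8, 30.0, 29.2, 30.0, 24.6, 24.8],
    "value2": [np.nan, 27.0, 27.5, np.nan, np.nan, 27.0, 27.0, 29.0, 37.0, 37.0, 37.0],
    "name": ["A"] * 7 + ["B"] * 4,
})

new_index = pd.date_range("1992-08-27 07:00:00",
                          "1992-08-27 12:00:00",
                          freq="30min")

# Every combination of name and timestamp
full_index = pd.MultiIndex.from_product(
    [df["name"].unique(), new_index], names=["name", "date"]
)

# Reindexing on (name, date) inserts NaN rows for the missing combinations
completed = (
    df.set_index(["name", "date"])
      .reindex(full_index)
      .reset_index()
)

print(completed.shape)  # (22, 4): 11 timestamps for each of the two names
```

Unlike the groupby version, reindex raises an error if the (name, date) index contains duplicates, so this sketch only fits data where each name has at most one row per timestamp.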

You can then use ffill or fillna, depending on your criteria.
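
As a rough illustration of that last step, the sketch below forward-fills within each name so that values from A never leak into B. The frame here is a small stand-in for the reindexed result (the first four timestamps of each group), and value_cols, filled and constant are illustrative names.

```python
import numpy as np
import pandas as pd

# Small stand-in for the reindexed result: first four timestamps per name
completed = pd.DataFrame({
    "name": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": list(pd.date_range("1992-08-27 07:00", periods=4, freq="30min")) * 2,
    "value1": [np.nan, 28.0, 28.2, np.nan, np.nan, np.nan, 29.2, np.nan],
    "value2": [np.nan, np.nan, 27.0, np.nan, np.nan, np.nan, 29.0, np.nan],
})

value_cols = ["value1", "value2"]

# Forward fill inside each group; leading NaNs of a group stay NaN
filled = completed.copy()
filled[value_cols] = completed.groupby("name")[value_cols].ffill()

# Or replace the gaps with a constant instead (0.0 is an arbitrary choice)
constant = completed.fillna({"value1": 0.0, "value2": 0.0})

print(filled)
```

A plain completed.ffill() without the groupby would carry the last value of A into the first rows of B, which is usually not what you want here.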

