Conditional merge on a datetime index

Problem description

df1 looks like this:

week_date                   Values
21-04-2019 00:00:00          10
28-04-2019 00:00:00          20

df2 looks like this:

hourly_date                 hour_val
21-04-2019 00:00:00            a
21-04-2019 01:00:00            b
21-04-2019 02:00:00            c
21-04-2019 03:00:00            d
28-04-2019 00:00:00            e

The resulting dataset should look like this:

week_date                 Values      hourly_date                 hour_val
21-04-2019 00:00:00          10        21-04-2019 00:00:00            a
21-04-2019 00:00:00          10        21-04-2019 01:00:00            b
21-04-2019 00:00:00          10        21-04-2019 02:00:00            c
21-04-2019 00:00:00          10        21-04-2019 03:00:00            d
28-04-2019 00:00:00          20        28-04-2019 00:00:00            e

I have hundreds of weekly rows and thousands of hourly rows. I tried the merge below, but it did not give the desired output.

merge=pd.merge(df1,df2, how='outer', left_index=True, right_index=True)

Tags: python

Solution


In this case you can merge on the year and week of each timestamp. Try the following:

import pandas as pd

df1 = pd.DataFrame(
{
    "week_date": ["21-04-2019 00:00:00", "28-04-2019 00:00:00"],
    "Values": [10,20]
}
)
df2 = pd.DataFrame(
    {
    "hourly_date": [
        "21-04-2019 00:00:00",
        "21-04-2019 01:00:00",
        "21-04-2019 02:00:00",
        "21-04-2019 03:00:00",
        "28-04-2019 00:00:00"
    ],
    "hour_val": ["a","b","c","d","e"]
}
)

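# The dates are day-first (dd-mm-yyyy), so state the format explicitly
# instead of relying on pandas to guess it.
date_format = "%d-%m-%Y %H:%M:%S"

df1.week_date = pd.to_datetime(df1.week_date, format=date_format)
df1 = df1.set_index("week_date", drop=False)

df2.hourly_date = pd.to_datetime(df2.hourly_date, format=date_format)
df2 = df2.set_index("hourly_date", drop=False)

# Series.dt.week was removed in pandas 2.0; dt.isocalendar() returns the
# ISO year and ISO week, which also stay consistent across year boundaries.
week_keys = df1.week_date.dt.isocalendar()
hour_keys = df2.hourly_date.dt.isocalendar()

# Join rows that fall in the same ISO year and ISO week.
pd.merge(df1, df2,
         left_on=[week_keys.year, week_keys.week],
         right_on=[hour_keys.year, hour_keys.week]
        )[["week_date", "Values", "hourly_date", "hour_val"]].set_index("week_date")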

This outputs:

            Values         hourly_date hour_val
week_date
2019-04-21      10 2019-04-21 00:00:00        a
2019-04-21      10 2019-04-21 01:00:00        b
2019-04-21      10 2019-04-21 02:00:00        c
2019-04-21      10 2019-04-21 03:00:00        d
2019-04-28      20 2019-04-28 00:00:00        e
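
One caveat with matching on week numbers: ISO weeks run Monday to Sunday, and both week_date values in the sample are Sundays. An hourly row on Monday 22-04-2019 would therefore share a week number with 28-04-2019, not with 21-04-2019. If each week_date is meant to mark the start of its week (an assumption; the question does not say), pd.merge_asof may be a closer fit, since it attaches to every hourly row the latest week_date that is not after it. Below is a minimal sketch under that assumption. The names weekly, hourly and result, and the extra row "x", are illustrative and not part of the original data.

```python
import pandas as pd

date_format = "%d-%m-%Y %H:%M:%S"

weekly = pd.DataFrame({
    "week_date": pd.to_datetime(
        ["21-04-2019 00:00:00", "28-04-2019 00:00:00"], format=date_format
    ),
    "Values": [10, 20],
})

hourly = pd.DataFrame({
    "hourly_date": pd.to_datetime(
        [
            "21-04-2019 00:00:00",
            "21-04-2019 01:00:00",
            "21-04-2019 02:00:00",
            "21-04-2019 03:00:00",
            "22-04-2019 05:00:00",  # hypothetical extra row, not in the question
            "28-04-2019 00:00:00",
        ],
        format=date_format,
    ),
    "hour_val": ["a", "b", "c", "d", "x", "e"],
})

# merge_asof requires both frames to be sorted by their key columns.
# direction="backward" picks, for each hourly_date, the most recent
# week_date that is less than or equal to it.
result = pd.merge_asof(
    hourly.sort_values("hourly_date"),
    weekly.sort_values("week_date"),
    left_on="hourly_date",
    right_on="week_date",
    direction="backward",
)[["week_date", "Values", "hourly_date", "hour_val"]]

print(result)
```

Here the hypothetical Monday row "x" is paired with 21-04-2019 rather than 28-04-2019. Hourly rows earlier than the first week_date would get missing values in week_date and Values, so they may need to be filtered out beforehand.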
