Extending the dates of values in a DataFrame in Python

Problem description

My data looks like:

Year      Month       Region       Value1       Value2
2016        1         west         2            3
2016        1         east         4            5
2016        1         north        5            3
2016        2         west         6            4
2016        2         east         7            3
.
.
2016        12        west         2            3
2016        12        east         3            7
2016        12        north        6            8
2017        1         west         2            3
.
.
2018        7         west         1            1
2018        7         east         9            9
2018        7         north        5            1

I want to extend the data month by month through 2021, carrying forward the values from the last month in the set (month 7 of 2018).

The desired output would be appended to the end of the set, by region, month, and year, for example:

2018        7         west         1            1
2018        7         east         9            9
2018        7         north        5            1
2018        8         west         1            1
2018        8         east         9            9
2018        8         north        5            1
2018        9         west         1            1
2018        9         east         9            9
2018        9         north        5            1
.
.
2019        7         west         1            1
2019        7         east         9            9
2019        7         north        5            1
.
.
2021        7         west         1            1
2021        7         east         9            9
2021        7         north        5            1

What is the best way to approach this?

Tags: python, python-3.x, pandas, python-2.7, dataframe

Solution


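I would create a function that uses pd.date_range with a monthly frequency.

This function assumes you have three regions, and that the last three rows of the data are west, east, and north for the last month, but it can be modified for more: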

import pandas as pd

def myFunction(df, periods, freq='MS'):
    # find the last date in the df (first day of the last month)
    last = pd.to_datetime(df.Year*10000 + df.Month*100 + 1, format='%Y%m%d').max()

    # create a new date range of n periods after the last date;
    # 'MS' (month start) matches the first-of-month dates built above
    newDates = pd.date_range(start=last, periods=periods+1, freq=freq)
    newDates = newDates[newDates > last]

    # convert the dates to year and month columns
    new_df = pd.DataFrame({'Year': newDates.year, 'Month': newDates.month})

    # add your three regions and ffill values; each slice ends on the row
    # whose Region and values should be carried forward
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    west = pd.concat([df[:-2], new_df], sort=False, ignore_index=True).ffill()
    east = pd.concat([df[:-1], new_df], sort=False, ignore_index=True).ffill()
    north = pd.concat([df, new_df], sort=False, ignore_index=True).ffill()

    # combine the three region dfs and drop duplicates
    new = pd.concat([west, east, north], sort=False, ignore_index=True).drop_duplicates()
    return new.sort_values(['Year', 'Month']).reset_index(drop=True)

myFunction(df, 3)

Setting periods to three returns the next three months:

    Year    Month   Region  Value1  Value2
0   2016    1        west   2.0      3.0
1   2016    1        east   4.0      5.0
2   2016    1        north  5.0      3.0
3   2016    2        west   6.0      4.0
4   2016    2        east   7.0      3.0
5   2016    12       west   2.0      3.0
6   2016    12       east   3.0      7.0
7   2016    12       north  6.0      8.0
8   2017    1        west   2.0      3.0
9   2018    7        west   1.0      1.0
10  2018    7        east   9.0      9.0
11  2018    7        north  5.0      1.0
12  2018    8        west   1.0      1.0
13  2018    8        east   9.0      9.0
14  2018    8        north  5.0      1.0
15  2018    9        west   1.0      1.0
16  2018    9        east   9.0      9.0
17  2018    9        north  5.0      1.0
18  2018    10       west   1.0      1.0
19  2018    10       east   9.0      9.0
20  2018    10       north  5.0      1.0
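
The function above hard-codes three regions and a fixed number of periods. As a rough alternative sketch (not part of the original answer), the rows of the last month can be repeated for every later month up to a target month, whatever the number of regions. The name extend_last_month and its end_year / end_month parameters are hypothetical, and merge(how="cross") assumes pandas 1.2 or later. The sample frame reuses a few rows from the question.

```python
import pandas as pd

def extend_last_month(df, end_year, end_month):
    """Repeat the rows of the last (Year, Month) in df for every later
    month up to and including (end_year, end_month)."""
    # First day of each row's month, used only to locate the last month.
    dates = pd.to_datetime({"year": df["Year"], "month": df["Month"], "day": 1})
    last = dates.max()
    last_rows = df[dates == last].drop(columns=["Year", "Month"])

    # Month starts strictly after the last month, through the target month.
    end = pd.Timestamp(year=end_year, month=end_month, day=1)
    future = pd.date_range(start=last, end=end, freq="MS")[1:]
    months = pd.DataFrame({"Year": future.year, "Month": future.month})

    # Cross join: pair every future month with every row of the last month.
    extra = months.merge(last_rows, how="cross")
    return pd.concat([df, extra], ignore_index=True)

# A few rows taken from the question's data (the elided months are omitted).
df = pd.DataFrame({
    "Year":   [2016, 2016, 2016, 2018, 2018, 2018],
    "Month":  [1, 1, 1, 7, 7, 7],
    "Region": ["west", "east", "north", "west", "east", "north"],
    "Value1": [2, 4, 5, 1, 9, 5],
    "Value2": [3, 5, 3, 1, 9, 1],
})

result = extend_last_month(df, 2021, 7)
print(result.tail(6))
```

Because no missing values are introduced and forward-filled, Value1 and Value2 keep their integer dtype here, whereas the ffill approach returns floats.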
