pandas: add columns with values from the previous time point

Question

I have a DataFrame df of irregular, ragged (annual) time series data. It has columns for the year, the country, and two values, like this:

import pandas as pd

df = pd.DataFrame(columns = ["_year", "_country", "val1_t1", "val2_t1"],
         data = [   [1996, "AUSTRALIA",         0.3870, 0.2570],
                    [1996, "NEW ZEALAND",       0.1120, 0.3390],
                    [1996, "UNITED STATES",     0.1264, 0.2515],
                    [1997, "UNITED KINGDOM",    0.4433, 0.3145],
                    [2002, "NEW ZEALAND",       0.4126, 0.2093],
                    [2004, "AUSTRALIA",         0.3764, 0.3610],
                    [2004, "UNITED STATES",     0.4152, 0.2610],
                    [2005, "UNITED KINGDOM",    0.3614, 0.3323],
                    [2007, "AUSTRALIA",         0.4338, 0.3628],
                    [2008, "NEW ZEALAND",       0.0091, 0.4493],
                    [2008, "UNITED STATES",     0.7980, 0.3584],
                    [2011, "NEW ZEALAND",       0.0108, 0.4731],
                    [2012, "UNITED STATES",     0.6874, 0.4710],
                    [2013, "AUSTRALIA",         0.3338, 0.4094],
                    [2014, "NEW ZEALAND",       0.0142, 0.4704],
                    [2015, "UNITED KINGDOM",    0.3118, 0.3778]])

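I now want to add two more columns, val1_t0 and val2_t0, containing that country's values at its previous time point. I have a dictionary that maps each year to the previous available year for each country (-99 marks a first observation with no predecessor):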

t_1 = {"AUSTRALIA" : {2013: 2007, 2007: 2004, 2004: 1996, 1996: -99},
       "NEW ZEALAND" : {2014: 2011, 2011: 2008, 2008: 2002, 2002: 1996, 1996: -99},
       "UNITED STATES" : {2012: 2008, 2008: 2004, 2004: 1996, 1996: -99},
       "UNITED KINGDOM" : {2015: 2005, 2005: 1997, 1997: -99}}

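The desired result is a df in which a row such as UNITED KINGDOM in 2015 has two additional columns, val1_t0 and val2_t0, holding val1_t1 and val2_t1 from the previous entry (here: UNITED KINGDOM in 2005), like this: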

import numpy as np

expected = pd.DataFrame(columns = ["_year", "_country", "val1_t1", "val2_t1", 'val1_t0', 'val2_t0'],
         data = [   [1996, "AUSTRALIA",         0.3870, 0.2570, np.nan, np.nan],  # AU < 1996 non-existent
                    [1996, "NEW ZEALAND",       0.1120, 0.3390, np.nan, np.nan],  # NZ < 1996 non-existent
                    [1996, "UNITED STATES",     0.1264, 0.2515, np.nan, np.nan],  # US < 1996 non-existent
                    [1997, "UNITED KINGDOM",    0.4433, 0.3145, np.nan, np.nan],  # UK < 1997 non-existent
                    [2002, "NEW ZEALAND",       0.4126, 0.2093, 0.1120, 0.3390],  # NZ 1996
                    [2004, "AUSTRALIA",         0.3764, 0.3610, 0.3870, 0.2570],  # AU 1996
                    [2004, "UNITED STATES",     0.4152, 0.2610, 0.1264, 0.2515],  # US 1996
                    [2005, "UNITED KINGDOM",    0.3614, 0.3323, 0.4433, 0.3145],  # UK 1997
                    [2007, "AUSTRALIA",         0.4338, 0.3628, 0.3764, 0.3610],  # AU 2004
                    [2008, "NEW ZEALAND",       0.0091, 0.4493, 0.4126, 0.2093],  # NZ 2002
                    [2008, "UNITED STATES",     0.7980, 0.3584, 0.4152, 0.2610],  # US 2004
                    [2011, "NEW ZEALAND",       0.0108, 0.4731, 0.0091, 0.4493],  # NZ 2008
                    [2012, "UNITED STATES",     0.6874, 0.4710, 0.7980, 0.3584],  # US 2008
                    [2013, "AUSTRALIA",         0.3338, 0.4094, 0.4338, 0.3628],  # AU 2007
                    [2014, "NEW ZEALAND",       0.0142, 0.4704, 0.0108, 0.4731],  # NZ 2011
                    [2015, "UNITED KINGDOM",    0.3118, 0.3778, 0.3614, 0.3323]]) # UK 2005

Does anyone have an elegant implementation?

Tags: python, pandas

Answer


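Using groupby and shift should do what you're looking for. I'm not sure the dictionary is needed: shift takes the previous row within each country, so gaps between years don't affect it. Just make sure the rows are sorted by year with sort_values before grouping. The shifted values are assigned back by index, so df keeps its original row order.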

df[["val1_t0", "val2_t0"]] = (df.sort_values('_year')
                                .groupby('_country')[["val1_t1", "val2_t1"]].shift())

print(df)
    _year        _country  val1_t1  val2_t1  val1_t0  val2_t0
0    1996       AUSTRALIA   0.3870   0.2570      NaN      NaN
1    1996     NEW ZEALAND   0.1120   0.3390      NaN      NaN
2    1996   UNITED STATES   0.1264   0.2515      NaN      NaN
3    1997  UNITED KINGDOM   0.4433   0.3145      NaN      NaN
4    2002     NEW ZEALAND   0.4126   0.2093   0.1120   0.3390
5    2004       AUSTRALIA   0.3764   0.3610   0.3870   0.2570
6    2004   UNITED STATES   0.4152   0.2610   0.1264   0.2515
7    2005  UNITED KINGDOM   0.3614   0.3323   0.4433   0.3145
8    2007       AUSTRALIA   0.4338   0.3628   0.3764   0.3610
9    2008     NEW ZEALAND   0.0091   0.4493   0.4126   0.2093
10   2008   UNITED STATES   0.7980   0.3584   0.4152   0.2610
11   2011     NEW ZEALAND   0.0108   0.4731   0.0091   0.4493
12   2012   UNITED STATES   0.6874   0.4710   0.7980   0.3584
13   2013       AUSTRALIA   0.3338   0.4094   0.4338   0.3628
14   2014     NEW ZEALAND   0.0142   0.4704   0.0108   0.4731
15   2015  UNITED KINGDOM   0.3118   0.3778   0.3614   0.3323
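
If the dictionary from the question does need to drive the lookup (for example, because it encodes something other than "the previous row for that country"), a self-merge is one possible alternative. The following is only a minimal sketch under stated assumptions: it uses a two-country subset of the question's data, a t_1 trimmed to match that subset, and a helper column named _prev_year that is invented here for illustration and does not appear in the original post.

```python
import pandas as pd

# Two-country subset of the question's data, to keep the sketch short.
df = pd.DataFrame(
    columns=["_year", "_country", "val1_t1", "val2_t1"],
    data=[
        [1997, "UNITED KINGDOM", 0.4433, 0.3145],
        [2004, "AUSTRALIA", 0.3764, 0.3610],
        [2005, "UNITED KINGDOM", 0.3614, 0.3323],
        [2007, "AUSTRALIA", 0.4338, 0.3628],
        [2015, "UNITED KINGDOM", 0.3118, 0.3778],
    ],
)

# Previous-year lookup in the same shape as the question's t_1,
# trimmed to the subset above (-99 means "no earlier entry").
t_1 = {
    "AUSTRALIA": {2007: 2004, 2004: -99},
    "UNITED KINGDOM": {2015: 2005, 2005: 1997, 1997: -99},
}

# Hypothetical helper column: the previous year for each row, taken from t_1.
df["_prev_year"] = [t_1[c][y] for c, y in zip(df["_country"], df["_year"])]

# Copy of the value columns, renamed so they can be joined back as the t0 values.
prev = df[["_year", "_country", "val1_t1", "val2_t1"]].rename(
    columns={"_year": "_prev_year", "val1_t1": "val1_t0", "val2_t1": "val2_t0"}
)

# A left merge keeps every original row; -99 matches no year, so those rows get NaN.
result = (
    df.merge(prev, on=["_country", "_prev_year"], how="left")
      .drop(columns="_prev_year")
)
print(result)
```

On this subset the merge should yield the same val1_t0 and val2_t0 values as the groupby/shift approach. The shift version is shorter and does not depend on the dictionary being kept up to date, so the merge is mainly worth considering when the mapping is not simply "the preceding observation".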
