python - pandas: add columns with the values from the previous point in time
Problem description
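I have a DataFrame `df` of irregular, ragged (annual) time-series data. It contains columns for the year, the country and two values, as shown below: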
```python
import pandas as pd

df = pd.DataFrame(columns=["_year", "_country", "val1_t1", "val2_t1"],
                  data=[[1996, "AUSTRALIA", 0.3870, 0.2570],
                        [1996, "NEW ZEALAND", 0.1120, 0.3390],
                        [1996, "UNITED STATES", 0.1264, 0.2515],
                        [1997, "UNITED KINGDOM", 0.4433, 0.3145],
                        [2002, "NEW ZEALAND", 0.4126, 0.2093],
                        [2004, "AUSTRALIA", 0.3764, 0.3610],
                        [2004, "UNITED STATES", 0.4152, 0.2610],
                        [2005, "UNITED KINGDOM", 0.3614, 0.3323],
                        [2007, "AUSTRALIA", 0.4338, 0.3628],
                        [2008, "NEW ZEALAND", 0.0091, 0.4493],
                        [2008, "UNITED STATES", 0.7980, 0.3584],
                        [2011, "NEW ZEALAND", 0.0108, 0.4731],
                        [2012, "UNITED STATES", 0.6874, 0.4710],
                        [2013, "AUSTRALIA", 0.3338, 0.4094],
                        [2014, "NEW ZEALAND", 0.0142, 0.4704],
                        [2015, "UNITED KINGDOM", 0.3118, 0.3778]])
```
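I now want to add two more columns, `val1_t0` and `val2_t0`, containing that country's values at its previous point in time. I have a dictionary that maps each country's years to the respective previous year: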
```python
t_1 = {"AUSTRALIA":      {2013: 2007, 2007: 2004, 2004: 1996, 1996: -99},
       "NEW ZEALAND":    {2014: 2011, 2011: 2008, 2008: 2002, 2002: 1996, 1996: -99},
       "UNITED STATES":  {2012: 2008, 2008: 2004, 2004: 1996, 1996: -99},
       "UNITED KINGDOM": {2015: 2005, 2005: 1997, 1997: -99}}
```
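Each inner dictionary maps a year in which the country is observed to the previous year it is observed in, with `-99` meaning "no earlier observation". In other words, `t_1` encodes the same information as the order of years within each country. As a minimal sketch (not part of the original question; `prev_year` and `derived` are illustrative names), the mapping can be rebuilt from `df` itself:

```python
import pandas as pd

df = pd.DataFrame(columns=["_year", "_country", "val1_t1", "val2_t1"],
                  data=[[1996, "AUSTRALIA", 0.3870, 0.2570],
                        [1996, "NEW ZEALAND", 0.1120, 0.3390],
                        [1996, "UNITED STATES", 0.1264, 0.2515],
                        [1997, "UNITED KINGDOM", 0.4433, 0.3145],
                        [2002, "NEW ZEALAND", 0.4126, 0.2093],
                        [2004, "AUSTRALIA", 0.3764, 0.3610],
                        [2004, "UNITED STATES", 0.4152, 0.2610],
                        [2005, "UNITED KINGDOM", 0.3614, 0.3323],
                        [2007, "AUSTRALIA", 0.4338, 0.3628],
                        [2008, "NEW ZEALAND", 0.0091, 0.4493],
                        [2008, "UNITED STATES", 0.7980, 0.3584],
                        [2011, "NEW ZEALAND", 0.0108, 0.4731],
                        [2012, "UNITED STATES", 0.6874, 0.4710],
                        [2013, "AUSTRALIA", 0.3338, 0.4094],
                        [2014, "NEW ZEALAND", 0.0142, 0.4704],
                        [2015, "UNITED KINGDOM", 0.3118, 0.3778]])

prev_year = (df.sort_values("_year")
               .groupby("_country")["_year"]
               .shift()        # previous year of the same country, NaN for its first year
               .fillna(-99)    # same "no earlier year" sentinel as in t_1
               .astype(int))

derived = {}
for idx, country, year in zip(df.index, df["_country"], df["_year"]):
    # prev_year keeps df's index labels, so look it up by label
    derived.setdefault(country, {})[int(year)] = int(prev_year.loc[idx])

print(derived["UNITED KINGDOM"])  # {1997: -99, 2005: 1997, 2015: 2005}
```

Comparing `derived` with `t_1` (dictionary equality ignores key order) is a quick way to check that the dictionary and the data agree.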
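The desired result is that each row of `df`, e.g. `UNITED KINGDOM` in `2015`, gets two extra columns, where `val1_t0` holds the `val1_t1` value (and `val2_t0` the `val2_t1` value) of that country's previous entry (here: `UNITED KINGDOM` in `2005`), as shown below: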
```python
import numpy as np
import pandas as pd

expected = pd.DataFrame(columns=["_year", "_country", "val1_t1", "val2_t1", "val1_t0", "val2_t0"],
                        data=[[1996, "AUSTRALIA", 0.3870, 0.2570, np.nan, np.nan],       # AU < 1996 non-existent
                              [1996, "NEW ZEALAND", 0.1120, 0.3390, np.nan, np.nan],     # NZ < 1996 non-existent
                              [1996, "UNITED STATES", 0.1264, 0.2515, np.nan, np.nan],   # US < 1996 non-existent
                              [1997, "UNITED KINGDOM", 0.4433, 0.3145, np.nan, np.nan],  # UK < 1997 non-existent
                              [2002, "NEW ZEALAND", 0.4126, 0.2093, 0.1120, 0.3390],     # NZ 1996
                              [2004, "AUSTRALIA", 0.3764, 0.3610, 0.3870, 0.2570],       # AU 1996
                              [2004, "UNITED STATES", 0.4152, 0.2610, 0.1264, 0.2515],   # US 1996
                              [2005, "UNITED KINGDOM", 0.3614, 0.3323, 0.4433, 0.3145],  # UK 1997
                              [2007, "AUSTRALIA", 0.4338, 0.3628, 0.3764, 0.3610],       # AU 2004
                              [2008, "NEW ZEALAND", 0.0091, 0.4493, 0.4126, 0.2093],     # NZ 2002
                              [2008, "UNITED STATES", 0.7980, 0.3584, 0.4152, 0.2610],   # US 2004
                              [2011, "NEW ZEALAND", 0.0108, 0.4731, 0.0091, 0.4493],     # NZ 2008
                              [2012, "UNITED STATES", 0.6874, 0.4710, 0.7980, 0.3584],   # US 2008
                              [2013, "AUSTRALIA", 0.3338, 0.4094, 0.4338, 0.3628],       # AU 2007
                              [2014, "NEW ZEALAND", 0.0142, 0.4704, 0.0108, 0.4731],     # NZ 2011
                              [2015, "UNITED KINGDOM", 0.3118, 0.3778, 0.3614, 0.3323]])  # UK 2005
```
Does anyone have an elegant implementation?
Solution
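Using `groupby` and `shift` should do what you are looking for. I'm not sure you need your dictionary, since this approach isn't affected by missing years: `shift` simply takes the previous row within each country, however large the gap between years. Make sure the rows are sorted by year beforehand with `sort_values`.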
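```python
# df as constructed in the question; sort by year so that shift() within each
# country returns that country's previous observation. The result keeps df's
# index labels, so the assignment aligns back to the original rows.
df[["val1_t0", "val2_t0"]] = (df.sort_values('_year')
                              .groupby('_country')[["val1_t1", "val2_t1"]].shift())
print(df)
```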
```
_year _country val1_t1 val2_t1 val1_t0 val2_t0
0 1996 AUSTRALIA 0.3870 0.2570 NaN NaN
1 1996 NEW ZEALAND 0.1120 0.3390 NaN NaN
2 1996 UNITED STATES 0.1264 0.2515 NaN NaN
3 1997 UNITED KINGDOM 0.4433 0.3145 NaN NaN
4 2002 NEW ZEALAND 0.4126 0.2093 0.1120 0.3390
5 2004 AUSTRALIA 0.3764 0.3610 0.3870 0.2570
6 2004 UNITED STATES 0.4152 0.2610 0.1264 0.2515
7 2005 UNITED KINGDOM 0.3614 0.3323 0.4433 0.3145
8 2007 AUSTRALIA 0.4338 0.3628 0.3764 0.3610
9 2008 NEW ZEALAND 0.0091 0.4493 0.4126 0.2093
10 2008 UNITED STATES 0.7980 0.3584 0.4152 0.2610
11 2011 NEW ZEALAND 0.0108 0.4731 0.0091 0.4493
12 2012 UNITED STATES 0.6874 0.4710 0.7980 0.3584
13 2013 AUSTRALIA 0.3338 0.4094 0.4338 0.3628
14 2014 NEW ZEALAND 0.0142 0.4704 0.0108 0.4731
15 2015 UNITED KINGDOM 0.3118 0.3778 0.3614 0.3323
```
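If the dictionary should stay authoritative (for example, if some earlier years are meant to be skipped), a dictionary-driven variant is a self-merge: look up each row's previous year in `t_1`, then left-merge the frame against a renamed copy of itself on country and previous year. This is a minimal sketch, not part of the original answer; the helper column `_prev_year` is a hypothetical name, and it assumes every (country, year) pair in `df` has an entry in `t_1` (otherwise the lookup raises `KeyError`).

```python
import pandas as pd

df = pd.DataFrame(columns=["_year", "_country", "val1_t1", "val2_t1"],
                  data=[[1996, "AUSTRALIA", 0.3870, 0.2570],
                        [1996, "NEW ZEALAND", 0.1120, 0.3390],
                        [1996, "UNITED STATES", 0.1264, 0.2515],
                        [1997, "UNITED KINGDOM", 0.4433, 0.3145],
                        [2002, "NEW ZEALAND", 0.4126, 0.2093],
                        [2004, "AUSTRALIA", 0.3764, 0.3610],
                        [2004, "UNITED STATES", 0.4152, 0.2610],
                        [2005, "UNITED KINGDOM", 0.3614, 0.3323],
                        [2007, "AUSTRALIA", 0.4338, 0.3628],
                        [2008, "NEW ZEALAND", 0.0091, 0.4493],
                        [2008, "UNITED STATES", 0.7980, 0.3584],
                        [2011, "NEW ZEALAND", 0.0108, 0.4731],
                        [2012, "UNITED STATES", 0.6874, 0.4710],
                        [2013, "AUSTRALIA", 0.3338, 0.4094],
                        [2014, "NEW ZEALAND", 0.0142, 0.4704],
                        [2015, "UNITED KINGDOM", 0.3118, 0.3778]])

t_1 = {"AUSTRALIA":      {2013: 2007, 2007: 2004, 2004: 1996, 1996: -99},
       "NEW ZEALAND":    {2014: 2011, 2011: 2008, 2008: 2002, 2002: 1996, 1996: -99},
       "UNITED STATES":  {2012: 2008, 2008: 2004, 2004: 1996, 1996: -99},
       "UNITED KINGDOM": {2015: 2005, 2005: 1997, 1997: -99}}

# Previous year of each row according to the dictionary (-99 = none)
lookup = df.assign(_prev_year=[t_1[c][y] for c, y in zip(df["_country"], df["_year"])])

# Copy whose year column is the join key and whose values become the *_t0 columns
prev_vals = df.rename(columns={"_year": "_prev_year",
                               "val1_t1": "val1_t0",
                               "val2_t1": "val2_t0"})

# how="left" keeps every row in its original order; -99 matches nothing -> NaN
result = (lookup.merge(prev_vals, on=["_country", "_prev_year"], how="left")
                .drop(columns="_prev_year"))
print(result)
```

On this data the result should match the `shift` version. The two only diverge if `t_1` disagrees with the order of years actually present in `df`, which is the case where the dictionary earns its keep.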