Fill DataFrame values for each group of rows

Problem description

Suppose I have the following dataset:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    NA
1990    Northern Ireland    Female  NA
1990    Northern Ireland    Total   NA
1991    Northern Ireland    Male    NA
1991    Northern Ireland    Female  NA
1991    Northern Ireland    Total   NA
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

In the end I want the following:

Time    Geography           Sex     Population
1990    Northern Ireland    Male    792100
1990    Northern Ireland    Female  831100
1990    Northern Ireland    Total   1623300
1991    Northern Ireland    Male    792100
1991    Northern Ireland    Female  831100
1991    Northern Ireland    Total   1623300
1992    Northern Ireland    Male    792100
1992    Northern Ireland    Female  831100
1992    Northern Ireland    Total   1623300
1993    Northern Ireland    Male    812100
1993    Northern Ireland    Female  851100
1993    Northern Ireland    Total   1663200

In other words, for each Sex I want to fill the earlier years with the values from the first year that has no NA.

How can I do this?

Tags: python, pandas

Solution


You can try this:

df.set_index(['Time','Geography','Sex']).unstack().bfill().stack().reset_index()

Output:

    Time         Geography     Sex  Population
0   1990  Northern Ireland  Female    831100.0
1   1990  Northern Ireland    Male    792100.0
2   1990  Northern Ireland   Total   1623300.0
3   1991  Northern Ireland  Female    831100.0
4   1991  Northern Ireland    Male    792100.0
5   1991  Northern Ireland   Total   1623300.0
6   1992  Northern Ireland  Female    831100.0
7   1992  Northern Ireland    Male    792100.0
8   1992  Northern Ireland   Total   1623300.0
9   1993  Northern Ireland  Female    851100.0
10  1993  Northern Ireland    Male    812100.0
11  1993  Northern Ireland   Total   1663200.0
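
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the table from the question and splits the one-liner into its steps. The variable names (`wide`, `filled`, `result`, `alt`) are only illustrative. The second half is an alternative that is not part of the original answer: because `bfill()` on the unstacked frame fills straight down each column, it could presumably pull values across regions if the data held more than one Geography, so the sketch also back-fills within each (Geography, Sex) group.

```python
import numpy as np
import pandas as pd

# Rebuild the example table from the question (values copied from the post).
years = [1990, 1991, 1992, 1993]
sexes = ["Male", "Female", "Total"]
population = {
    1992: [792100, 831100, 1623300],
    1993: [812100, 851100, 1663200],
}
rows = [
    (year, "Northern Ireland", sex, population.get(year, [np.nan] * 3)[i])
    for year in years
    for i, sex in enumerate(sexes)
]
df = pd.DataFrame(rows, columns=["Time", "Geography", "Sex", "Population"])

# The answer's one-liner, split into steps.
# 1. Move Sex into the columns: one row per (Time, Geography), one column per Sex.
wide = df.set_index(["Time", "Geography", "Sex"]).unstack()
# 2. Back-fill: each NaN takes the next non-NaN value further down its column.
filled = wide.bfill()
# 3. Return to the long layout and restore the index levels as columns.
result = filled.stack().reset_index()

# Alternative (an assumption, not from the answer): back-fill inside each
# (Geography, Sex) group so values never cross from one region to another.
alt = df.sort_values("Time", kind="stable").copy()
alt["Population"] = alt.groupby(["Geography", "Sex"])["Population"].bfill()

print(result)
print(alt)
```

On the single-region data in the question both approaches give the same populations; the grouped version also keeps the original row order (Male, Female, Total) instead of the sorted order that `unstack()` introduces.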
