Adding rows for missing data, grouped by another column, in a Pandas DataFrame

Problem description

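I have a Pandas DataFrame in which certain products are missing for certain dates. I want to add those rows to the DataFrame and assign them a sales value of 0. How can I do this?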

# Sample dataframe
import pandas as pd
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130]
})

        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  food    60
5   2020-01-03  glass   110
6   2020-01-03  clothes 130

## 'clothes' is missing for 2020-01-02 and 'food' is missing for 2020-01-03
## What I want to get: 
        date    product sales
0   2020-01-01  glass   100
1   2020-01-01  clothes 120
2   2020-01-01  food    50
3   2020-01-02  glass   90
4   2020-01-02  clothes 0
5   2020-01-02  food    60
6   2020-01-03  glass   110
7   2020-01-03  clothes 130
8   2020-01-03  food    0


Solution


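You can do this with unstack()/stack(): unstacking pivots product into columns and fills the missing (date, product) combinations with 0, and stacking turns the columns back into rows.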

(df.set_index(['date','product'])
   .unstack(fill_value=0)
   .stack()
   .reset_index()
)

Output:

         date  product  sales
0  2020-01-01  clothes    120
1  2020-01-01     food     50
2  2020-01-01    glass    100
3  2020-01-02  clothes      0
4  2020-01-02     food     60
5  2020-01-02    glass     90
6  2020-01-03  clothes    130
7  2020-01-03     food      0
8  2020-01-03    glass    110
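
Note that the rows above come back sorted by date and product, not in the order requested in the question. If the original order matters, one possible alternative (a sketch, not part of the original answer) is to build every (date, product) pair explicitly with pd.MultiIndex.from_product and reindex against it. The names full_index and result are illustrative, and the sketch assumes each (date, product) pair appears at most once in df.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-03'],
    'product': ['glass', 'clothes', 'food', 'glass', 'food', 'glass', 'clothes'],
    'sales': [100, 120, 50, 90, 60, 110, 130],
})

# Every (date, product) combination, in order of first appearance in df.
# full_index is an illustrative name, not something from the original answer.
full_index = pd.MultiIndex.from_product(
    [df['date'].unique(), df['product'].unique()],
    names=['date', 'product'],
)

# Reindexing against the full set of pairs adds the missing rows;
# fill_value=0 gives them a sales value of 0.
# Assumption: (date, product) pairs are unique, otherwise reindex raises.
result = (
    df.set_index(['date', 'product'])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(result)
```

Because unique() keeps values in order of first appearance, each date lists glass, clothes, food in that order, which matches the layout asked for in the question.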
