Append missing categories to rows

Problem description

I have a set of id and category values, but I want every id to have the same set of categories. The full set of categories can be obtained with df["category"].unique().

For example, the input:

import pandas as pd

df1 = {"id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
       "category": ["a", "b", "e", "a", "d", "a", "b", "c", "d"]}

output1 = pd.DataFrame(df1)
output1
Out[57]: 
   id category
0   1        a
1   1        b
2   1        e
3   2        a
4   2        d
5   3        a
6   3        b
7   3        c
8   3        d

The output should be:

df2 = {"id": [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3],
      "category": sum([["a","b","c","d","e"] for _ in range(3)], [])}

output2 = pd.DataFrame(df2)
output2
Out[58]: 
    id category
0    1        a
1    1        b
2    1        c
3    1        d
4    1        e
5    2        a
6    2        b
7    2        c
8    2        d
9    2        e
10   3        a
11   3        b
12   3        c
13   3        d
14   3        e

If possible, I would like a fast, optimized solution. Thanks a lot!

Tags: python, pandas, algorithm, numpy

Solution


Use itertools.product to build every (id, category) pair from the unique values of each column:

from itertools import product

df = pd.DataFrame(product(output1['id'].unique(), output1['category'].unique()),
                  columns=['id', 'category'])

print(df)
    id category
0    1        a
1    1        b
2    1        e
3    1        d
4    1        c
5    2        a
6    2        b
7    2        e
8    2        d
9    2        c
10   3        a
11   3        b
12   3        e
13   3        d
14   3        c
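
The result above lists the categories in order of first appearance (a, b, e, d, c), because unique() does not sort. If the alphabetical order from the expected output is wanted, one option is to sort the unique values before taking the product. A minimal sketch, assuming the same output1 frame as in the question (the names ids, categories and result are only illustrative):

```python
from itertools import product

import pandas as pd

output1 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "category": ["a", "b", "e", "a", "d", "a", "b", "c", "d"],
})

# Sort the unique values so the pairs come out as 1/a, 1/b, ..., 3/e.
ids = sorted(output1["id"].unique())
categories = sorted(output1["category"].unique())

# Materialize the product as a list of (id, category) tuples.
result = pd.DataFrame(list(product(ids, categories)), columns=["id", "category"])
print(result)
```

Sorting only the two small arrays of unique values keeps the extra work independent of the number of rows in the final frame.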

Alternatively, use MultiIndex.from_product together with MultiIndex.to_frame:

df = (pd.MultiIndex.from_product([output1['id'].unique(),
                                  output1['category'].unique()],
                                 names=['id', 'category'])
        .to_frame(index=False))

print(df)
    id category
0    1        a
1    1        b
2    1        e
3    1        d
4    1        c
5    2        a
6    2        b
7    2        e
8    2        d
9    2        c
10   3        a
11   3        b
12   3        e
13   3        d
14   3        c
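
Since the question asks for something fast and is tagged numpy, the same Cartesian product can also be sketched with plain NumPy arrays. This is not part of the answer above, only an illustrative alternative: it assumes np.unique (which returns sorted unique values) is acceptable for both columns, and the variable names are hypothetical. Whether it is actually faster would need to be measured on real data.

```python
import numpy as np
import pandas as pd

output1 = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "category": ["a", "b", "e", "a", "d", "a", "b", "c", "d"],
})

# np.unique returns the distinct values in sorted order.
ids = np.unique(output1["id"].to_numpy())
categories = np.unique(output1["category"].to_numpy())

result = pd.DataFrame({
    # Repeat each id once per category: 1,1,1,1,1,2,2,...
    "id": np.repeat(ids, len(categories)),
    # Tile the full category list once per id: a,b,c,d,e,a,b,...
    "category": np.tile(categories, len(ids)),
})
print(result)
```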
