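python - How do I count the unique combinations of values in selected columns of a pandas DataFrame, including combinations with a frequency of 0?

Problem description

In my DataFrame (call it df) I have two columns: one labelled Colour and one labelled TOY_ID. Using df.groupby(['Colour', 'TOY_ID']).size() I can produce a third column that shows how many times each combination of values from the other two columns occurs in df. An example of the output is shown below: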

Colour            TOY_ID
Blue              31490.0       50
                  31569.0       50
                  50360636.0    20

                                ..
Yellow            50360636.0    25
                  50366678.0     9

                                ..
Green             31490.0       17
                  50366678.0    10

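Although this works, it does not show the combinations of the two columns whose count is 0. I know this can be done in R, but I am not sure how to do it in Python. An example of my desired output is below. Any suggestions?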

Colour            TOY_ID
Blue                 31490.0    50
                     31569.0    50
                  50360636.0    20
                  50366678.0     0
                                ..
Yellow               31490.0     0
                     31569.0     0
                  50360636.0    25
                  50366678.0     9
                                ..
Green                31490.0    17
                     31569.0     0
                  50360636.0     0
                  50366678.0    10

Tags: python, pandas

Use Series.reindex together with MultiIndex.from_product:

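# Count each (Colour, TOY_ID) pair that actually occurs in df
s = df.groupby(['Colour', 'TOY_ID']).size()

# Reindex over the Cartesian product of both index levels; missing pairs get 0
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print(s)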
Colour  TOY_ID    
Blue    31490.0       50
        31569.0       50
        50360636.0    20
        50366678.0     0
Green   31490.0       17
        31569.0        0
        50360636.0     0
        50366678.0    10
Yellow  31490.0        0
        31569.0        0
        50360636.0    25
        50366678.0     9
dtype: int64
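
To try the approach end to end, here is a minimal self-contained sketch. The sample rows in df are made up for illustration and are not the asker's data; the variable names full_index and counts are likewise just placeholders. It assumes both columns are free of NaN, since groupby drops NaN keys by default.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "Colour": ["Blue", "Blue", "Blue", "Yellow", "Green", "Green"],
    "TOY_ID": [31490.0, 31490.0, 31569.0, 31569.0, 31490.0, 31490.0],
})

# Count only the combinations that actually occur.
s = df.groupby(["Colour", "TOY_ID"]).size()

# Build every Colour x TOY_ID pair from the observed level values,
# passing the level names explicitly so they are kept.
full_index = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)

# Reindex against the full product; pairs that never occurred get 0.
counts = s.reindex(full_index, fill_value=0)

print(counts)
```

Because the product is built from s.index.levels, only values that appear at least once in each column are included. If a colour or toy ID never occurs in df at all, it would have to be supplied to from_product explicitly.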
