How to create a specific DataFrame based on another df in Pandas?

Problem description

I have the following DataFrame:

import pandas as pd

data = pd.DataFrame({"Country" : ["Brazil", "Brazil", "Germany", "Germany", "UK"],
                     "Order method" : ["Phone", "Retail", "Web", "Web", "Retail"]})

I want to create a new DataFrame based on the one above. The expected result was posted as an image, which is not available here.

Tags: python, pandas, filter

Solution


Use GroupBy.size, then Series.unstack followed by DataFrame.stack, to add the missing categories:

s = data.groupby(['Country','Order method']).size().unstack(fill_value=0).stack()
print(s)
Country  Order method
Brazil   Phone           1
         Retail          1
         Web             0
Germany  Phone           0
         Retail          0
         Web             2
UK       Phone           0
         Retail          1
         Web             0
dtype: int64
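
To see why this chain adds the missing categories, it can help to look at each intermediate step. The following is a minimal sketch that reuses the data from the question; the variable names counts, wide and full are introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "Country": ["Brazil", "Brazil", "Germany", "Germany", "UK"],
    "Order method": ["Phone", "Retail", "Web", "Web", "Retail"],
})

# Step 1: count only the combinations that actually occur in the data.
counts = data.groupby(["Country", "Order method"]).size()

# Step 2: pivot "Order method" into columns; combinations that never
# occurred would be NaN, so fill_value=0 turns them into zero counts.
wide = counts.unstack(fill_value=0)

# Step 3: pivot the columns back into the index. Every cell of the wide
# table becomes a row, so all Country / Order method pairs are present.
full = wide.stack()

print(len(counts), wide.shape, len(full))
```

The grouped counts contain only the observed pairs, while the wide table has one cell per country and order method, which is what makes the zero rows appear after stacking.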

To get a DataFrame, add Series.reset_index with the name parameter:

df = (data.groupby(['Country','Order method'])
          .size()
          .unstack(fill_value=0)
          .stack()
          .reset_index(name='Count'))
print(df)

   Country Order method  Count
0   Brazil        Phone      1
1   Brazil       Retail      1
2   Brazil          Web      0
3  Germany        Phone      0
4  Germany       Retail      0
5  Germany          Web      2
6       UK        Phone      0
7       UK       Retail      1
8       UK          Web      0
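
As a possible alternative to the unstack/stack round trip (not part of the original answer), the same table can be sketched by reindexing the grouped counts against every Country / Order method pair built with MultiIndex.from_product. The names full_index and df_alt are assumptions made for this example.

```python
import pandas as pd

data = pd.DataFrame({
    "Country": ["Brazil", "Brazil", "Germany", "Germany", "UK"],
    "Order method": ["Phone", "Retail", "Web", "Web", "Retail"],
})

# Counts for the combinations that are present in the data.
counts = data.groupby(["Country", "Order method"]).size()

# All possible pairs, in order of first appearance of each value.
full_index = pd.MultiIndex.from_product(
    [data["Country"].unique(), data["Order method"].unique()],
    names=["Country", "Order method"],
)

# Reindex against the full set of pairs; missing pairs get a count of 0.
df_alt = counts.reindex(full_index, fill_value=0).reset_index(name="Count")
print(df_alt)
```

Here the row order follows the order in which values first appear in data, whereas the groupby-based chain sorts the group keys; for this data the two orders happen to coincide.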

Finally, if necessary, replace the duplicated values with empty strings, using Series.mask with Series.duplicated:

df['Country'] = df['Country'].mask(df['Country'].duplicated(), '')
print(df)

   Country Order method  Count
0   Brazil        Phone      1
1                Retail      1
2                   Web      0
3  Germany        Phone      0
4                Retail      0
5                   Web      2
6       UK        Phone      0
7                Retail      1
8                   Web      0
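
The masking step can be looked at in isolation. This is a small sketch on a shortened, made-up frame (the names small, is_repeat and blanked are illustrative only) showing what Series.duplicated marks and what Series.mask then replaces.

```python
import pandas as pd

# Shortened illustrative frame, already sorted by Country.
small = pd.DataFrame({
    "Country": ["Brazil", "Brazil", "Germany", "Germany", "UK"],
    "Count": [1, 1, 0, 2, 1],
})

# True for every occurrence of a value after its first one.
is_repeat = small["Country"].duplicated()

# Where the condition is True, the value is replaced by an empty string.
blanked = small["Country"].mask(is_repeat, "")

print(is_repeat.tolist())
print(blanked.tolist())
```

Note that duplicated flags every later occurrence anywhere in the column, not only adjacent ones, so this display trick relies on the rows of each country being grouped together, as they are after the groupby above.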
