Grouping columns and calculating

Question

My code is as follows:

df.loc[df['Shape'].isin(Shapes), 'Shape'].value_counts().div(len(df)).to_frame().reset_index()

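This gives me the number of occurrences and then the percentage of each value (say, Triangle) across the entire DataFrame. How would I adjust it if I wanted to add another column and stratify by it as a group?

The current code gives me the percentage of each shape in the whole df: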

Triangle .20
Square   .40
Circle   .40

I would also like it broken down by color, so the output looks like this:

Triangle  Blue  .20
Triangle  Red   .40
Triangle  Black .40
Square    Blue  .40
Square    Red   .30
Square    Purple .30
...

Thanks

Tags: python, python-3.x, pandas

Solution


I think you can use GroupBy.size with multiple columns:

import numpy as np
import pandas as pd

np.random.seed(2020)
s = ['Triangle','Square','Circle', 'Rectangle']
c = ['Blue','Red','Black', 'Purple']    

df = pd.DataFrame({'Shape':np.random.choice(s, size=20),
                   'Colors':np.random.choice(c, size=20)})
#print (df)

Shapes = ['Triangle','Square','Circle'] 

df1 = (df.loc[df['Shape'].isin(Shapes)]
           .groupby(['Shape', 'Colors'])
           .size()
           .div(len(df))
           .reset_index(name='per'))
print (df1)
      Shape  Colors   per
0    Circle   Black  0.10
1    Circle     Red  0.05
2    Square    Blue  0.05
3    Square     Red  0.10
4  Triangle   Black  0.05
5  Triangle    Blue  0.05
6  Triangle  Purple  0.10
7  Triangle     Red  0.10
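
Note that `.div(len(df))` divides by the row count of the whole frame, including rows whose shape was filtered out, which is why the `per` column above adds up to 0.60 rather than 1. The following is a minimal sketch of the two possible denominators. The toy data and the names `toy`, `kept`, `share_of_all` and `share_of_kept` are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Hypothetical toy data: 5 rows, one of which ('Rectangle') is filtered out.
toy = pd.DataFrame({
    'Shape':  ['Triangle', 'Triangle', 'Square', 'Square', 'Rectangle'],
    'Colors': ['Blue',     'Blue',     'Red',    'Blue',   'Red'],
})
shapes = ['Triangle', 'Square']

kept = toy.loc[toy['Shape'].isin(shapes)]
counts = kept.groupby(['Shape', 'Colors']).size()

# Share of the whole frame (as in the answer): the denominator
# still counts the filtered-out rows, so the column sums to 0.8 here.
share_of_all = counts.div(len(toy)).reset_index(name='per')

# Share of the filtered rows only: the column sums to 1.
share_of_kept = counts.div(len(kept)).reset_index(name='per')

print(share_of_all)
print(share_of_kept)
```

Which denominator is appropriate depends on whether the excluded shapes should still count toward the total.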

An alternative is SeriesGroupBy.value_counts; the difference is that the values are sorted by count within each group:

df1 = (df.loc[df['Shape'].isin(Shapes)]
           .groupby(['Shape'])['Colors']
           .value_counts()
           .div(len(df))
           .reset_index(name='per'))
print (df1)
      Shape  Colors   per
0    Circle   Black  0.10
1    Circle     Red  0.05
2    Square     Red  0.10
3    Square    Blue  0.05
4  Triangle  Purple  0.10
5  Triangle     Red  0.10
6  Triangle   Black  0.05
7  Triangle    Blue  0.05
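
The ordering difference between the two approaches can be seen on a small hand-made frame. This is only a sketch: the data and the names `toy`, `by_size` and `by_vc` are assumptions made for illustration, and ties are avoided on purpose because their relative order is not something to rely on.

```python
import pandas as pd

# Hypothetical toy data with no tied counts inside a shape.
toy = pd.DataFrame({
    'Shape':  ['Circle'] * 3 + ['Square'] * 3,
    'Colors': ['Red', 'Red', 'Black', 'Blue', 'Red', 'Red'],
})

# size(): rows are ordered by the group keys (Shape, then Colors).
by_size = toy.groupby(['Shape', 'Colors']).size().reset_index(name='n')

# value_counts(): within each Shape, rows are ordered by count, largest first.
by_vc = toy.groupby('Shape')['Colors'].value_counts().reset_index(name='n')

print(by_size)
print(by_vc)
```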

If you want percentages per group (so that the percentages within each group sum to 1, i.e. 100%), use:

Shapes = ['Triangle','Square','Circle'] 

df2 = (df.loc[df['Shape'].isin(Shapes)]
           .groupby(['Shape'])['Colors']
           .value_counts(normalize=True)
           .reset_index(name='per'))
print (df2)
      Shape  Colors       per
0    Circle   Black  0.666667
1    Circle     Red  0.333333
2    Square     Red  0.666667
3    Square    Blue  0.333333
4  Triangle  Purple  0.333333
5  Triangle     Red  0.333333
6  Triangle   Black  0.166667
7  Triangle    Blue  0.166667
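
As a sanity check, the per-group shares can be summed back to 1, and the same numbers can be reproduced by dividing each (Shape, Colors) count by its shape's total. The sketch below assumes a small made-up frame; `toy`, `per_group`, `totals` and `manual` are illustrative names only.

```python
import pandas as pd

# Hypothetical toy data: 3 circles and 4 squares.
toy = pd.DataFrame({
    'Shape':  ['Circle'] * 3 + ['Square'] * 4,
    'Colors': ['Red', 'Red', 'Black', 'Blue', 'Red', 'Red', 'Red'],
})

per_group = (toy.groupby('Shape')['Colors']
                .value_counts(normalize=True)
                .reset_index(name='per'))

# Each shape's shares should add up to 1.
totals = per_group.groupby('Shape')['per'].sum()

# The same shares computed by hand: pair counts divided by the
# number of rows for that shape.
counts = toy.groupby(['Shape', 'Colors']).size()
manual = counts / counts.groupby(level='Shape').transform('sum')

print(per_group)
print(totals)
print(manual)
```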
