How to group by columns?

Problem description

I am having trouble figuring out how to group rows by a column and count values within each group. My goal is to count, for each Country, the number of distinct Package Codes whose Color is Orange or Blue.

I am working with thousands of rows of data. This is a subset of the data:

Country   Package Code   Color    Type
US        100            Orange    a
US        100            Orange    b
US        100            Orange    c
Mexico    200            Green     d
US        300            Blue      e
Canada    400            Red       f
Germany   500            Red       g
Germany   600            Blue      h
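To make the example reproducible, the sample rows above can be loaded into a DataFrame named `df`, which is the name the answer below assumes. This is only the table typed out by hand.

```python
import pandas as pd

# Sample data from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    "Country": ["US", "US", "US", "Mexico", "US", "Canada", "Germany", "Germany"],
    "Package Code": [100, 100, 100, 200, 300, 400, 500, 600],
    "Color": ["Orange", "Orange", "Orange", "Green", "Blue", "Red", "Red", "Blue"],
    "Type": ["a", "b", "c", "d", "e", "f", "g", "h"],
})
print(df)
```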

Desired Output:

Country   Packages
US         2
Mexico     0
Canada     0
Germany    1
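Note that the desired output counts distinct Package Codes, not rows: US has four Orange/Blue rows, but three of them share Package Code 100, so the expected value is 2. A quick check, assuming the data sits in a DataFrame shaped like the table above:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["US", "US", "US", "Mexico", "US", "Canada", "Germany", "Germany"],
    "Package Code": [100, 100, 100, 200, 300, 400, 500, 600],
    "Color": ["Orange", "Orange", "Orange", "Green", "Blue", "Red", "Red", "Blue"],
})

matches = df[df["Color"].isin(["Orange", "Blue"])]

# size() counts rows per country: US has 4 matching rows, Germany 1
rows = matches.groupby("Country").size()

# nunique() counts distinct Package Codes: US has 2 (100 and 300), Germany 1
distinct = matches.groupby("Country")["Package Code"].nunique()

print(rows)
print(distinct)
```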

Tags: group-by

Solution


Using isin + nunique + reindex

(df.loc[df.Color.isin(['Orange', 'Blue'])].groupby('Country')['Package Code']
    .nunique().reindex(df.Country.unique(), fill_value=0)).to_frame('Total').reset_index()

   Country  Total
0       US      2
1   Mexico      0
2   Canada      0
3  Germany      1
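For reference, a self-contained version of the same chain that can be pasted and run as-is. It rebuilds the sample table, names the result column `Packages` to match the question's desired output (the answer uses `Total`), and adds `rename_axis("Country")` as a precaution so the first column is labelled `Country` regardless of how the index name carries through `reindex`.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["US", "US", "US", "Mexico", "US", "Canada", "Germany", "Germany"],
    "Package Code": [100, 100, 100, 200, 300, 400, 500, 600],
    "Color": ["Orange", "Orange", "Orange", "Green", "Blue", "Red", "Red", "Blue"],
})

result = (
    df.loc[df["Color"].isin(["Orange", "Blue"])]  # keep Orange/Blue rows only
    .groupby("Country")["Package Code"]
    .nunique()                                    # distinct codes per country
    # every country, in order of first appearance; countries without a match get 0
    .reindex(df["Country"].unique(), fill_value=0)
    .rename_axis("Country")
    .to_frame("Packages")
    .reset_index()
)
print(result)
```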

Here is the same command split into steps for readability:

# Select rows where the color is Orange or Blue
u = df.loc[df.Color.isin(['Orange', 'Blue'])]

# Count the distinct Package Codes per Country
w = u.groupby('Country')['Package Code'].nunique()

# Add in missing countries with a value of 0
w.reindex(df.Country.unique(), fill_value=0).to_frame('Total').reset_index()
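An alternative that avoids the `reindex` step (a different technique from the answer above, shown as a sketch): blank out the Package Code on rows whose colour is not Orange or Blue, then group over all rows. Because `nunique()` skips NaN by default, countries with no matching rows come out as 0 automatically, and `sort=False` keeps countries in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["US", "US", "US", "Mexico", "US", "Canada", "Germany", "Germany"],
    "Package Code": [100, 100, 100, 200, 300, 400, 500, 600],
    "Color": ["Orange", "Orange", "Orange", "Green", "Blue", "Red", "Red", "Blue"],
})

# Keep the Package Code only where the colour matches; other rows become NaN
pkg = df["Package Code"].where(df["Color"].isin(["Orange", "Blue"]))

# nunique() ignores NaN, so a country whose codes are all NaN counts as 0
result = (
    pkg.groupby(df["Country"], sort=False)
    .nunique()
    .to_frame("Packages")
    .reset_index()
)
print(result)
```

One side effect: `where` turns Package Code into a float column because of the NaN values, which does not change the distinct counts.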
