Create new columns from aggregated categories

Problem description

+-------+------------+---------------+-----------------+
| INDEX | SK_ID_CURR | CREDIT_ACTIVE | CREDIT_TYPE     |
+-------+------------+---------------+-----------------+
|     0 |     215354 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|     1 |     215354 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|     2 |     215354 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|     3 |     215354 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|     4 |     215354 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|     5 |     215354 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|     6 |     215354 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|     7 |     162297 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|     8 |     162297 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|     9 |     162297 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|    10 |     162297 | Active        | Credit card     |
+-------+------------+---------------+-----------------+
|    11 |     162297 | Closed        | Consumer credit |
+-------+------------+---------------+-----------------+
|    12 |     162297 | Active        | Mortgage        |
+-------+------------+---------------+-----------------+
|    13 |     402440 | Active        | Consumer credit |
+-------+------------+---------------+-----------------+
|    14 |     238881 | Closed        | Credit card     |
+-------+------------+---------------+-----------------+

I have the table above and would like to aggregate each column per ID. For example, I need to count the number of active and closed credits per SK_ID_CURR and store the counts in two new columns, one for active credits and one for closed credits. I need the same for CREDIT_TYPE: one count column per credit type.

The result should look like this:

SK_ID_CURR  CREDIT_ACTIVE  CREDIT_CLOSED  CONSUMER_CREDIT  CREDIT_CARD
    215354              6              1                4            3

Tags: python, pandas

Solution


For this DataFrame:

import pandas as pd

d = {'SK_ID_CURR': [215354, 215354, 215354, 215354, 215354, 215354, 215354, 162297, 162297, 162297, 162297, 162297, 162297, 402440, 238881],
     'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active', 'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active', 'Active', 'Closed'],
     'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card', 'Consumer credit', 'Mortgage', 'Consumer credit', 'Credit card']}
df = pd.DataFrame(d)

print(df)

Output:

    SK_ID_CURR CREDIT_ACTIVE      CREDIT_TYPE
0       215354        Closed  Consumer credit
1       215354        Active      Credit card
2       215354        Active  Consumer credit
3       215354        Active      Credit card
4       215354        Active  Consumer credit
5       215354        Active      Credit card
6       215354        Active  Consumer credit
7       162297        Closed  Consumer credit
8       162297        Closed  Consumer credit
9       162297        Active      Credit card
10      162297        Active      Credit card
11      162297        Closed  Consumer credit
12      162297        Active         Mortgage
13      402440        Active  Consumer credit
14      238881        Closed      Credit card

You can try something like this:

# Named aggregation: output_column=(input_column, function).
# The older nested-dict form of .agg() was removed in pandas 1.0 and now raises
# SpecificationError ("nested renamer is not supported").
aggregations = {
    'CREDIT_ACTIVE': ('CREDIT_ACTIVE', lambda x: list(x).count('Active')),
    'CREDIT_CLOSED': ('CREDIT_ACTIVE', lambda x: list(x).count('Closed')),
    'CONSUMER_CREDIT': ('CREDIT_TYPE', lambda x: list(x).count('Consumer credit')),
    'CREDIT_CARD': ('CREDIT_TYPE', lambda x: list(x).count('Credit card')),
}
# reset_index() turns the SK_ID_CURR group keys back into a regular column
temp = df.groupby('SK_ID_CURR').agg(**aggregations).reset_index()

print(temp)

Output:

   SK_ID_CURR  CREDIT_ACTIVE  CREDIT_CLOSED  CONSUMER_CREDIT  CREDIT_CARD
0      162297              3              3                3            2
1      215354              6              1                4            3
2      238881              0              1                0            1
3      402440              1              0                1            0
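The answer above hard-codes one lambda per category, so any category it does not list (the sample data also contains a Mortgage credit type) is silently left out. A more general option is to let pd.crosstab count every category of each column and then join the results side by side. The following is a minimal sketch assuming the same sample data; the helper name count_categories and the COLUMN_VALUE naming scheme are hypothetical choices for illustration, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'SK_ID_CURR': [215354] * 7 + [162297] * 6 + [402440, 238881],
    'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active',
                      'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active',
                      'Active', 'Closed'],
    'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card',
                    'Consumer credit', 'Credit card', 'Consumer credit',
                    'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card',
                    'Consumer credit', 'Mortgage',
                    'Consumer credit', 'Credit card'],
})


def count_categories(frame, id_col, cat_cols):
    """Hypothetical helper: one count column per (category column, value) pair."""
    parts = []
    for col in cat_cols:
        # Rows: one per id; columns: one per distinct value of `col`; cells: counts
        counts = pd.crosstab(frame[id_col], frame[col])
        # Assumed naming scheme, e.g. 'Consumer credit' -> 'CREDIT_TYPE_CONSUMER_CREDIT'
        counts.columns = [f'{col}_{value}'.upper().replace(' ', '_') for value in counts.columns]
        parts.append(counts)
    # concat(axis=1) aligns the per-column tables on the id index
    return pd.concat(parts, axis=1).rename_axis(id_col).reset_index()


result = count_categories(df, 'SK_ID_CURR', ['CREDIT_ACTIVE', 'CREDIT_TYPE'])
print(result)
```

Because the columns come from the data rather than from a hand-written list, a new credit type simply produces one more column. If you need the exact names from the question (CREDIT_ACTIVE, CREDIT_CLOSED, and so on), rename the generated columns afterwards.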
