python - Create new columns from aggregated categories 2
Problem description
+-------+------------+---------------+-----------------+
| INDEX | SK_ID_CURR | CREDIT_ACTIVE | CREDIT_TYPE |
+-------+------------+---------------+-----------------+
| 0 | 215354 | Closed | Consumer credit |
+-------+------------+---------------+-----------------+
| 1 | 215354 | Active | Credit card |
+-------+------------+---------------+-----------------+
| 2 | 215354 | Active | Consumer credit |
+-------+------------+---------------+-----------------+
| 3 | 215354 | Active | Credit card |
+-------+------------+---------------+-----------------+
| 4 | 215354 | Active | Consumer credit |
+-------+------------+---------------+-----------------+
| 5 | 215354 | Active | Credit card |
+-------+------------+---------------+-----------------+
| 6 | 215354 | Active | Consumer credit |
+-------+------------+---------------+-----------------+
| 7 | 162297 | Closed | Consumer credit |
+-------+------------+---------------+-----------------+
| 8 | 162297 | Closed | Consumer credit |
+-------+------------+---------------+-----------------+
| 9 | 162297 | Active | Credit card |
+-------+------------+---------------+-----------------+
| 10 | 162297 | Active | Credit card |
+-------+------------+---------------+-----------------+
| 11 | 162297 | Closed | Consumer credit |
+-------+------------+---------------+-----------------+
| 12 | 162297 | Active | Mortgage |
+-------+------------+---------------+-----------------+
| 13 | 402440 | Active | Consumer credit |
+-------+------------+---------------+-----------------+
| 14 | 238881 | Closed | Credit card |
+-------+------------+---------------+-----------------+
I have the table above and would like to aggregate each categorical column per ID. For example, I need to count the number of active and closed credits per SK_ID_CURR and put the counts in new columns, active_credits and closed_credits. I need the same for CREDIT_TYPE.
The result should look like this:
SK_ID_CURR CREDIT_ACTIVE CREDIT_CLOSED CONSUMER_CREDIT CREDIT_CARD
215354 6 1 4 3
Solution
For this DataFrame:
import pandas as pd

# Sample data copied from the table in the question
d = {'SK_ID_CURR': [215354, 215354, 215354, 215354, 215354, 215354, 215354, 162297, 162297, 162297, 162297, 162297, 162297, 402440, 238881],
     'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active', 'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active', 'Active', 'Closed'],
     'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card', 'Consumer credit', 'Mortgage', 'Consumer credit', 'Credit card']}
df = pd.DataFrame(d)
print(df)
Output:
SK_ID_CURR CREDIT_ACTIVE CREDIT_TYPE
0 215354 Closed Consumer credit
1 215354 Active Credit card
2 215354 Active Consumer credit
3 215354 Active Credit card
4 215354 Active Consumer credit
5 215354 Active Credit card
6 215354 Active Consumer credit
7 162297 Closed Consumer credit
8 162297 Closed Consumer credit
9 162297 Active Credit card
10 162297 Active Credit card
11 162297 Closed Consumer credit
12 162297 Active Mortgage
13 402440 Active Consumer credit
14 238881 Closed Credit card
You can try something like this. It uses named aggregation, because the nested-dictionary renaming form of agg raises an error in pandas 1.0 and later:
# Each keyword is an output column: (source column, function counting one category)
temp = df.groupby('SK_ID_CURR').agg(
    CREDIT_ACTIVE=('CREDIT_ACTIVE', lambda x: (x == 'Active').sum()),
    CREDIT_CLOSED=('CREDIT_ACTIVE', lambda x: (x == 'Closed').sum()),
    CONSUMER_CREDIT=('CREDIT_TYPE', lambda x: (x == 'Consumer credit').sum()),
    CREDIT_CARD=('CREDIT_TYPE', lambda x: (x == 'Credit card').sum()),
).reset_index()  # turn the SK_ID_CURR group key back into a regular column
print(temp)
Output:
SK_ID_CURR CREDIT_ACTIVE CREDIT_CLOSED CONSUMER_CREDIT CREDIT_CARD
0 162297 3 3 3 2
1 215354 6 1 4 3
2 238881 0 1 0 1
3 402440 1 0 1 0
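If there are many categories, listing one counter per value gets tedious. As a possible alternative (not part of the original answer), the sketch below one-hot encodes both columns with pd.get_dummies and sums the indicators per SK_ID_CURR, so every category, including Mortgage, gets its own count column. The variable names dummies and counts are illustrative only, and the resulting columns are named like CREDIT_ACTIVE_Active rather than the names requested in the question, so they would still need renaming.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'SK_ID_CURR': [215354] * 7 + [162297] * 6 + [402440, 238881],
    'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active',
                      'Active', 'Active', 'Closed', 'Closed', 'Active',
                      'Active', 'Closed', 'Active', 'Active', 'Closed'],
    'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit',
                    'Credit card', 'Consumer credit', 'Credit card',
                    'Consumer credit', 'Consumer credit', 'Consumer credit',
                    'Credit card', 'Credit card', 'Consumer credit',
                    'Mortgage', 'Consumer credit', 'Credit card'],
})

# One 0/1 indicator column per category value, prefixed with the source column name.
dummies = pd.get_dummies(df[['CREDIT_ACTIVE', 'CREDIT_TYPE']], dtype=int)

# Summing the indicators within each id gives the per-category counts.
counts = dummies.groupby(df['SK_ID_CURR']).sum().reset_index()
print(counts)
```

This trades explicit column names for generality: new category values appear as new columns automatically, with no change to the code.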