首页 > 解决方案 > Pandas inter-column referencing

问题描述

I have some data as follows:

+--------+------+
| Reason | Keys |
+--------+------+
| x      | a    |
| y      | a    |
| z      | a    |
| y      | b    |
| z      | b    |
| x      | c    |
| w      | d    |
| x      | d    |
| w      | d    |
+--------+------+

I want to get the Reason corresponding to the first occurrence of each Key. Like here, I should get Reasons x,y,x,w for Keys a,b,c,d respectively. After that, I want to compute the percentage of each Reason, as in a metric for how many times each Reason occurs. Thus x = 2/4 = 50%. And w,y = 25% each.

For the percentage, I think I can use something like value_counts(normalize=True) * 100, based on the previous step. What is a good way to proceed?

标签: pythonpandas

解决方案


您对第二步是正确的,第一步可以通过

summary = df.groupby("Keys").first()

推荐阅读