Need help classifying IDs in which at least one female appears

Problem description

I have a list of IDs with gender information. I need to flag the IDs in which at least one female appears. The data is shown below for reference.

ID  Gender
1   Female
1   Female
2   Male
2   Male
3   Female
3   Male
4   Male
4   Male
4   Male
4   Male
4   Female
5   Female
5   Male
5   Female
6   Male
6   Male
6   Male
6   Male
7   Female
8   Male
8   Male
9   Male
10  Male
10  Male
11  Male
11  Female
13  Male
14  Male

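I tried creating two columns: one checks whether a row's ID is the same as the previous row's, and the other checks whether the row contains a female. The output would then be derived from the results of those two columns. But I suspect there is a better way.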

    import pandas as pd

    data = pd.read_excel(r"C:\Analytics\TA Dashboard\test\test.xlsx")
    # match1: is this row's ID equal to the ID on the row directly above it?
    data['match1'] = data['Reference ID'].eq(data['Reference ID'].shift())
    # match2: does any cell in this row equal 'Female'?
    data['match2'] = data.eq('Female').any(axis=1)
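To see why these two helper columns are hard to combine, here is a minimal sketch that runs the same two expressions on the first six rows of the sample data. It assumes the column is named ID (the asker's file uses Reference ID). match1 only compares each row with the row directly above it, so the first row of every ID comes out False, and match2 is evaluated row by row, so the Male row of ID 3 is False even though ID 3 contains a Female. Neither column carries "this ID has a female" to every row of the ID.

```python
import pandas as pd

# First six rows of the question's sample data.
# Assumption: the column is called 'ID' here instead of 'Reference ID'.
data = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Gender": ["Female", "Female", "Male", "Male", "Female", "Male"],
})

# match1: True only when the ID equals the ID on the previous row
data["match1"] = data["ID"].eq(data["ID"].shift())
# match2: True when any cell in this row equals 'Female' (row-wise, not per ID)
data["match2"] = data.eq("Female").any(axis=1)

print(data)
```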

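The output should be "Yes" or "No" based on the combination of ID and Gender: for rows that share an ID, if any of them is Female, every row for that ID should be "Yes"; otherwise every row should be "No".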

ID  Gender  OUTPUT
1   Female  Yes
1   Female  Yes
2   Male    No
2   Male    No
3   Female  Yes
3   Male    Yes
4   Male    Yes
4   Male    Yes
4   Male    Yes
4   Male    Yes
4   Female  Yes
5   Female  Yes
5   Male    Yes
5   Female  Yes
6   Male    No
6   Male    No
6   Male    No
6   Male    No
7   Female  Yes
8   Male    No
8   Male    No
9   Male    No
10  Male    No
10  Male    No
11  Male    Yes
11  Female  Yes
13  Male    No
14  Male    No
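The rule above can also be written as a set-membership test: collect the IDs that have at least one Female row, then mark every row whose ID is in that set. This is a minimal sketch using Series.isin, an alternative to a groupby-based approach; the sample data is typed in by hand from the table in the question.

```python
import pandas as pd

# Sample data from the question, typed in by hand.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6,
           7, 8, 8, 9, 10, 10, 11, 11, 13, 14],
    "Gender": ["Female", "Female", "Male", "Male", "Female", "Male",
               "Male", "Male", "Male", "Male", "Female",
               "Female", "Male", "Female",
               "Male", "Male", "Male", "Male",
               "Female", "Male", "Male", "Male", "Male", "Male",
               "Male", "Female", "Male", "Male"],
})

# IDs that have at least one 'Female' row
female_ids = df.loc[df["Gender"].eq("Female"), "ID"].unique()
# Every row whose ID is in that set gets 'Yes'; all other rows get 'No'
df["OUTPUT"] = df["ID"].isin(female_ids).map({True: "Yes", False: "No"})

print(df)
```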

Tags: python, pandas

Solution


Check where Gender equals Female, then use groupby with transform('any'):

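    import pandas as pd
    # Assumes df holds the ID and Gender columns shown above.
    # Flag Female rows, then broadcast "any Female in this ID" back to every row of that ID
    df['OUTPUT'] = df.Gender.eq('Female').groupby(df.ID).transform('any')
    # If you want Yes/No strings instead of True/False:
    # df['OUTPUT'] = df.OUTPUT.map({True: 'Yes', False: 'No'})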

    ID  Gender  OUTPUT
0    1  Female    True
1    1  Female    True
2    2    Male   False
3    2    Male   False
4    3  Female    True
5    3    Male    True
6    4    Male    True
7    4    Male    True
8    4    Male    True
9    4    Male    True
...
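A self-contained version of the answer, with the Yes/No mapping applied, might look like the sketch below. It assumes the question's sample table is pasted as whitespace-separated text and parsed with pandas.read_csv. transform('any') reduces each ID group to a single boolean and broadcasts it back to every row of that group, which matches the "if any row of the ID is Female, all its rows are Yes" rule.

```python
import io

import pandas as pd

# The question's sample table, pasted as whitespace-separated text.
raw = """ID Gender
1 Female
1 Female
2 Male
2 Male
3 Female
3 Male
4 Male
4 Male
4 Male
4 Male
4 Female
5 Female
5 Male
5 Female
6 Male
6 Male
6 Male
6 Male
7 Female
8 Male
8 Male
9 Male
10 Male
10 Male
11 Male
11 Female
13 Male
14 Male
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One boolean per row: is this row Female?
is_female = df["Gender"].eq("Female")
# Reduce each ID group with any() and broadcast the result back to every row of the group
df["OUTPUT"] = is_female.groupby(df["ID"]).transform("any").map({True: "Yes", False: "No"})

print(df)
```

Unlike groupby(...).any(), which returns one row per ID, transform keeps the original row count, so the result can be assigned straight back as a new column.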
