Removing duplicate elements from list-valued columns and comparing them

Problem description

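I have a DataFrame in which every column holds lists of values (they could be tuples if that makes them easier to work with). The index "Name" is not a list; its values are unique in this data. Most of the lists contain duplicate values: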

Name	Department	Business Unit	Person Number	Line Manager Name	Line Manager Work Email	Username	Job Name	Role vs Family
0	Betty	[' Department', 'Department', 'Department']	['Antarctica', 'Antarctica', 'Antarctica']	[10038253, 10038253, 10038253]	[nan, nan, nan]	[nan, nan, nan]	['betty@jane.com', 'betty@jane.com', 'betty@ja...	[nan, nan, nan]	['Do not match', 'Do not match', 'Do not match']
1	Bob	['Other Department', 'Other Department']	['Poland.', 'Poland']	[10036224, 10036224]	['Jane ', 'Jane ']	['jane@jane.com', 'jane@jane.com'']	['bob@jane.com', 'bob@jane.com']	[nan, nan]	['Do not match', 'Match']

The final DataFrame would look like this:


Name	Department	Business Unit	Person Number	Line Manager Name	Line Manager Work Email	Username	Job Name	Role vs Family
0	Betty	Department	Antarctica	10038253	NaN	NaN	betty@jane.com	NaN	Do not match
1	Bob	Other Department	Poland	10036224	Jane	jane@jane.com	bob@jane.com	NaN	['Do not match', 'Match']

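I need to do the following:

1- Remove all duplicate values in each list;

2- Filter the DataFrame by checking whether a column's lists contain an element (for example, select all rows whose "Business Unit" has "Antarctica" as at least one of its elements);

3- Check whether elements of one column's lists exist in another column's lists (for example, whether one of the "Line Manager Work Email" values is present in the "Username" list).

Thank you very much for your support!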

Tags: python, pandas, list, dataframe

Solution


A solution that removes duplicates in all columns by converting each list to a set, with missing values dropped:

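import numpy as np
import pandas as pd

def f(x):
    # Drop missing values, then de-duplicate via a set.
    # Note: a set does not preserve the original order, and values are
    # compared exactly, so 'Jane ' and 'Jane' count as different elements.
    L = list(set(y for y in x if pd.notna(y)))
    # if empty list return NaN
    if len(L) == 0:
        return np.nan
    # if one element list return scalar
    elif len(L) == 1:
        return L[0]
    # else return full list
    else:
        return L

# Apply to every column except the first ('Name').
# applymap is deprecated since pandas 2.1; use DataFrame.map there instead.
df.iloc[:, 1:] = df.iloc[:, 1:].applymap(f)
print(df)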
    Name       Department BusinessUnit PersonNumber LineManagerName  \
0  Betty       Department   Antarctica     10038253             NaN   
1    Bob  OtherDepartment       Poland     10036224            Jane   

  LineManagerWorkEmail        Username JobName         RolevsFamily  
0                  NaN  betty@jane.com     NaN           Donotmatch  
1        jane@jane.com    bob@jane.com     NaN  [Donotmatch, Match]  
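
The answer above only covers step 1. Steps 2 and 3 from the question could be approached along the following lines. This is a minimal sketch, not the answerer's method: the sample frame, the "Carol" row, the helper `as_list`, and the result column `manager_in_username` are all hypothetical, and step 3 is assumed to mean a comparison within the same row. Because the de-duplicated cells may be a scalar, NaN, or a list, the helper first normalizes every cell to a list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the de-duplicated result:
# each cell is a scalar, NaN, or a list of distinct values.
df = pd.DataFrame({
    "Name": ["Betty", "Bob", "Carol"],
    "Business Unit": ["Antarctica", ["Poland", "Antarctica"], "Poland"],
    "Line Manager Work Email": [np.nan, "jane@jane.com",
                                ["carol@jane.com", "jane@jane.com"]],
    "Username": ["betty@jane.com", "bob@jane.com", "carol@jane.com"],
})

def as_list(cell):
    """Normalize a cell (list/tuple/set, scalar, or NaN) to a list without NaNs."""
    if isinstance(cell, (list, tuple, set)):
        return [v for v in cell if pd.notna(v)]
    return [] if pd.isna(cell) else [cell]

# Step 2: keep rows whose "Business Unit" contains "Antarctica" at least once.
has_antarctica = df["Business Unit"].apply(lambda c: "Antarctica" in as_list(c))
antarctica_rows = df[has_antarctica]

# Step 3 (assumed row-wise): does any "Line Manager Work Email" value
# also appear among the same row's "Username" values?
df["manager_in_username"] = [
    bool(set(as_list(a)) & set(as_list(b)))
    for a, b in zip(df["Line Manager Work Email"], df["Username"])
]

print(antarctica_rows["Name"].tolist())
print(df[["Name", "manager_in_username"]])
```

If the comparison in step 3 is meant to run across all rows rather than within each row, one option would be to collect every "Username" value into a single set first and test each row's emails against that set.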
