python-3.x - Retrieving unique values from a column that holds lists of values
Problem description
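I have a df with a column whose values are lists of values.
My intention is to split this column using one of the techniques described here: Pandas split column of lists into multiple columns
However, for the column names I want to use each unique value found in those lists.
To retrieve the unique values I tried three different approaches. Each one failed for a different reason.
Is there a way to get Series.unique() when the values are lists?
My three attempts, with the corresponding tracebacks: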
1)
unique_vals = splitted_interests.unique()
Traceback (most recent call last):
File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <module>
unique_vals = splitted_interests.unique()
File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 1991, in unique
result = super().unique()
File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\base.py", line 1405, in unique
result = unique1d(values)
File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 405, in unique
uniques = table.unique(values)
File "pandas/_libs/hashtable_class_helper.pxi", line 1767, in pandas._libs.hashtable.PyObjectHashTable.unique
File "pandas/_libs/hashtable_class_helper.pxi", line 1718, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'list'
2)
unique_vals = splitted_interests.apply(lambda x: x.unique())
Traceback (most recent call last):
File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <module>
unique_vals = splitted_interests.apply(lambda x: x.unique())
File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 4045, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <lambda>
unique_vals = splitted_interests.apply(lambda x: x.unique())
AttributeError: 'list' object has no attribute 'unique'
3)
unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
Traceback (most recent call last):
File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\series.py", line 4045, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <lambda>
unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
File "C:/Users/Mark/PycharmProjects/main/main.py", line 68, in <listcomp>
unique_vals = splitted_interests.apply(lambda x: [y.unique() for y in x])
AttributeError: 'str' object has no attribute 'unique'
Solution
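To keep the values in their original order, build a dictionary from the flattened values and extract its keys. This solution works in Python 3.6+ (dict insertion order is a CPython implementation detail in 3.6 and a language guarantee from 3.7):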
import pandas as pd

df = pd.DataFrame({'JobRoleInterest': ['aa,ss,ss', 'dd,ff', 'k,dd,dd,dd', 'j,gg']})
splitted_interests = df['JobRoleInterest'].str.split(',')
# Flatten the lists; dict.fromkeys drops duplicates and keeps first-seen order
unique_vals = list(dict.fromkeys(y for x in splitted_interests for y in x))
print(unique_vals)
['aa', 'ss', 'dd', 'ff', 'k', 'j', 'gg']
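As an alternative sketch, not part of the original answer, the same result can probably be reached without leaving pandas: Series.explode() (available from pandas 0.25) puts each list element on its own row, so the values become plain strings that unique() can hash. The second step, which uses str.get_dummies to build one indicator column per unique value, is only an assumption about what the asker wants the final columns to contain; the name `indicators` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'JobRoleInterest': ['aa,ss,ss', 'dd,ff', 'k,dd,dd,dd', 'j,gg']})
splitted_interests = df['JobRoleInterest'].str.split(',')

# explode() gives every list element its own row (pandas 0.25+),
# so unique() sees hashable strings instead of lists.
unique_vals = splitted_interests.explode().unique().tolist()
print(unique_vals)

# Assumed follow-up: one indicator column per unique value.
# get_dummies sorts its columns, so reorder them to match unique_vals.
indicators = df['JobRoleInterest'].str.get_dummies(sep=',')[unique_vals]
print(indicators)
```

Both approaches return the values in order of first appearance; the dict.fromkeys version has the advantage of not depending on the pandas version.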