python - split by | and find unique values in pandas series
Problem description
I have movie data from the MovieLens dataset and I would like to select the unique genres from the genres column. This is the dataset:
The result would look like this:
Can somebody help me split the genres column and select the unique genres?
Thanks
Solution
pd.unique(df["genres"].str.split("|", expand=True).stack())
Output:
array(['Adventure', 'Animation', 'Children', 'Fantasy',
       'Horror', 'Action', 'Thriller'], dtype=object)
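For readers who want to try this without the MovieLens file, here is a minimal end-to-end sketch. The three-row DataFrame is a hypothetical stand-in, not the actual dataset, and the explicit dropna() is an added precaution: depending on the pandas version, stack() may or may not discard the empty cells created by the split.

```python
import pandas as pd

# Hypothetical sample data; the real MovieLens file has many more rows.
df = pd.DataFrame({
    "genres": [
        "Adventure|Animation|Children",
        "Adventure|Children|Fantasy",
        "Comedy",
    ]
})

# Split on "|", stack the resulting columns into one Series,
# drop any empty cells left over from padding, then deduplicate.
stacked = df["genres"].str.split("|", expand=True).stack()
unique_genres = pd.unique(stacked.dropna())

print(sorted(unique_genres))
# ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy']
```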
Explanation:
This part splits the genres column into one column per genre (the output is an extract):
df["genres"].str.split("|", expand=True)
           0          1         2
0  Adventure  Animation  Children
1  Adventure   Children   Fantasy
2     Comedy       None      None
.stack() stacks all the columns into a single column (the output shown here is simplified):
df["genres"].str.split("|", expand=True).stack()
0 Adventure
1 Animation
2 Children
3 Comedy
4 Fantasy
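To inspect the stacked intermediate in full, the sketch below uses a hypothetical three-row frame whose rows mirror the extract above (they are not taken from the actual dataset). It illustrates two things the simplified output hides: the stacked Series carries a two-level index (row label, column position), and it still contains duplicates, which is why a deduplication step follows.

```python
import pandas as pd

# Hypothetical sample data mirroring the extract above.
df = pd.DataFrame({
    "genres": [
        "Adventure|Animation|Children",
        "Adventure|Children|Fantasy",
        "Comedy",
    ]
})

wide = df["genres"].str.split("|", expand=True)

# On pandas versions where stack() already discards empty cells,
# the extra dropna() changes nothing.
stacked = wide.stack().dropna()

print(stacked.index.nlevels)  # 2 index levels: (row, column position)
print(len(stacked))           # 7 values, duplicates included
print(stacked.nunique())      # 5 distinct genres
```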
Then, pd.unique() returns an array containing the unique values of the Series.
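As an alternative that is not part of the answer above, the same result can be sketched with Series.explode() (available since pandas 0.25). Without expand=True, each row holds a list of genres, and explode() turns every list element into its own row, so no padding cells are created. The sample frame is again hypothetical.

```python
import pandas as pd

# Hypothetical sample data, not the actual MovieLens file.
df = pd.DataFrame({
    "genres": [
        "Adventure|Animation|Children",
        "Adventure|Children|Fantasy",
        "Comedy",
    ]
})

# Each row becomes a list of genres; explode() gives one genre per row.
exploded = df["genres"].str.split("|").explode()
genres = exploded.unique()

print(sorted(genres))
# ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy']
```

If the column contains rows with no genres at all (NaN), those rows remain NaN after explode(), so adding .dropna() before .unique() may be useful.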