Split by | and find unique values in a pandas Series

Problem description

I have movie data from the MovieLens dataset and would like to select the unique genres from the genres column. This is the dataset:

[image: movies dataset]

The result should look like this:

[image: result]

Can somebody help me split the genres column and select the unique genres?

Thanks

Tags: python, pandas

Answer


Solution:

pd.unique(df["genres"].str.split("|", expand=True).stack())

Output:

array(['Adventure', 'Animation', 'Children', 'Fantasy',
       'Horror', 'Action', 'Thriller'], dtype=object)
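
To try the one-liner without the MovieLens file, here is a minimal, self-contained sketch. The three rows and the title column are made up for illustration and are not the actual dataset. The explicit .dropna() is an addition, so that the result does not depend on how a given pandas version's stack() treats missing values.

```python
import pandas as pd

# Hypothetical sample rows standing in for the MovieLens movies table.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": [
        "Adventure|Animation|Children",
        "Adventure|Children|Fantasy",
        "Comedy",
    ],
})

# One column per genre position; shorter rows are padded with missing values.
wide = df["genres"].str.split("|", expand=True)

# Collapse the columns into one Series and drop the padding explicitly.
stacked = wide.stack().dropna()

# Unique values, in order of first appearance.
unique_genres = pd.unique(stacked)

print(list(unique_genres))
# ['Adventure', 'Animation', 'Children', 'Fantasy', 'Comedy']
```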

Explanation:

This part splits the genres column on "|" into one column per genre position; rows with fewer genres are padded with None (the output below is an extract):

df["genres"].str.split("|", expand=True)

    0           1           2       
0   Adventure   Animation   Children
1   Adventure   Children    Fantasy
2   Comedy      None        None 

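.stack() stacks all the columns into a single Series indexed by (row, column). In its long-standing default behaviour it also drops the missing values:

df["genres"].str.split("|", expand=True).stack()

0  0    Adventure
   1    Animation
   2     Children
1  0    Adventure
   1     Children
   2      Fantasy
2  0       Comedy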

Finally, pd.unique() returns an array containing the unique values of the Series, in order of first appearance.
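
As an alternative to the split/stack approach explained above, the same result can be sketched with Series.explode(), which avoids building the intermediate wide table. This is a different technique from the answer's, shown on made-up rows; the None row is a hypothetical movie with no genres listed, included to show that missing entries are skipped.

```python
import pandas as pd

# Hypothetical sample data; the last row has no genres at all.
df = pd.DataFrame({
    "genres": [
        "Adventure|Animation|Children",
        "Adventure|Children|Fantasy",
        "Comedy",
        None,
    ],
})

genres = (
    df["genres"]
    .dropna()        # skip rows without a genres string
    .str.split("|")  # each row becomes a list of genres
    .explode()       # one genre per row
    .unique()        # unique values, in order of first appearance
)

unique_list = list(genres)
print(unique_list)
# ['Adventure', 'Animation', 'Children', 'Fantasy', 'Comedy']
```

Chaining .value_counts() instead of .unique() after .explode() would give the number of movies per genre.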

