Getting unique values from a column with varying values in pandas and breaking up rows into multiple rows on condition

Problem description

Here is an example of the DataFrame:

df_movies['genres'].unique()
array(['Action|Adventure|Science Fiction|Thriller',
       'Adventure|Science Fiction|Thriller',
       'Action|Adventure|Science Fiction|Fantasy', ...,
       'Adventure|Drama|Action|Family|Foreign',
       'Comedy|Family|Mystery|Romance',
       'Mystery|Science Fiction|Thriller|Drama'], dtype=object)

When I try

df_movies[df_movies['genres'].str.contains('|')]

this just lists all rows, including the ones with only one genre, such as "Horror" or "Documentary".

How do I get all unique values from this column? Also, how can I break up each row into multiple rows so that each row has only one genre associated with it?

Tags: python, pandas

Solution


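| is a special character in regular expressions. Series.str.contains treats its pattern as a regex by default, so | joins multiple alternatives. For example, Series.str.contains('foo|seven') asks, for each row's value (call it x), whether 'foo' in x or 'seven' in x.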
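To see the alternation behaviour concretely, here is a minimal sketch. The Series values are made up for illustration and are not from the question's data. It compares the regex result with the equivalent plain-Python check.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
s = pd.Series(['foo fighters', 'seven samurai', 'no match here'])

# '|' acts as regex alternation: match 'foo' OR 'seven'.
regex_mask = s.str.contains('foo|seven')

# The same check written out by hand for each value x.
manual_mask = s.apply(lambda x: 'foo' in x or 'seven' in x)

print(regex_mask.tolist())   # [True, True, False]
print(manual_mask.tolist())  # [True, True, False]
```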

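Given this, your query is interpreted as '' in x or '' in x, which is True for every row, because the empty string is contained in every Python string. To match '|' literally, escape it with a backslash ('\'):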

import pandas as pd

df = pd.DataFrame({'genres': ['foo|bar', 'no_bar_here']})

# Raw string so the backslash reaches the regex engine unchanged
df['genres'].str.contains(r'\|')
0     True
1    False
Name: genres, dtype: bool
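The question also asks how to list the unique genres and how to get one genre per row, which the answer above does not cover. The following is only a sketch of one common approach, assuming pandas 0.25 or later (where DataFrame.explode is available). The small df_movies frame and its title column are made up for illustration. It also shows regex=False as an alternative to escaping the pipe.

```python
import pandas as pd

# Made-up stand-in for df_movies; the 'title' column is hypothetical.
df_movies = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'genres': ['Action|Adventure', 'Horror', 'Adventure|Drama'],
})

# Alternative to escaping: treat '|' as a literal string, not a regex.
multi_genre = df_movies['genres'].str.contains('|', regex=False)

# Split each string into a list of genres.
# A single-character separator is treated literally by str.split.
genre_lists = df_movies['genres'].str.split('|')

# One row per (movie, genre) pair; the other columns are repeated.
df_long = (
    df_movies.assign(genres=genre_lists)
    .explode('genres')
    .reset_index(drop=True)
)

# Unique individual genres, in order of first appearance.
unique_genres = df_long['genres'].unique().tolist()

print(multi_genre.tolist())  # [True, False, True]
print(unique_genres)         # ['Action', 'Adventure', 'Horror', 'Drama']
print(df_long)
```

Exploding first and then calling unique() keeps both results consistent with each other. If only the unique list is needed, calling explode() on the split Series alone would avoid building the longer DataFrame.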
