python - Getting unique values from a column with varying values in pandas and breaking up rows into multiple rows on condition
Problem description
Here is an example of the values in the DataFrame column:
df_movies['genres'].unique()
array(['Action|Adventure|Science Fiction|Thriller',
'Adventure|Science Fiction|Thriller',
'Action|Adventure|Science Fiction|Fantasy', ...,
'Adventure|Drama|Action|Family|Foreign',
'Comedy|Family|Mystery|Romance',
'Mystery|Science Fiction|Thriller|Drama'], dtype=object)
When I try
df_movies[df_movies['genres'].str.contains('|')]
it just lists all rows, including the ones with only one genre, such as "Horror" or "Documentary".
How do I get all unique values from this column? Also, how can I break up each row into multiple rows so that each row has only one genre associated with it?
Solution
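'|' is a special character in regular expressions, and Series.str.contains treats its pattern as a regular expression by default. Inside the pattern, '|' joins multiple alternatives. For example, Series.str.contains('foo|seven') is the same as asking, for each row's value (call it x): 'foo' in x or 'seven' in x.
Given this, your query is interpreted as '' in x or '' in x, which is True for all rows, because the empty string is contained in every Python string. To match '|' literally, you need to escape it with a backslash ('\'):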
import pandas as pd

df = pd.DataFrame({'genres': ['foo|bar', 'no_bar_here']})
# Raw string, so the backslash reaches the regex engine unchanged
df['genres'].str.contains(r'\|')
0     True
1    False
Name: genres, dtype: bool
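For the remaining parts of the question (unique genres, and one genre per row), one possible approach is sketched below. It is not part of the original answer. It assumes pandas 0.25 or later, where DataFrame.explode is available, and the small df_movies frame with its 'title' column is hypothetical sample data. The sketch also shows regex=False as an alternative to escaping the pipe. Missing values in the column are not handled here.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df_movies
df_movies = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'genres': ['Action|Adventure', 'Horror', 'Adventure|Drama'],
})

# Alternative to escaping: treat the pattern as a plain substring
has_pipe = df_movies['genres'].str.contains('|', regex=False)

# Split each string into a list of genres (a single-character
# separator is treated literally), then give each genre its own row
exploded = (
    df_movies
    .assign(genres=df_movies['genres'].str.split('|'))
    .explode('genres')
    .reset_index(drop=True)
)

# Unique individual genres across the whole column
unique_genres = exploded['genres'].unique()
```

With the sample data, exploded has one row per (title, genre) pair, and unique_genres holds each genre once, in order of first appearance.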