python - How do I use groupby in pandas to sort data by bin?
Problem description
Question: How do I use groupby in pandas to sort data by bin?
What I want is the following:
release_year listed_in
1920 Documentaries
1930 TV Shows
1940 TV Shows
1950 Classic Movies, Documentaries
1960 Documentaries
1970 Classic Movies, Documentaries
1980 Classic Movies, Documentaries
1990 Classic Movies, Documentaries
2000 Classic Movies, Documentaries
2010 Children & Family Movies, Classic Movies, Comedies
2020 Classic Movies, Dramas
To get this result, I tried the following:
bins = [1925,1950,1960,1970,1990,2000,2010,2020]
groups = df.groupby(['listed_in', pd.cut(df.release_year, bins)])
groups.size().unstack()
It produces the following result:
release_year (1925,1950] (1950,1960] (1960,1970] (1970,1990] (1990,2000] (2000,2010] (2010, 2020]
listed_in
Action & Adventure 0 0 0 0 9 16 43
Action & Adventure, Anime Features, Children & Family Movies 0 0 0 0 0 0 1
Action & Adventure, Anime Features, Classic Movies 0 0 0 1 0 0 0
...
461 rows x 7 columns
I also tried the following:
df['release_year'] = df['release_year'].astype(str).str[0:2] + '0'
df.groupby('release_year')['listed_in'].apply(lambda x: x.mode().iloc[0])
The result is:
release_year
190 Dramas
200 Documentaries
Name: listed_in, dtype:object
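The labels 190 and 200 in this output most likely come from the slice `str[0:2]`, which keeps only the first two characters of each year before the '0' is appended. A minimal sketch of the difference, using a hypothetical `years` series rather than the real dataset:

```python
import pandas as pd

# Hypothetical sample years, stored as strings like in the dataset.
years = pd.Series(['1995', '2013', '2019'])

# Two characters plus '0' gives a three-character label, not a decade.
two_chars = years.str[0:2] + '0'
# Three characters plus '0' gives the first year of the decade.
three_chars = years.str[0:3] + '0'

print(two_chars.tolist())    # ['190', '200', '200']
print(three_chars.tolist())  # ['1990', '2010', '2010']
```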
Here is a sample of the dataset:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',np.nan,np.nan],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'],
'country':['United States, India, South Korea, China',
'United Kingdom','United States'],
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})
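Solution
The simplest approach is to reuse the first part of your code and set the last digit of release_year to 0. To do that, keep the first three characters of the year (str[0:3], not str[0:2]) and append '0'. You can then .groupby the decades and take the most common genre in each decade, i.e. the mode:
Input: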
import pandas as pd
import numpy as np
df = pd.DataFrame({
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',np.nan,np.nan],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'],
'country':['United States, India, South Korea, China',
'United Kingdom','United States'],
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})
Code:
df['release_year'] = df['release_year'].astype(str).str[0:3] + '0'
df = df.groupby('release_year', as_index=False)['listed_in'].apply(lambda x: x.mode().iloc[0])
df
Output:
release_year listed_in
0 2010 Children & Family Movies, Comedies
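As an alternative to string slicing, the decade can also be computed numerically with floor division. The sketch below is only an illustration, not part of the original answer: the five-row frame is hypothetical, and it assumes every release_year value can be converted to an integer.

```python
import pandas as pd

# Hypothetical miniature of the dataset: only the two columns used here.
df = pd.DataFrame({
    'release_year': ['2019', '2016', '2013', '1994', '1998'],
    'listed_in': ['Kids TV', 'Kids TV', 'Stand-Up Comedy', 'Dramas', 'Dramas'],
})

# Convert to integers, then floor each year to the start of its decade.
decade = df['release_year'].astype(int) // 10 * 10

# Most frequent listed_in string per decade. mode() can return several
# values on a tie, so .iloc[0] keeps the first of them.
top_genre = (
    df.groupby(decade)['listed_in']
      .agg(lambda s: s.mode().iloc[0])
      .rename_axis('decade')
      .reset_index()
)
print(top_genre)
```

Note that, as in the answer's code, each listed_in value is treated as one string, so 'Children & Family Movies, Comedies' counts as a single category rather than two genres.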