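python - MovieLens rating distribution for each genre
Problem description
I'm trying to plot, in a single chart, the average rating given by each gender for every movie genre.
My dataset looks like this: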
item_id title release_date video_release_date \
0 1 Toy Story (1995) 01-Jan-1995 NaN
1 4 Get Shorty (1995) 01-Jan-1995 NaN
... ... ... ... ...
99995 748 Saint, The (1997) 14-Mar-1997 NaN
99996 751 Tomorrow Never Dies (1997) 01-Jan-1997 NaN
imdb_url unknown Action \
0 http://us.imdb.com/M/title-exact?Toy%20Story%2... 0 0
1 http://us.imdb.com/M/title-exact?Get%20Shorty%... 0 1
... ... ... ...
99995 http://us.imdb.com/M/title-exact?Saint%2C%20Th... 0 1
99996 http://us.imdb.com/M/title-exact?imdb-title-12... 0 1
Adventure Animation Childrens ... War Western user_id rating \
0 0 1 1 ... 0 0 308 4
1 0 0 0 ... 0 0 308 5
Code:
import numpy as np
import matplotlib.pyplot as plt
labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']
male_values = all_male_users.iloc[:, 6:26]
female_values = all_female_users.iloc[:, 6:26]
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize=(15,7))
rects1 = ax.bar(x - width/2, male_values.rating.mean(), width, label='Male')
rects2 = ax.bar(x + width/2, female_values.rating.mean(), width, label='Female')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
fig.tight_layout()
plt.show()
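A minimal sketch of why this snippet cannot show a per-genre comparison, assuming `rating` is a single numeric column: `male_values.rating.mean()` collapses every row into one scalar, and `ax.bar` broadcasts a scalar height to every x position, so all male bars (and all female bars) end up the same height. The tiny `toy` frame below is made-up illustration data, not the MovieLens data.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# made-up illustration data: two genre dummies plus a rating column
toy = pd.DataFrame({'Action': [1, 0, 1], 'Comedy': [0, 1, 0], 'rating': [4, 2, 5]})

overall = toy.rating.mean()  # a single scalar for the whole frame, not one value per genre
print(type(overall), overall)

x = np.arange(2)  # two bar positions, one per genre column
fig, ax = plt.subplots()
bars = ax.bar(x, overall, 0.35)  # the scalar height is broadcast to every bar
print([b.get_height() for b in bars])
```

The fix is therefore to compute one mean per genre before plotting, which is what the grouping step in the answer does.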
Solution
To reproduce your example, I need to create a sample dataframe with random values (1,000 rows each for males and females):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create sample data
labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']
cols = labels + ['rating']
# define parameters for randomly recreating the dataframe:
# a one-hot genre vector (exactly one genre per row) and ratings from 1 to 5
arr_dummy_genre = np.zeros(len(labels), dtype = int)
arr_dummy_genre[0] = 1
range_rating = range(1,6)
def random_row():
    # shuffle the one-hot vector to pick a random genre, then append a random rating
    random_genre = np.random.permutation(arr_dummy_genre)
    random_rating = float(np.random.choice(range_rating))
    return np.append(random_genre, random_rating)
# generate 1,000 random rows for each gender
male_values = pd.DataFrame([random_row() for _ in range(1000)], columns = cols)
female_values = pd.DataFrame([random_row() for _ in range(1000)], columns = cols)
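At this point, the female and male dataframes each contain only 1,000 observations of genre and rating. Your data has a different shape, but that doesn't matter for this example.
The next steps prepare the data for the presentation you want: they undo the dummy variables that represent the genre and then group by genre: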
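Building the frame one row at a time can get slow for larger samples, because the dataframe grows row by row. A vectorized sketch that produces the same kind of sample data (one random genre flag and one random rating from 1 to 5 per row) picks rows of an identity matrix; the helper name `make_sample` and the fixed seed are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

labels = ['Action', 'Adventure', 'Animation', 'Childrens', 'Comedy', 'Crime',
          'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical',
          'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

def make_sample(n, rng):
    """Hypothetical helper: n rows, exactly one random genre flag each, ratings 1-5."""
    genre_idx = rng.integers(0, len(labels), size=n)      # pick one genre per row
    dummies = np.eye(len(labels), dtype=int)[genre_idx]   # one-hot encode it
    df = pd.DataFrame(dummies, columns=labels)
    df['rating'] = rng.integers(1, 6, size=n).astype(float)  # upper bound is exclusive
    return df

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
male_values = make_sample(1000, rng)
female_values = make_sample(1000, rng)
```

The resulting frames have the same columns as the loop version, so the grouping and plotting steps apply unchanged.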
# reconstruct the dummified genre of the movie (assumes exactly one genre flag per row, as in the sample data)
female_values['genre'] = pd.Series(female_values[labels].columns[np.where(female_values[labels]!=0)[1]])
male_values['genre'] = pd.Series(male_values[labels].columns[np.where(male_values[labels]!=0)[1]])
# group by genre and average the ratings; reindex so the values follow the order of labels on the x-axis
gr_male_values = male_values.groupby('genre')['rating'].mean().reindex(labels)
gr_female_values = female_values.groupby('genre')['rating'].mean().reindex(labels)
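In the real dataset a movie can carry several genre flags at once (Toy Story is flagged as both Animation and Childrens in the sample shown in the question), and then the `np.where` reconstruction produces more genre entries than rows, so they no longer line up with the ratings. A sketch that handles multi-genre rows averages the ratings separately for each genre column; `mean_rating_per_genre` is a hypothetical helper and the `toy` frame is made-up data.

```python
import pandas as pd

def mean_rating_per_genre(df, genres):
    """Hypothetical helper: average rating for each genre column.
    A movie flagged with several genres counts towards each of them."""
    return pd.Series({g: df.loc[df[g] == 1, 'rating'].mean() for g in genres})

# made-up illustration rows: the first movie is tagged with two genres
toy = pd.DataFrame({
    'Animation': [1, 0, 0],
    'Childrens': [1, 1, 0],
    'Drama':     [0, 0, 1],
    'rating':    [4.0, 2.0, 5.0],
})
per_genre = mean_rating_per_genre(toy, ['Animation', 'Childrens', 'Drama'])
```

With the question's frames this would be `mean_rating_per_genre(all_male_users, labels)`, assuming those frames carry the genre columns and a numeric `rating` column. The result is already in `labels` order, and a genre with no ratings comes out as NaN, which matplotlib simply leaves without a bar.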
Now you can plot it the way you want with the same piece of your code, only swapping in the grouped data:
labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize=(15,7))
rects1 = ax.bar(x - width/2, gr_male_values, width, label='Male')
rects2 = ax.bar(x + width/2, gr_female_values, width, label='Female')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
fig.tight_layout()
plt.show()
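This produces a grouped bar chart with a male and a female bar for each genre (with completely random values in my case).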
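As an alternative to positioning the bars by hand with `x - width/2` and `x + width/2`, pandas can draw the grouped bars itself once both series are combined into one DataFrame (one row per genre, one column per gender), since `DataFrame.plot.bar` places the columns side by side. A sketch with made-up means standing in for `gr_male_values` and `gr_female_values`:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

# made-up per-genre means standing in for gr_male_values / gr_female_values
gr_male_values = pd.Series({'Action': 3.5, 'Comedy': 3.2, 'Drama': 3.8})
gr_female_values = pd.Series({'Action': 3.3, 'Comedy': 3.4, 'Drama': 3.9})

# one row per genre, one column per gender; pandas aligns both series on the genre index
means = pd.DataFrame({'Male': gr_male_values, 'Female': gr_female_values})

ax = means.plot.bar(figsize=(15, 7), rot=0, width=0.7)  # grouped bars plus a legend
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
plt.tight_layout()
```

Because the genres live in the index, the tick labels come from the data instead of a separately maintained `labels` list, so the two cannot drift out of sync.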
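Recommended reading
- python - os.system command not found after string formatting
- api - How to accept client certificates in a self-hosted Web API
- javascript - Update user context after attribute update - AWS Amplify
- javascript - jQuery UI Datepicker Timepicker slider not working when dragging
- regex - Oracle REGEXP_SUBSTR lookahead and lookbehind
- python - Why doesn't `max()` use the `nonlocal` var changed by the function in its argument?
- c# - How to manage a SQL Server to PostgreSQL migration with Npgsql?
- optimization - Why does the problem become infeasible when I increase the bounds of the decision variables?
- amazon-web-services - How to see all AWS IAM permissions granted to a user
- php - User input values are not validated against database array values using JSON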