Distribution of ratings per movie genre in MovieLens

Problem description

I am trying to plot the average rating of each movie genre for both genders in a single plot.

My dataset looks like this:

      item_id                       title release_date  video_release_date  \
0            1            Toy Story (1995)  01-Jan-1995                 NaN   
1            4           Get Shorty (1995)  01-Jan-1995                 NaN   

...        ...                         ...          ...                 ...   
99995      748           Saint, The (1997)  14-Mar-1997                 NaN   
99996      751  Tomorrow Never Dies (1997)  01-Jan-1997                 NaN   

                                                imdb_url  unknown  Action  \
0      http://us.imdb.com/M/title-exact?Toy%20Story%2...        0       0   
1      http://us.imdb.com/M/title-exact?Get%20Shorty%...        0       1   

...                                                  ...      ...     ...   
99995  http://us.imdb.com/M/title-exact?Saint%2C%20Th...        0       1   
99996  http://us.imdb.com/M/title-exact?imdb-title-12...        0       1   

       Adventure  Animation  Childrens  ...  War  Western  user_id  rating  \
0              0          1          1  ...    0        0      308       4   
1              0          0          0  ...    0        0      308       5   

Code:

labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']
male_values = all_male_users.iloc[:, 6:26]
female_values = all_female_users.iloc[:, 6:26]

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(figsize=(15,7))
rects1 = ax.bar(x - width/2, male_values.rating.mean(), width, label='Male')
rects2 = ax.bar(x + width/2, female_values.rating.mean(), width, label='Female')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()
plt.show()

So far it plots the overall average rating for each gender, rather than the average rating for each movie genre.

Tags: python, pandas, matplotlib

Solution
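
As background on why the bars in the question all come out at the same height: `male_values.rating.mean()` is a single number for the whole frame, and `ax.bar` repeats that one height for every x position. The sketch below uses a tiny made-up frame (the values are hypothetical, not taken from the real data) to contrast the scalar with a per-genre mean.

```python
import pandas as pd

# Tiny hypothetical frame with the same layout as the slice in the question:
# genre flag columns followed by a rating column.
df = pd.DataFrame({
    "Action": [1, 0, 1, 0],
    "Comedy": [0, 1, 0, 1],
    "rating": [5, 2, 3, 4],
})

# What the question's code computes: one scalar for the whole frame.
overall = df.rating.mean()

# What the plot needs: one mean per genre, taken over the rows flagged for it.
per_genre = {g: df.loc[df[g] == 1, "rating"].mean() for g in ["Action", "Comedy"]}

print(overall)    # 3.5
print(per_genre)  # Action -> 4.0, Comedy -> 3.0
```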


To reproduce your example, I had to create sample DataFrames with random values (1,000 rows each for male and female users):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# create sample data
cols = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western', 'rating']
male_values = pd.DataFrame(columns = cols)
female_values = pd.DataFrame(columns = cols)

# parameters for randomly generating the dataframes
arr_dummy_genre = np.zeros(18, dtype = int)
arr_dummy_genre[0] = 1
range_rating = range(1,6)

# generate 1,000 random rows for each gender
for i in range(1000):
    # shuffle the single genre flag and append a random rating from 1 to 5
    random_rating = float(np.random.choice(range_rating))
    random_genre = np.random.permutation(arr_dummy_genre)
    random_row = np.append(random_genre, random_rating)
    male_values.loc[len(male_values)] = random_row

    random_rating = float(np.random.choice(range_rating))
    random_genre = np.random.permutation(arr_dummy_genre)
    random_row = np.append(random_genre, random_rating)
    female_values.loc[len(female_values)] = random_row

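At this point, the female and male DataFrames each contain only 1,000 observations of genre and rating. Your data has a different shape, but that is not a problem for this example.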
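
Appending one row at a time with `.loc` works but gets slow as the frame grows. A vectorized sketch of the same idea follows; the helper name `make_sample` and the seed are my own choices, not part of the original answer, and it keeps the same assumption of exactly one genre flag per row.

```python
import numpy as np
import pandas as pd

genres = ['Action', 'Adventure', 'Animation', 'Childrens', 'Comedy', 'Crime',
          'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical',
          'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

def make_sample(n_rows, rng):
    """Hypothetical helper: rows with one genre flag set and a rating from 1 to 5."""
    genre_idx = rng.integers(0, len(genres), size=n_rows)  # genre picked for each row
    flags = np.zeros((n_rows, len(genres)), dtype=int)
    flags[np.arange(n_rows), genre_idx] = 1                # one-hot encode the pick
    df = pd.DataFrame(flags, columns=genres)
    df['rating'] = rng.integers(1, 6, size=n_rows).astype(float)  # upper bound is exclusive
    return df

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
male_values = make_sample(1000, rng)
female_values = make_sample(1000, rng)
```

Because the frames are built from typed arrays, the genre flags stay integers and the ratings stay floats.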

The next steps prepare the data for the presentation you want: reverse the dummy variables that encode the genre, then group by genre:

# genre columns (every column except 'rating')
labels = cols[:-1]

# reconstruct the dummified genre of the movie (assumes exactly one genre flag per row)
female_values['genre'] = pd.Series(female_values[labels].columns[np.where(female_values[labels]!=0)[1]])
male_values['genre'] = pd.Series(male_values[labels].columns[np.where(male_values[labels]!=0)[1]])

# group by genre
gr_male_values = male_values.groupby('genre')['rating'].mean()
gr_female_values = female_values.groupby('genre')['rating'].mean()
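
The step above relies on each row having exactly one genre flag. In the dataset shown in the question a movie can carry several flags (Toy Story is marked as both Animation and Childrens), so a single `genre` column cannot represent it. One possible way around this, sketched here with a hypothetical helper `mean_rating_per_genre` and made-up numbers, is to compute each genre's mean directly from the flag columns.

```python
import pandas as pd

def mean_rating_per_genre(df, genre_cols, rating_col="rating"):
    """Hypothetical helper: mean rating over the rows flagged for each genre."""
    flags = df[genre_cols].astype(float)
    # Sum of ratings per genre: a rating counts once for every genre flagged on its row.
    rating_sums = flags.mul(df[rating_col].astype(float), axis=0).sum()
    counts = flags.sum()
    return rating_sums / counts  # NaN for a genre that has no rated rows

# Made-up rows; the first one belongs to two genres at once.
ratings = pd.DataFrame({
    "Animation": [1, 0, 1],
    "Childrens": [1, 0, 0],
    "Action":    [0, 1, 0],
    "rating":    [4, 5, 2],
})
per_genre = mean_rating_per_genre(ratings, ["Animation", "Childrens", "Action"])
print(per_genre)  # Animation 3.0, Childrens 4.0, Action 5.0
```

The result is a Series indexed by genre, so it can be passed to `ax.bar` in the same way as the grouped means.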

Now, reusing your own code and only swapping in the grouped data, you can plot it the way you want:

labels = ['Action', 'Adventure' , 'Animation' , 'Childrens' , 'Comedy' , 'Crime' , 'Documentary' , 'Drama' , 'Fantasy' , 'Film-Noir' , 'Horror' , 'Musical' , 'Mystery' , 'Romance' , 'Sci-Fi' , 'Thriller' , 'War' , 'Western']

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(figsize=(15,7))
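# reindex so the bars follow the order of labels; a genre with no ratings becomes NaN
rects1 = ax.bar(x - width/2, gr_male_values.reindex(labels), width, label='Male')
rects2 = ax.bar(x + width/2, gr_female_values.reindex(labels), width, label='Female')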

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Most preferred movie genres', fontsize=14)
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

fig.tight_layout()
plt.show()
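
As an alternative to positioning the bars by hand, the two grouped Series can be combined into one DataFrame and drawn with pandas' own bar plot, which aligns them on the genre index. This is only a sketch: the numbers below are hypothetical stand-ins for `gr_male_values` and `gr_female_values`, and the `Agg` backend is set just so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-genre means standing in for gr_male_values / gr_female_values.
gr_male = pd.Series({"Action": 3.1, "Comedy": 2.9, "Drama": 3.4})
gr_female = pd.Series({"Action": 2.8, "Drama": 3.6})  # "Comedy" left out on purpose

# Building the frame aligns both Series on genre; the missing entry becomes NaN.
means = pd.DataFrame({"Male": gr_male, "Female": gr_female})

ax = means.plot.bar(figsize=(15, 7), width=0.7, rot=0)
ax.set_ylabel("Scores")
ax.set_title("Most preferred movie genres", fontsize=14)
ax.figure.tight_layout()
```

With an interactive backend, calling `plt.show()` afterwards displays the figure.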

This produces a grouped bar chart with one pair of bars per genre; with my sample data the values are completely random.

