pandas - Working with a Series (should be easy)

Question

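Hello good people of Stack Overflow. I'm currently trying to answer the question below. I have the right answer (Nolan and 7), but it isn't in Series format and I don't know how to get it there. Can anyone help?

I've included the earlier questions as context.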

import pandas as pd

xls = pd.ExcelFile('imdb.xlsx')
df = xls.parse('imdb')
df_directors = xls.parse('directors')
df_countries = xls.parse('countries')

print("Data Loading Finished.")

""" Q1: 
Join three Dataframes: df, df_directors, and df_countries with an inner join.
Store the joined DataFrames in df.

Here are the steps:
1. Merge df with df_countries and assign it df
2. Merge df with df_directors and assign it to df again
There might be errors if the merge is not in this order, so please be careful.

"""

# your code here
df.head()
df = pd.merge(left=df, right=df_countries, how='inner', left_on='country_id', right_on='id')
df.head()
df = pd.merge(left=df, right=df_directors, how='inner', left_on='director_id', right_on='id')


# After the join, the resulting Dataframe should have 12 columns.
df.shape

""" Q4:
Who is the director with the most movies? First get the number of movies per "director_name", then save the director's name
and count as a series of length 1 called "director_with_most"
"""

# your code here

directors = df['director_name'].value_counts()
print(directors)

director_with_most = directors[]
directors.index[0]
directors[0]

print(director_with_most)

directors.index[0] gives Nolan, and directors[0] gives the number of times he appears in the database: 7. The error I get when I check my answer (this is from a Coursera course) is:

AssertionError: Series Expected type <class 'pandas.core.series.Series'>, found <class 'list'> instead

Please help, I've been stuck on this for a long time.

Cheers,

Adam

Tags: python, pandas, series

Answer


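Posting this kind of question to SO is bad form. Also, directors[0] holds the highest count only because value_counts() sorts in descending order by default, and indexing a string-labelled Series by integer position with [] is deprecated in recent pandas versions, so you are not at a solution yet.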

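That said, I really dislike how the assignment is worded. What does it even mean for a Series of length 1 to contain two values? Presumably the director's name is meant to be the index label and the count the value. Do this: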

director_with_most = df.director_name.value_counts().loc[lambda x: x == x.max()]

(If the max is not unique, this returns multiple rows.)
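
If the grader insists on exactly one row even when there is a tie, one possible variant is Series.nlargest(1), which keeps only the first of the tied entries. The sketch below uses a small made-up DataFrame in place of the merged imdb data (the names and counts are illustrative assumptions, not the course's data), so treat it as a demonstration of the shape of the result rather than the expected answer:

```python
import pandas as pd

# Toy stand-in for the merged DataFrame (hypothetical data, not the course's imdb.xlsx)
df = pd.DataFrame({
    "director_name": ["Nolan"] * 7 + ["Spielberg"] * 5 + ["Scott"] * 3
})

# Number of movies per director
counts = df["director_name"].value_counts()

# nlargest(1) returns a Series of length 1:
# the index label is the director's name, the value is the movie count
director_with_most = counts.nlargest(1)

print(type(director_with_most))
print(director_with_most)
```

The result is still a pandas Series, not a list or a scalar, which is what the assertion in the question checks for.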

