Why are "df['legal_drinker'] = df.apply(majority)" and "df['legal_drinker'] = df['age'].apply(majority)" different?

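Problem description

I want to create a new column based on the values of an existing column, and I don't understand why I run into the following problem.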

    import pandas as pd

    csv_url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/Students_Alcohol_Consumption/student-mat.csv'
    df = pd.read_csv(csv_url)
    df.head()

    def majority():
        if df.age > 17:
            return True
        else:
            return False

    df['legal_drinker'] = df.apply(majority)
    df

It then raises the following error:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-28-0bb6e6e401fe> in <module>
          4     else:
          5         return False
    ----> 6 df['legal_drinker'] = df.apply(majority,axis =1)
          7 df

    D:\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
       6876             kwds=kwds,
       6877         )
    -> 6878         return op.get_result()
       6879
       6880     def applymap(self, func) -> "DataFrame":

    D:\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
        184             return self.apply_raw()
        185
    --> 186         return self.apply_standard()
        187
        188     def apply_empty_result(self):

    D:\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
        294             try:
        295                 result = libreduction.compute_reduction(
    --> 296                     values, self.f, axis=self.axis, dummy=dummy, labels=labels
        297                 )
        298             except ValueError as err:

    pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()

    pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()

    TypeError: majority() takes 0 positional arguments but 1 was given

However, the following code works fine.

    def majority(age):
        if age > 17:
            return True
        else:
            return False

    df['legal_drinker'] = df['age'].apply(majority)
    df

Why?

Tags: python, pandas, dataframe

Solution


The answer is that the two calls have different calling contexts.

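Rows of a DataFrame

  1. Create the column first with assign().
  2. With axis=1, apply() calls the function once per row.
  3. In this context the function is passed the row. Update the row and return it.

Series

  1. The function is called with each value of the Series.
  2. Return the required value.

Series in a DataFrame (apply() without axis=1, so the function receives each column)

  1. This is definitely the wrong use case.

Conclusion: apply() has the same name in both cases, but its implementation and use depend on the object it acts on. See the documentation for Series.apply and DataFrame.apply.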
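As a minimal sketch of the three contexts described above, the snippet below records what each form of apply() passes to the function. The three-row DataFrame and the helper names `probe` and `run` are hypothetical stand-ins invented for illustration. They are not part of the original question or of student-mat.csv.

```python
import pandas as pd

# Hypothetical three-row stand-in for student-mat.csv
df = pd.DataFrame({"age": [16, 18, 19], "absences": [4, 0, 2]})

received_series = []

def probe(x):
    # Record whether apply() handed us a whole Series or a single value
    received_series.append(isinstance(x, pd.Series))
    return x

def run(apply_call):
    # Run one apply() call and return what probe() recorded during it
    received_series.clear()
    apply_call()
    return list(received_series)

by_column = run(lambda: df.apply(probe))          # DataFrame.apply, axis=0 is the default
by_row = run(lambda: df.apply(probe, axis=1))     # DataFrame.apply, row by row
by_value = run(lambda: df["age"].apply(probe))    # Series.apply, value by value

print("column-wise got a Series:", all(by_column))
print("row-wise got a Series:", all(by_row))
print("value-wise got a Series:", any(by_value))
```

In both DataFrame cases the function receives a Series (a column or a row), so it has to accept one argument and pick the field it needs. Only Series.apply passes a plain value such as an age.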

    import pandas as pd

    csv_url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/04_Apply/Students_Alcohol_Consumption/student-mat.csv'
    df = pd.read_csv(csv_url)
    df.head()
    dfc = df.copy()  # keep an untouched copy so df can be reset later

    # Row-wise: the parameter is a row (a Series); called once for each row in the DataFrame
    def majorityrow(r):
        r["legal_drinker"] = r["age"] > 17
        return r

    df.assign(legal_drinker=False).apply(majorityrow, axis=1)

    # Value-wise: the parameter is a single value; called once for each item in a Series
    def majorityval(age):
        return age > 17

    # Equivalent results: assign() returns a new DataFrame, item assignment modifies df in place
    df.assign(legal_drinker=df["age"].apply(majorityval))
    df["legal_drinker"] = df["age"].apply(majorityval)
    df

    # Reset df
    df = dfc

    # Column-wise: the parameter is a column (a Series); called once for each column in df
    def majorityseries(s):
        if s.name == "legal_drinker":
            s = df.loc[:, "age"] > 17
        return s

    # The column must already exist for the logic to work
    df.assign(legal_drinker=False).apply(majorityseries)
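To tie this back to the error in the question, here is a small sketch that reproduces the TypeError with a zero-argument function and then fixes it by accepting the row. The three-row DataFrame and the names `majority_no_args` and `majority_row` are hypothetical and used only for illustration. The question's original body also tests `df.age > 17`, which compares a whole column and cannot be used directly in an `if`, so the fix reads the scalar from the row instead.

```python
import pandas as pd

# Hypothetical stand-in for student-mat.csv
df = pd.DataFrame({"age": [16, 18, 19]})

def majority_no_args():
    # Mirrors the question: no parameter to receive what apply() passes in
    return True

try:
    df.apply(majority_no_args, axis=1)
    error_message = None
except TypeError as exc:
    # apply() called the function with one argument (the row)
    error_message = str(exc)

def majority_row(row):
    # Accepts the row Series and reads the scalar age from it
    return row["age"] > 17

df["legal_drinker"] = df.apply(majority_row, axis=1)

print(error_message)
print(df)
```

For a single-column rule like this one, `df["age"].apply(...)` or the direct comparison `df["age"] > 17` (as used inside `majorityseries` above) is usually simpler than a row-wise apply.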


