Modifying a pandas DataFrame from a function

Problem description

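I find myself modifying several DataFrames with the same operations over and over. I would like to put all the modifications into one function, then call that function with the DataFrame and have all the transformations applied.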

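Here is the code I am currently trying, with all the transformations. When I run it, nothing happens and the DataFrame stays in its original state.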

#create a preprocessing function so the process can be applied to any dataset (training, validation and competition)
def preprocessing(df):
    #inspect dataframe
    df.head()

    #check data types in dataframe
    np.unique(df.dtypes).tolist()

    #inspect shape before removing duplicates
    df.shape

    #drop duplicates
    df = df.drop_duplicates()

    #inspect shape again to see change
    df.shape

    #calculate rows that have a mean of 100 to remove them later
    mean100_rows = [i for i in range(len(df)) if df.iloc[i,0:520].values.mean() == 100 ]

    #calculate columns that have a mean of 100 to remove them later
    mean100_cols = [i for i in np.arange(0,520,1) if df.iloc[:,i].values.mean() == 100 ]

    #calculate columns labels that have a mean of 100 to remove them later
    col_labels = [df.columns[i] for i in mean100_cols]

    #delete rows with mean 100
    df.drop(index = mean100_rows, axis=0, inplace=True)

    #delete columns with mean 100
    df.drop(columns=col_labels, axis=1, inplace=True)

    #export columns that have been removed
    pd.Series(col_labels).to_csv('remove_cols.csv')

    #head
    df.head()

    #check size again
    df.shape

Tags: python, pandas, function, dataframe

Solution


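In Python, a function receives a reference to the object it is called with; the parameter df is just a local name bound to that object.

When the following line executes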

df = df.drop_duplicates()

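you are assigning a new reference to the function parameter: drop_duplicates() returns a new DataFrame and the local name df now points to it, while the object outside the function is unchanged. The later inplace=True drops then act on that new object, not on the caller's DataFrame.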
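The difference between rebinding the parameter and mutating the object can be seen in a small experiment. This is only an illustrative sketch: the functions rebind and mutate and the sample data are made up for the demonstration and are not part of the question's code.

```python
import pandas as pd

def rebind(df):
    # drop_duplicates() returns a new DataFrame; this line only rebinds
    # the local name df, so the caller's object is not affected.
    df = df.drop_duplicates()

def mutate(df):
    # inplace=True modifies the very object the caller passed in.
    df.drop_duplicates(inplace=True)

data = pd.DataFrame({"a": [1, 1, 2]})

rebind(data)
len_after_rebind = len(data)   # caller still sees the duplicate row

mutate(data)
len_after_mutate = len(data)   # duplicate row is gone for the caller too
```

In the question's function the rebinding happens first, so every in-place operation after it works on the new local object.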

I suggest changing the function so that it returns df, and then assigning the return value to the DataFrame variable outside the function.
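A minimal sketch of that suggestion, under some assumptions: the n_features parameter is hypothetical (its default of 520 mirrors the column range used in the question), the inspection calls and the CSV export are left out for brevity, and boolean masks replace the positional index lists, since positions no longer match index labels after drop_duplicates(). The small train frame is invented sample data.

```python
import pandas as pd

def preprocessing(df, n_features=520):
    """Return a cleaned DataFrame; the caller's object is left untouched."""
    # Work on the de-duplicated result (a new DataFrame).
    df = df.drop_duplicates()

    # Only the first n_features columns take part in the mean checks.
    features = df.iloc[:, :n_features]

    # Rows and columns whose mean over the feature block equals 100.
    row_mask = features.mean(axis=1) == 100
    col_mask = features.mean(axis=0) == 100
    col_labels = features.columns[col_mask.to_numpy()].tolist()

    # Drop the flagged rows and columns, then hand the result back.
    return df.loc[~row_mask].drop(columns=col_labels)

# Hypothetical sample data: two feature columns and one label column.
train = pd.DataFrame({
    "s1": [100, 100, 50, 100],
    "s2": [100, 100, 100, 100],
    "label": [0, 0, 1, 2],
})

# Assign the return value back to the name used outside the function.
train = preprocessing(train, n_features=2)
```

The same call pattern would apply to each dataset in turn, for example validation = preprocessing(validation).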

