How does pandas replace NaN values with mean value using groupby

Problem description

I tried the following to replace the NaN values in the column feature_count (it's an integer that ranges from 1 to 10) using groupby (on client_id or client_name); however, the NaN values do not seem to go away.

df['feature_count'].isnull().sum()

The output is:

2254

Now I use:

df['feature_count'].fillna(df.groupby('client_name')['feature_count'].mean(), inplace=True)

But the output remains the same:

df['feature_count'].isnull().sum()

2254

Is there another way to replace the NaN values with the mean of the non-NaN values of the column, grouped by their IDs?

Tags: python, pandas, group-by, pandas-groupby, fillna

Solution


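df.groupby('client_name')['feature_count'].mean() returns a Series indexed by client_name.

But you don't want to fill the null values with that Series directly: fillna aligns the filler on index labels, and the client names do not match the row labels of df, so nothing gets filled. Instead, you want to replace each null with the mean mapped from that Series through the row's client_name.

So you can use the following: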

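s = df.groupby('client_name')['feature_count'].mean()
# Assign the result back; inplace=True on a column selection may not update df in recent pandas versions
df['feature_count'] = df['feature_count'].fillna(df['client_name'].map(s))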
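
To see why the call from the question leaves every NaN in place, here is a minimal sketch on a small made-up frame (the client names and values are invented for illustration, not taken from the question). It contrasts filling with the group means directly against filling with the means mapped back onto the rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question
df = pd.DataFrame({
    "client_name": ["a", "a", "b", "b", "b"],
    "feature_count": [2.0, np.nan, 4.0, 8.0, np.nan],
})

# One mean per client; the index of this Series is 'a', 'b'
means = df.groupby("client_name")["feature_count"].mean()

# Direct fillna: the labels 'a'/'b' never match the row labels 0..4,
# so no NaN is replaced
unchanged = df["feature_count"].fillna(means)

# Mapping first yields one mean per row, aligned on the row index
per_row = df["client_name"].map(means)
filled = df["feature_count"].fillna(per_row)

print(unchanged.isnull().sum())
print(filled.tolist())
```

The mapped Series shares the row index of df, which is what lets fillna line each mean up with the right row.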

A more Pandorable approach is to use transform on the groupby object, which handles the mapping part for you:

s = df.groupby('client_name')['feature_count'].transform('mean')
# s has one value per row of df, so it aligns with the column directly
df['feature_count'] = df['feature_count'].fillna(s)
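
A self-contained sketch of the transform approach follows, again on invented data. It also shows one case worth checking: a client whose values are all NaN has no mean, so its rows stay NaN after the group fill. The fallback to the overall column mean at the end is an assumption for illustration and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; client 'c' has no observed values at all
df = pd.DataFrame({
    "client_name": ["a", "a", "b", "b", "c"],
    "feature_count": [2.0, np.nan, 4.0, np.nan, np.nan],
})

# transform('mean') returns the group mean repeated for every row of the group
group_mean = df.groupby("client_name")["feature_count"].transform("mean")
df["feature_count"] = df["feature_count"].fillna(group_mean)

# Rows whose whole group was NaN are still missing at this point
remaining = int(df["feature_count"].isnull().sum())

# Optional fallback (an assumption, not from the original answer):
# fill what is left with the overall column mean
df["feature_count"] = df["feature_count"].fillna(df["feature_count"].mean())

print(remaining)
print(df["feature_count"].tolist())
```

If feature_count must remain an integer, the filled means could be rounded afterwards; whether that is appropriate depends on how the column is used.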
