
Problem description

I have a data frame with 3 numerical variables whose outliers I am trying to cap at the 0.01 and 0.99 quantiles (the 1st and 99th percentiles), but it's not working.

df[['TotalVisits', 'Total Time Spent on Website', 
'Page Views Per Visit']].describe(percentiles=[.25, .5, .75, .90, .95, .99])

This is the output
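To see the exact cap values without reading them off describe(), one option is DataFrame.quantile with a list of quantiles. This is a minimal sketch that uses a small made-up DataFrame in place of the asker's data. The values are assumptions, and only the column names come from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame
df = pd.DataFrame({
    "TotalVisits": [0, 1, 2, 3, 4, 5, 100],
    "Total Time Spent on Website": [0, 10, 50, 120, 300, 900, 2500],
    "Page Views Per Visit": [0.0, 1.0, 1.5, 2.0, 2.5, 3.0, 55.0],
})
cols = ["TotalVisits", "Total Time Spent on Website", "Page Views Per Visit"]

# One row per requested quantile, one column per variable
caps = df[cols].quantile([0.01, 0.99])
print(caps)
```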

Then I tried to cap the outliers like this:

q_l = df['TotalVisits'].quantile(0.00)
q_h = df['TotalVisits'].quantile(0.99)

df['TotalVisits'][df['TotalVisits']<= q_l] = q_l
df['TotalVisits'][df['TotalVisits']>= q_h] = q_h

But the output remains the same: the max value of TotalVisits does not change to 17.

Tags: python-3.x, pandas, outliers

Solution


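You are using chained assignment (df['TotalVisits'][mask] = value). This is not guaranteed to update df: the first indexing step may return a copy, in which case the new values are written into that copy and the original DataFrame stays unchanged.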
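Whether the column-first form in the question happens to work depends on the pandas version. The row-first form df[mask]['TotalVisits'] = ... shows the copy problem reliably, because df[mask] always builds a new DataFrame. A minimal sketch on hypothetical data:

```python
import warnings

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"TotalVisits": [1, 2, 3, 50]})
q_h = 3

with warnings.catch_warnings():
    # Depending on the pandas version this emits a SettingWithCopyWarning
    # or a chained-assignment warning; silence it to keep the demo quiet.
    warnings.simplefilter("ignore")
    # df[mask] returns a new DataFrame, so the assignment lands on that copy
    df[df["TotalVisits"] >= q_h]["TotalVisits"] = q_h

print(df["TotalVisits"].tolist())  # [1, 2, 3, 50]: df itself is unchanged
```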

Fix your code by selecting the rows and the column in a single .loc call:

q_l = df['TotalVisits'].quantile(0.01)
q_h = df['TotalVisits'].quantile(0.99)

df.loc[df['TotalVisits'] <= q_l, 'TotalVisits'] = q_l
df.loc[df['TotalVisits'] >= q_h, 'TotalVisits'] = q_h
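A self-contained sketch of the .loc approach, assuming a hypothetical float column with one large outlier. Each row mask and the column label go into the same .loc call, so the assignment writes directly into df:

```python
import pandas as pd

# Hypothetical sample data: 0..99 plus one large outlier
df = pd.DataFrame({"TotalVisits": [float(v) for v in range(100)] + [1000.0]})

q_l = df["TotalVisits"].quantile(0.01)
q_h = df["TotalVisits"].quantile(0.99)

# Row mask and column label in a single .loc call: no intermediate copy
df.loc[df["TotalVisits"] <= q_l, "TotalVisits"] = q_l
df.loc[df["TotalVisits"] >= q_h, "TotalVisits"] = q_h

print(df["TotalVisits"].min(), df["TotalVisits"].max())
```

The data is stored as float here because the quantiles are floats, which avoids any dtype change when they are assigned back into the column.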

You can simplify this further with the built-in pandas method Series.clip, which caps values below lower and above upper in one step:

df['TotalVisits'] = df['TotalVisits'].clip(lower=q_l, upper=q_h)
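If all three columns from the question need capping, the same idea extends to DataFrame.clip with per-column bounds. This is a sketch on made-up data. The column names come from the question, but the values are assumptions:

```python
import pandas as pd

# Hypothetical sample data for the three columns from the question
df = pd.DataFrame({
    "TotalVisits": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 100.0],
    "Total Time Spent on Website": [0.0, 10.0, 50.0, 120.0, 300.0, 900.0, 2500.0],
    "Page Views Per Visit": [0.0, 1.0, 1.5, 2.0, 2.5, 3.0, 55.0],
})
cols = ["TotalVisits", "Total Time Spent on Website", "Page Views Per Visit"]

# Per-column 1st and 99th percentiles (Series indexed by column name)
lower = df[cols].quantile(0.01)
upper = df[cols].quantile(0.99)

# axis=1 aligns the lower/upper Series with the column labels,
# so each column is clipped to its own bounds
df[cols] = df[cols].clip(lower=lower, upper=upper, axis=1)

print(df[cols].max())
```

Computing the bounds before clipping matters: each column is capped at quantiles of the original data, not of already-capped values.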
