python-3.x - Capping the outliers
Problem description
I have a data frame with 3 numerical variables whose outliers I am trying to cap to the range between the 0.01 and 0.99 quantiles, but it's not working.
df[['TotalVisits', 'Total Time Spent on Website',
'Page Views Per Visit']].describe(percentiles=[.25, .5, .75, .90, .95, .99])
I tried to cap the outliers like this:
q_l = df['TotalVisits'].quantile(0.00)
q_h = df['TotalVisits'].quantile(0.99)
df['TotalVisits'][df['TotalVisits']<= q_l] = q_l
df['TotalVisits'][df['TotalVisits']>= q_h] = q_h
But the output stays the same; the max value does not change to 17.
Solution
You are assigning through chained indexing, which can modify a temporary copy rather than df itself, so the change is lost.
Fix your code by using df.loc:
q_l = df['TotalVisits'].quantile(0.00)
q_h = df['TotalVisits'].quantile(0.99)
df.loc[df['TotalVisits']<= q_l, 'TotalVisits'] = q_l
df.loc[df['TotalVisits']>= q_h, 'TotalVisits'] = q_h
You can also simplify this with the pandas clip function:
df['TotalVisits'] = df['TotalVisits'].clip(lower = q_l, upper = q_h)
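Since the question mentions three columns, the same idea can probably be applied to all of them at once. The following is a minimal sketch, not a definitive implementation: the sample data is made up for illustration, only two of the columns are shown, and the 0.01/0.99 bounds are taken from the question's description rather than its code.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question,
# the values are invented for illustration only.
df = pd.DataFrame({
    "TotalVisits": [0, 1, 2, 3, 4, 5, 6, 7, 8, 250],
    "Page Views Per Visit": [0.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 4.0, 55.0],
})

cols = ["TotalVisits", "Page Views Per Visit"]

# Per-column bounds: each result is a Series indexed by column name.
lower = df[cols].quantile(0.01)
upper = df[cols].quantile(0.99)

# axis=1 aligns the bound Series with the columns, so each column
# is clipped to its own quantile range. Work on a copy to keep df intact.
capped = df.copy()
capped[cols] = df[cols].clip(lower=lower, upper=upper, axis=1)

print(capped.describe())
```

Assigning the result back to whole columns avoids chained indexing entirely, so the capped values are stored in the frame being assigned to.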