python - Outlier removal on a variable whose rows contain several NaNs (I need to keep the NaNs, and their positions matter)
Problem description
I need to remove outliers from a variable that contains several NaNs. It looks like this:
    X-velocity
1   0.0345
2   0.0222
3   0.0034
4   0.5604
5   0.4326
6   NaN
7   0.0333
8   0.3635
9   0.3345
10  0.3468
11  0.4573
12  0.7985
13  0.9359
14  NaN
15  0.4635
16  0.6857
17  0.4239
18  NaN
19  0.3849
20  0.3726
21  0.4637
22  0.3647
23  NaN
24  0.2938
25  0.5227
I need to remove the outliers from the variable without deleting or changing the value or position of the NaNs. I don't mean that NaN is the outlier; I mean the outliers among the continuous numbers. For example, I want to remove all numbers that fall outside the range mean ± 3 × standard deviation. When I detect and remove the outliers, I don't want to affect the NaNs: they must stay where they are, because I need to perform other operations based on them later.
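A minimal sketch, assuming pandas, that rebuilds the sample data as a DataFrame (the `values` list and the 1-based index are transcribed from the listing in the question) and computes the mean ± 3 std bounds. pandas' `mean()` and `std()` skip NaN by default, so the missing rows do not distort the bounds.

```python
import numpy as np
import pandas as pd

# The question's data, indexed 1..25 as in the listing; np.nan marks missing rows.
values = [0.0345, 0.0222, 0.0034, 0.5604, 0.4326, np.nan, 0.0333, 0.3635,
          0.3345, 0.3468, 0.4573, 0.7985, 0.9359, np.nan, 0.4635, 0.6857,
          0.4239, np.nan, 0.3849, 0.3726, 0.4637, 0.3647, np.nan, 0.2938,
          0.5227]
df = pd.DataFrame({"X-velocity": values}, index=range(1, 26))

s = df["X-velocity"]
# mean() and std() use skipna=True by default; std() is the sample std (ddof=1).
mean, std = s.mean(), s.std()
lower, upper = mean - 3 * std, mean + 3 * std
print(f"mean={mean:.4f} std={std:.4f} bounds=({lower:.4f}, {upper:.4f})")
```

On these sample values nothing falls outside mean ± 3 standard deviations: the largest value, 0.9359, is only about 2.2 standard deviations above the mean, so a 3σ rule would flag no rows in this particular excerpt.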
Is there any way to do this? I appreciate any help.
Solution
If you have a way to decide whether a value is an outlier (I imagine you have some threshold), you can create a new column that stores this flag.
For example:
# [True or False] is this value more than 3 standard deviations away from the mean?
df['is_outlier'] = (df['X-velocity'] - df['X-velocity'].mean()).abs() / df['X-velocity'].std() > 3
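The flag is safe for missing data because any comparison involving NaN evaluates to False, so NaN rows are never marked as outliers. A small check on made-up data (19 readings of 1.0, one reading of 100.0, and one NaN); `flag_outliers` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

def flag_outliers(s: pd.Series, k: float = 3.0) -> pd.Series:
    """Hypothetical helper: True where |s - mean| > k * std.

    mean() and std() skip NaN, and NaN > k is False, so missing
    values always come back as False (not an outlier).
    """
    z = (s - s.mean()).abs() / s.std()
    return z > k

# Made-up data: 19 ordinary readings, one extreme value, one missing value.
demo = pd.DataFrame({"X-velocity": [1.0] * 19 + [100.0, np.nan]})
demo["is_outlier"] = flag_outliers(demo["X-velocity"])
print(demo.tail(3))
```

Here the 100.0 reading sits roughly 4.2 standard deviations from the mean and is flagged, while the NaN row is left unflagged.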
You can then keep every row that is either not flagged as an outlier OR whose value is null:
# Select rows that contain non-outliers or null values
filtered = df[(~df.is_outlier) | df['X-velocity'].isnull()]
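Boolean indexing drops the outlier rows but keeps the original index labels of every remaining row, so each NaN still carries the label (position) it had in the original data. A sketch on made-up data, assuming a 1-based integer index like the question's, with NaNs at labels 4 and 12 and an extreme value at label 10:

```python
import numpy as np
import pandas as pd

# Made-up data: 20 readings of 1.0, then two NaNs and one extreme value.
s = pd.Series([1.0] * 20, index=range(1, 21), name="X-velocity")
s.loc[4] = np.nan
s.loc[12] = np.nan
s.loc[10] = 100.0
df = s.to_frame()

x = df["X-velocity"]
df["is_outlier"] = (x - x.mean()).abs() / x.std() > 3

# Keep non-outliers and missing values; index labels survive the filter.
filtered = df[~df["is_outlier"] | df["X-velocity"].isna()]
print(filtered[filtered["X-velocity"].isna()].index.tolist())  # -> [4, 12]

# Optional: restore the full index. Removed outliers then also show up
# as NaN, so keep is_outlier to tell them apart from the original NaNs.
restored = filtered.reindex(df.index)
```

Avoid `reset_index(drop=True)` on the filtered frame if the NaN positions matter, since it renumbers the rows and the original labels are lost.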