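python - Pandas fillna with ffill adds noise
Question
I am trying to remove outliers from a column in a pandas DataFrame.
This is what my variable looked like originally (with obvious outliers):
I then decided to remove any value that changes by more than +/-3 from the previous one (because I know a change that large is not possible):
This works and gives me NaN in place of the spikes:
But whenever I try to replace the now-missing values with the previous observation, I somehow end up with spikes again!
Does anyone know what I am doing wrong?
Here is the whole code (inside a while loop with no fixed number of iterations):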
df = pd.DataFrame({'soc': [38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 127.0, 127.0, 66.48, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 127.0, 55.8, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0]})
while (abs(df['soc'].diff()) > 3).any():
    df['soc'] = np.where(abs(df['soc'].diff()) > 3, np.nan, df['soc'])
    df['soc'].fillna(method='ffill', inplace=True)
Answer
I believe you are not actually removing every value that changes by more than 3, because in the second plot I can still see a dot that should not be there. You may also be assigning to the wrong column. Here is a generic example of what you intend to do, and it works:
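import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[100,110,105,104,103,102,101]})
df['A'] = np.where(abs(df['A'].diff()) > 3, np.nan, df['A'])  # NaN where the step from the previous row exceeds 3
df['A'] = df['A'].ffill()  # same result as fillna(method='ffill'), which is deprecated in recent pandas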
In this example, 110 and 105 are removed because each differs from the value before it by more than 3 (110 is 10 above 100, and 105 is 5 below 110). Both are then replaced with 100 by the forward fill. The output:
A
0 100.0
1 100.0
2 100.0
3 104.0
4 103.0
5 102.0
6 101.0
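One difference between this generic example and the data in the question is that the question's spike starts with two equal samples (127.0, 127.0). The sketch below is only an illustration, not the asker's exact code: it runs a single pass of the diff rule on a shortened version of the `soc` column, then tries a different technique, comparing each value against a centered rolling median instead of against its predecessor. The function name `remove_spikes` and the `window` and `threshold` defaults are assumptions chosen for this example.

```python
import pandas as pd

# Shortened, illustrative version of the 'soc' column: a flat signal with a
# spike whose first two samples are equal (127.0, 127.0).
s = pd.Series([38.0] * 5 + [127.0, 127.0, 66.48] + [38.0] * 5)

# One pass of the diff-based rule: mask every value whose step from the
# previous row is larger than 3, then forward-fill the gaps.
flagged = s.diff().abs() > 3
one_pass = s.mask(flagged).ffill()
# The second 127.0 has a diff of 0, so it is NOT masked; ffill then copies it
# over the NaNs that follow, which widens the spike instead of removing it.
print(one_pass.tolist())


def remove_spikes(series, window=11, threshold=3.0):
    """Hypothetical helper: mask values far from a rolling median, then ffill.

    `window` and `threshold` are assumed values for this example; the window
    must be more than twice as long as the longest run of outliers.
    """
    baseline = series.rolling(window, center=True, min_periods=1).median()
    outliers = (series - baseline).abs() > threshold
    return series.mask(outliers).ffill()


cleaned = remove_spikes(s)
print(cleaned.tolist())
```

On this small series the single diff pass leaves three samples at 127.0, while the rolling-median version returns 38.0 everywhere. Note that a forward fill cannot repair an outlier in the very first row, since there is no earlier value to copy.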