python - Filter anomalous and complex datasets
Problem description
I want to know how to filter out and select the anomalous groups from a large df. For example, I have this df:
import pandas as pd
import numpy as np
data = {"code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
        "number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2]}
df = pd.DataFrame(data=data)
  code  number
0    a       7
1    a       5
2    a       2
3    b       4
4    b       6
5    c       9
6    c       6
7    c       2
8    d       8
9    d       2
In this df, most groups follow a rule: within each 'code' group, the numbers appear in descending order, so a larger number always comes before a smaller one. For example, group 'a' follows 7 > 5 > 2, group 'c' follows 9 > 6 > 2, and group 'd' follows 8 > 2. The only exception is group 'b', where the smaller value 4 comes before 6. So I want to select only the anomalous group b, with an output like:
  code  number
0    b       4
1    b       6
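The rule described above can be expressed as a per-group check. A minimal sketch (an illustration, not part of the original question) using the pandas Series.is_monotonic_decreasing property; the names follows_rule and anomalous_codes are hypothetical. Note that this property is non-strict, so equal neighbouring values (e.g. 5, 5) would still count as following the rule, which is the same behaviour as the diff() > 0 test in the solution.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({"code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
                   "number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2]})

# One value per 'code' group: True if 'number' never increases from row to row.
# is_monotonic_decreasing is non-strict, so repeated values still pass.
follows_rule = df.groupby('code')['number'].apply(lambda s: s.is_monotonic_decreasing)
print(follows_rule)

# Collect the codes of the groups that break the rule.
anomalous_codes = [code for code, ok in follows_rule.items() if not ok]
print(anomalous_codes)  # ['b']
```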
Does anyone have any ideas? Any help is much appreciated.
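Solution
We can use groupby + filter and check each group with diff: a positive difference means a value is larger than the one before it, so that group breaks the descending rule and is kept.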
df.groupby('code').filter(lambda x: (x['number'].diff() > 0).any())
  code  number
3    b       4
4    b       6
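The same diff idea can also be written without a per-group lambda. A sketch, assuming the df from the question: compute the within-group diff once, then broadcast an any() per group back to every row with transform('any'). This typically avoids a Python-level function call per group, which may help on a large df (not benchmarked here). The variable names step, is_anomalous and result are illustrative. The final reset_index(drop=True) renumbers the rows from 0 so the result matches the expected output in the question; filter keeps the original index labels 3 and 4.

```python
import pandas as pd

df = pd.DataFrame({"code": ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'],
                   "number": [7, 5, 2, 4, 6, 9, 6, 2, 8, 2]})

# Row-to-row change inside each group; the first row of every group is NaN.
step = df.groupby('code')['number'].diff()

# A group is anomalous if any step is positive (a value larger than the previous one).
# NaN > 0 evaluates to False, so the first row of a group never triggers the flag.
is_anomalous = step.gt(0).groupby(df['code']).transform('any')

# Keep the rows of anomalous groups and renumber the index from 0.
result = df[is_anomalous].reset_index(drop=True)
print(result)
```

This should print the two 'b' rows with index 0 and 1. The same reset_index(drop=True) call can also be chained onto the filter(...) version if you prefer that form.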