Compare each row with the previous row under a condition, and drop rows conditionally, in Python pandas

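Problem description

I have an idea of what I need to do, but I can't write code that runs correctly. Please take a look and offer some suggestions.

Step 1. Find the rows that contain a value in the second column (diff).

Step 2. For each of those rows, compare the value in the first column (missing) with the value in the previous row.

Step 3. Of the two rows compared, drop the one whose first-column value is larger.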

|missing | diff |
|--------|------|
| 0      | nan  |
| 1      | 60   |
| 1      | nan  |
| 0      | nan  |
| 0      | nan  |
| 1      | 180  |
| 1      | nan  |
| 0      | 120  |

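For example, I want to compare the missing value of each row whose diff is one of [120, 180, 60] with the missing value of the row before it. In the end, the desired data frame looks like this: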

|missing | diff |
|--------|------|
| 0      | nan  |
| 1      | nan  |
| 0      | nan  |
| 0      | nan  |
| 0      | 120  |

Update, following the answers: the code below gives me a df identical to the original df.

import pandas as pd
import numpy as np
data={'missing':[0,1,1,0,0,1,1,0],'diff':[np.nan,60,np.nan,np.nan,np.nan,180,np.nan,120]}
df=pd.DataFrame(data)
df
missing diff
0   0   NaN
1   1   60.0
2   1   NaN
3   0   NaN
4   0   NaN
5   1   180.0
6   1   NaN
7   0   120.0
if df['diff'][ind]!=np.nan:
    if ind!=0:
        if df['missing'][ind]>df['missing'][ind-1]:
            df=df.drop(ind,0)
        else:
            df=df.drop(ind-1,0)
df
missing diff
0   0   NaN
1   1   60.0
2   1   NaN
3   0   NaN
4   0   NaN
5   1   180.0
6   1   NaN
7   0   120.0
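
One thing worth noting about the attempt above: `x != np.nan` evaluates to True for every `x`, NaN included, so that first test never rejects a row; `pd.notna` is the usual check. Dropping rows one at a time while later steps still look up `ind-1` can also leave the code asking for a label that is already gone. Below is a minimal sketch of a loop-based rework, not a definitive fix. It assumes a default integer index and that both rows are kept when the two missing values are equal (the question does not cover ties). The names `to_drop`, `cur`, `prev`, and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = {'missing': [0, 1, 1, 0, 0, 1, 1, 0],
        'diff': [np.nan, 60, np.nan, np.nan, np.nan, 180, np.nan, 120]}
df = pd.DataFrame(data)

to_drop = []  # illustrative helper: collect labels first, drop once at the end
for ind in range(1, len(df)):          # row 0 has no previous row to compare with
    if pd.notna(df.loc[ind, 'diff']):  # NaN-safe test; `!= np.nan` is always True
        cur = df.loc[ind, 'missing']
        prev = df.loc[ind - 1, 'missing']
        if cur > prev:
            to_drop.append(ind)        # the row with a diff value is the larger one
        elif prev > cur:
            to_drop.append(ind - 1)    # the previous row is the larger one

# Keyword form of drop; the frame is only modified after the loop has finished.
result = df.drop(index=to_drop)
```

On the sample data this collects the labels 1, 5, and 6, so `result` keeps rows 0, 2, 3, 4, and 7, which matches the desired frame shown earlier.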

Tags: python, pandas

Solution

If I understand correctly (IIUC), you can try:

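# Rows that carry a value in 'diff'
m = df['diff'].notna()
# Rows immediately before a valued row
prev = m.shift(-1, fill_value=False)
df = pd.concat([
    # keep every row without a 'diff' value
    df[~m],
    # keep a valued row only if the previous row's 'missing' is larger
    df[m][df[prev]['missing'].values > df[m]['missing'].values],
])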

Output:

  missing  diff
1       0  <NA>
3       1  <NA>
4       0  <NA>
5       0  <NA>
7       1  <NA>
8       0   120
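
The output above still contains the row just before the 120 row (the one with missing equal to 1), which the desired frame in the question leaves out. Below is a minimal sketch of one way to remove the larger row of each pair using `shift`, offered as an alternative rather than as the answerer's method. It assumes a unique index and that both rows are kept on a tie. The names `has_value`, `prev_missing`, `drop_cur`, `drop_prev`, and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'missing': [0, 1, 1, 0, 0, 1, 1, 0],
    'diff': [np.nan, 60, np.nan, np.nan, np.nan, 180, np.nan, 120],
})

has_value = df['diff'].notna()
prev_missing = df['missing'].shift()   # previous row's 'missing' (NaN for the first row)

# The valued row is larger than the row before it: drop the valued row itself.
drop_cur = has_value & (df['missing'] > prev_missing)

# The row before is larger: flag the valued row, then move the flag up one row.
drop_prev = (
    (has_value & (df['missing'] < prev_missing))
    .shift(-1, fill_value=False)
    .astype(bool)
)

result = df[~(drop_cur | drop_prev)]
print(result)
```

On the sample data this keeps the rows labelled 0, 2, 3, 4, and 7. Comparisons against the NaN produced by `shift` are False, so a first row that has a diff value is simply kept.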
