Drop rows using two conditions - pandas

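Question

I have a df below with many repeated values. Using the code below, my goal is to drop the rows where Value is new compared with the previous rows and Group equals C.

In addition, wherever this happens, I want to drop all of the preceding duplicate rows.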

import pandas as pd

d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red", "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
     'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
     'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E", "B", "B", "D", "D"],
    }

df = pd.DataFrame(data=d)

# attempted mask: Green rows whose Value equals the Value of the next row
mask = (df['Item'].isin(['Green'])) & (df.Value.eq(df.Value.shift(-1)))

df = df[~mask]

The input is:

     Item  Value Group
0     Red      1     A
1     Red      1     B
2     Red      1     B
3   Green      2     C
4   Green      2     D
5     Red      3     D
6     Red      3     A
7     Red      3     B
8   Green      4     C
9   Green      4     D
10  Green      4     E
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

Expected output:

     Item  Value Group
0     Red      1     A
4   Green      2     D
5     Red      3     D
6     Red      3     A
9   Green      4     D
10  Green      4     E
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

Current output:

     Item  Value Group
0     Red      1     A
1     Red      1     B
2     Red      1     B
4   Green      2     D
5     Red      3     D
6     Red      3     A
7     Red      3     B
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

Tags: python, pandas

Answer


# form the condition
cond = df.Value.diff().ne(0) & df.Group.eq("C")

# also consider the previous row
to_drop = cond | cond.shift(-1)

# index with inverse of the mask
new_df = df[~to_drop]

The positions where Value differs from the previous row can be found by checking where the difference is nonzero:

df.Value.diff().ne(0)
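
To make the behaviour of this expression concrete, here is a minimal sketch on a small made-up Series (the name `values` and its contents are illustrative, not taken from the question). It shows that the first row is always flagged, because `diff()` yields NaN there and NaN compares as not equal to 0.

```python
import pandas as pd

# hypothetical sample, only for illustration
values = pd.Series([1, 1, 2, 2, 3])

# diff() subtracts the previous element; the first element has no
# predecessor, so its difference is NaN
step = values.diff()

# NaN != 0 evaluates to True, so the first row counts as "changed"
changed = step.ne(0)

print(changed.tolist())  # [True, False, True, False, True]
```

In the answer this does not cause trouble as long as the first row's Group is not "C", since the two conditions are combined with "and".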

The rows where Group is "C" are found with:

df.Group.eq("C")

Combine them with "and":

cond = df.Value.diff().ne(0) & df.Group.eq("C")

Since you also want to drop the previous row, we can "or" the condition with a shifted version of itself:

to_drop = cond | cond.shift(-1)

This gives

>>> to_drop

0     False
1      True
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
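
One detail of `shift(-1)` may be worth knowing: the last row has no successor, so the shifted Series holds NaN there and is no longer of boolean dtype. The sketch below, on a small made-up `cond` Series, shows this and one possible way to keep everything boolean by passing `fill_value=False`. This is an optional variation, not part of the answer's code.

```python
import pandas as pd

# hypothetical condition, only for illustration
cond = pd.Series([False, False, True, False])

# shift(-1) moves every value up one row; the last row becomes NaN
shifted = cond.shift(-1)
print(pd.isna(shifted.iloc[-1]))  # True

# fill_value=False fills the gap and keeps the Series boolean
shifted_bool = cond.shift(-1, fill_value=False)

# a row is dropped if it is flagged itself or sits directly before a flagged row
to_drop = cond | shifted_bool
print(to_drop.tolist())  # [False, True, True, False]
```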

So the final step is to index with the inverse of this mask:

>>> new_df = df[~to_drop]
>>> new_df

     Item  Value Group
0     Red      1     A
3   Green      2     D
4     Red      3     E
5     Red      3     A
6     Red      3     B
7   Green      4     B
8   Green      4     D
9   Green      4     E
10  Green      4     A
11    Red      5     B
12    Red      5     C
13    Red      5     D
14  Green      6     E
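
The printed frames above have 15 rows and Group values that differ from the 16-row input shown in the question, so they appear to come from a different version of the sample data. As a check, the sketch below runs the same steps on the question's input. With that data the answer's approach keeps row 1, whereas the question's expected output drops it as well. The second half is an assumed reading of "drop all preceding duplicate rows", inferred only from the expected output: it treats consecutive identical rows as one run and drops the whole run that ends just before each flagged row. The names `before_c`, `run_id`, `runs_to_drop` and `new_df_runs` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red",
             "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
    "Value": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
    "Group": ["A", "B", "B", "C", "D", "D", "A", "B",
              "C", "D", "E", "E", "B", "B", "D", "D"],
})

# rows where Value changes and Group is "C" (rows 3 and 8 in this data)
cond = df["Value"].diff().ne(0) & df["Group"].eq("C")

# the answer's approach: also drop the single row before each flagged row
before_c = cond.shift(-1, fill_value=False)
new_df = df[~(cond | before_c)]
print(new_df.index.tolist())  # [0, 1, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15]

# assumed variant: number the runs of consecutive identical rows, then
# drop every row belonging to the run that ends right before a flagged row
run_id = df.ne(df.shift()).any(axis=1).cumsum()
runs_to_drop = run_id[before_c]
new_df_runs = df[~(cond | run_id.isin(runs_to_drop))]
print(new_df_runs.index.tolist())  # [0, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15]
```

The second result matches the index of the expected output in the question, but whether the run-based rule is what was intended for other data is an open assumption.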
