Python DataFrame - drop consecutive rows based on a column

Problem description

I need to drop consecutive rows based on a column value. My DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({
    "CustID": ["c1", "c1", "c1", "c1", "c1", "c1", "c1", "c1", "c1", "c1",
               "c2", "c2", "c2", "c2", "c2", "c2"],
    "saleValue": [10, 12, 13, 6, 4, 2, 11, 17, 1, 5, 8, 2, 16, 13, 1, 4],
    "Status": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1],
})

The DataFrame looks like this:

  CustID    saleValue   Status
    c1            10    0
    c1            12    0
    c1            13    0
    c1             6    1
    c1             4    1
    c1             2    1
    c1            11    0
    c1            17    0
    c1             1    1
    c1             5    1
    c2             8    1
    c2             2    1
    c2            16    0
    c2            13    0
    c2             1    1
    c2             4    1
    

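Within each CustID, I need to drop consecutive rows only when Status is 1, keeping the first row of each run of 1s. What is the best way to do this?

The output should look like this: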
 

CustID  saleValue   Status
    c1        10          0
    c1        12          0
    c1        13          0
    c1         6          1
    c1        11          0
    c1        17          0
    c1         1          1
    c2         8          1
    c2        16          0
    c2        13          0
    c2         1          1

Tags: python, pandas, dataframe

Solution


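Create a Boolean mask for the entire DataFrame.

Assuming the DataFrame is already grouped by ID (the rows of each CustID are contiguous), find the rows where Status is 1, the previous row's Status is also 1, and the CustID is the same as the previous row's. Those are the rows to drop, so keep all the others.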

to_drop = (df['Status'].eq(1) & df['Status'].shift().eq(1)  # Consecutive 1s
           & df['CustID'].eq(df['CustID'].shift()))         # Within same ID  

df[~to_drop]

   CustID  saleValue  Status
0      c1         10       0
1      c1         12       0
2      c1         13       0
3      c1          6       1
6      c1         11       0
7      c1         17       0
8      c1          1       1
10     c2          8       1
12     c2         16       0
13     c2         13       0
14     c2          1       1
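
For a self-contained check, the mask above can be run end to end against the sample data. The sketch below also tries a variant built on `groupby("CustID")["Status"].shift()`, which compares each row with the previous row of the same customer instead of the previous row of the frame. This variant is an alternative and not part of the original answer; the names `prev_status`, `to_drop_grouped`, and `result` are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "CustID": ["c1"] * 10 + ["c2"] * 6,
    "saleValue": [10, 12, 13, 6, 4, 2, 11, 17, 1, 5, 8, 2, 16, 13, 1, 4],
    "Status": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1],
})

# Mask from the answer: a row is dropped when it is a 1, the row directly
# above it is also a 1, and both rows belong to the same CustID.
to_drop = (df["Status"].eq(1) & df["Status"].shift().eq(1)
           & df["CustID"].eq(df["CustID"].shift()))

# Variant (assumption, not from the original answer): shift Status within
# each CustID group, so the first row of every customer sees NaN and is
# never treated as a continuation of the previous customer's run.
prev_status = df.groupby("CustID")["Status"].shift()
to_drop_grouped = df["Status"].eq(1) & prev_status.eq(1)

result = df[~to_drop_grouped]
print(result.index.tolist())
# [0, 1, 2, 3, 6, 7, 8, 10, 12, 13, 14]
```

On this data both masks select the same rows. They can differ when the rows of one CustID are not contiguous: the grouped variant then treats rows as consecutive within a customer even if other customers' rows sit between them, which may or may not be the intended meaning.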
