Change a column in a Python DataFrame based on previous values of the same column

Problem description

I have the following DataFrame in pandas (Python):

<table style="width:100%">
  <tr>
    <th>ID</th>
    <th>AGE</th> 
    <th>GENDER</th>
    <th>TIME</th>
    <th>CODE</th>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>1</td>
    <td>1</td>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>2</td>
    <td>1</td>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>3</td>
    <td>1</td>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>4</td>
    <td>1</td>
  </tr>
    <tr>
    <td>2</td>
    <td>20</td> 
    <td>F</td>
    <td>1</td>
    <td>0</td>
  </tr>
  <tr>
    <td>2</td>
    <td>20</td> 
    <td>F</td>
    <td>2</td>
    <td>0</td>
  </tr>
  <tr>
    <td>2</td>
    <td>20</td> 
    <td>F</td>
    <td>3</td>
    <td>0</td> 
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>1</td>
    <td>1</td>
  </tr>
    <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>2</td>
    <td>1</td>
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>3</td>
    <td>1</td>
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>4</td>
    <td>1</td>
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>5</td>
    <td>1</td>
  </tr>
</table>

ID  AGE  GENDER  TIME  CODE
1   66   M       1     1
1   66   M       2     1
1   66   M       3     1
2   20   F       1     0
2   20   F       2     0
2   20   F       3     0
2   20   F       4     0
3   18   F       1     1
3   18   F       2     1
3   18   F       3     1
3   18   F       4     1
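To try the answers below, the sample data can be rebuilt as a DataFrame. This is a minimal sketch that assumes the eleven rows of the plain-text listing (the HTML table has a slightly different number of rows per ID, while the answer's printed output matches the listing).

```python
import pandas as pd

# Sample data from the plain-text listing: IDs 1, 2 and 3 have 3, 4 and 4 rows
df = pd.DataFrame({
    'ID':     [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'AGE':    [66, 66, 66, 20, 20, 20, 20, 18, 18, 18, 18],
    'GENDER': ['M', 'M', 'M', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F'],
    'TIME':   [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4],
    'CODE':   [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})
print(df)
```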

I need to change the last column as follows: wherever the CODE column is 1, keep the 1 only on the last row of that ID and change the earlier rows to 0.

<table style="width:100%">
  <tr>
    <th>ID</th>
    <th>AGE</th> 
    <th>GENDER</th>
    <th>TIME</th>
    <th>CODE</th>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>1</td>
    <td>0</td>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>2</td>
    <td>0</td>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>3</td>
    <td>0</td>
  </tr>
  <tr>
    <td>1</td>
    <td>66</td> 
    <td>M</td>
    <td>4</td>
    <td>1</td>
  </tr>
    <tr>
    <td>2</td>
    <td>20</td> 
    <td>F</td>
    <td>1</td>
    <td>0</td>
  </tr>
  <tr>
    <td>2</td>
    <td>20</td> 
    <td>F</td>
    <td>2</td>
    <td>0</td>
  </tr>
  <tr>
    <td>2</td>
    <td>20</td> 
    <td>F</td>
    <td>3</td>
    <td>0</td> 
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>1</td>
    <td>0</td>
  </tr>
    <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>2</td>
    <td>0</td>
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>3</td>
    <td>0</td>
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>4</td>
    <td>0</td>
  </tr>
  <tr>
    <td>3</td>
    <td>18</td> 
    <td>F</td>
    <td>5</td>
    <td>1</td>
  </tr>
</table>

How can I do this with pandas?

After searching, I found this line of code, which removes the last row of each group:

dfnew = df.groupby('ID').apply(lambda x: x.iloc[:-1] if len(x) > 1 else x)
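That snippet drops the last row of each ID, whereas this problem needs the last row kept and flagged. A different technique (not the asker's code) is to locate the last row of each ID with `GroupBy.cumcount(ascending=False)`, which numbers the rows of each group backwards so the last row gets 0. A sketch on the sample data, keeping only the columns it needs:

```python
import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'TIME': [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4],
    'CODE': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Count rows from the end of each ID group: the last row of every ID gets 0
is_last = df.groupby('ID').cumcount(ascending=False) == 0

# Keep CODE = 1 only on the last row of an ID, and only if that row had CODE = 1
df['CODE'] = ((df['CODE'] == 1) & is_last).astype(int)
print(df)
```

This gives the same result as the boolean-mask solution in the answer, since both test "CODE is 1 on the last row of the ID".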

Thanks in advance.

Tags: python, pandas, dataframe

Solution


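Get the index of the rows to keep by filtering the rows where CODE is 1 and then removing duplicates by ID with drop_duplicates, keeping the last one: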

i = df[df['CODE'] == 1].drop_duplicates(subset=['ID'], keep='last').index
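To see what this step produces, here is the intermediate result on the sample data, assuming the default RangeIndex 0..10 (only the ID and CODE columns are needed):

```python
import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'CODE': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

ones = df[df['CODE'] == 1]                                    # rows 0-2 (ID 1) and 7-10 (ID 3)
last_ones = ones.drop_duplicates(subset=['ID'], keep='last')  # last CODE == 1 row per ID
i = last_ones.index
print(list(i))  # [2, 10]
```

ID 2 has no row with CODE == 1, so it contributes nothing to i.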

Set the whole column to 0 first, then set it back to 1 at the index values in i:

df['CODE'] = 0
df.loc[i, 'CODE'] = 1
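Put together, the first solution can be wrapped in a small helper. `keep_last_code` is a hypothetical name used only for this sketch; it works on a copy so the input DataFrame is left unchanged.

```python
import pandas as pd


def keep_last_code(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: leave CODE = 1 only on the last CODE == 1 row of each ID."""
    out = df.copy()
    # Index of the last row with CODE == 1 for every ID that has one
    i = out[out['CODE'] == 1].drop_duplicates(subset=['ID'], keep='last').index
    out['CODE'] = 0          # reset the whole column
    out.loc[i, 'CODE'] = 1   # restore 1 on the selected rows
    return out


df = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'TIME': [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4],
    'CODE': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})
result = keep_last_code(df)
print(result['CODE'].tolist())  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```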

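Another solution is to create a boolean mask that is True where CODE is 1 and the row is the last one for its ID, and convert it to integers: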

m = (df['CODE'] == 1) & ~df['ID'].duplicated(keep='last')
print(m)
0     False
1     False
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10     True
dtype: bool

df['CODE'] = m.astype(int)

print(df)
    ID  AGE GENDER  TIME  CODE
0    1   66      M     1     0
1    1   66      M     2     0
2    1   66      M     3     1
3    2   20      F     1     0
4    2   20      F     2     0
5    2   20      F     3     0
6    2   20      F     4     0
7    3   18      F     1     0
8    3   18      F     2     0
9    3   18      F     3     0
10   3   18      F     4     1
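The two solutions agree on this data, but they are not identical in general: the first keeps the last row that has CODE == 1, while the mask only keeps a 1 when the very last row of the ID has CODE == 1. A small sketch with a made-up ID whose final row is 0 shows the difference:

```python
import pandas as pd

# Made-up example: ID 1 has CODE == 1 on its first two rows, but not on its last row
df = pd.DataFrame({'ID': [1, 1, 1], 'CODE': [1, 1, 0]})

# Solution 1: last row with CODE == 1 per ID
i = df[df['CODE'] == 1].drop_duplicates(subset=['ID'], keep='last').index
sol1 = pd.Series(0, index=df.index)
sol1.loc[i] = 1

# Solution 2: CODE == 1 on the last row of each ID
m = (df['CODE'] == 1) & ~df['ID'].duplicated(keep='last')
sol2 = m.astype(int)

print(sol1.tolist())  # [0, 1, 0]
print(sol2.tolist())  # [0, 0, 0]
```

Which behaviour is wanted depends on whether a trailing 0 in an ID should clear the flag entirely.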
