首页 > 解决方案 > python在数据框中移动数据

问题描述

使用以下数据框,我想转移数据。按“ID”和“系列”分组,Q、R 和 T 列中的数据应下移到“状态”为“结束”的行。

    data = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':[9,'','',30,'',14,'',3,'',17,'','',1,''],
'R': ['',8,'','','','','','','','',7,'','',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

结果应如下所示:

result = pd.DataFrame({
'ID': ['A','A','A','B','B','B','B','C','C','C','C','C','D','D'], 
'Series': [1,1,1,1,1,2,2,1,1,2,2,2,1,1],
'Status': ['Begin','Begin','End','Begin','End','Begin','End','Begin','End','Begin','Begin','End','Begin','End'],
'Q':['','',9,'',30,'',14,'',3,'','',17,'',1],
'R': ['','',8,'','','','','','','','',7,'',''],
'T': ['','',12,'',38,'',21,'',6,'','',35,'',5]
})

标签: pythonpandas

解决方案


使用GroupBy.transform+GroupBy.first查找第一个非NaNs 值,然后通过maskand删除重复值duplicated

cols = ['Q', 'R', 'T']

#repalce emty strings to NaNs
data[cols] = data[cols].astype(str).replace('', np.nan)
print (data)
   ID  Series Status    Q    R    T
0   A       1  Begin    9  NaN  NaN
1   A       1  Begin  NaN    8  NaN
2   A       1    End  NaN  NaN   12
3   B       1  Begin   30  NaN  NaN
4   B       1    End  NaN  NaN   38
5   B       2  Begin   14  NaN  NaN
6   B       2    End  NaN  NaN   21
7   C       1  Begin    3  NaN  NaN
8   C       1    End  NaN  NaN    6
9   C       2  Begin   17  NaN  NaN
10  C       2  Begin  NaN    7  NaN
11  C       2    End  NaN  NaN   35
12  D       1  Begin    1  NaN  NaN
13  D       1    End  NaN  NaN    5

g = data.groupby(['ID', 'Series'])
for c in cols:
    data[c] = g[c].transform('first')
print (data)
   ID  Series Status   Q    R   T
0   A       1  Begin   9    8  12
1   A       1  Begin   9    8  12
2   A       1    End   9    8  12
3   B       1  Begin  30  NaN  38
4   B       1    End  30  NaN  38
5   B       2  Begin  14  NaN  21
6   B       2    End  14  NaN  21
7   C       1  Begin   3  NaN   6
8   C       1    End   3  NaN   6
9   C       2  Begin  17    7  35
10  C       2  Begin  17    7  35
11  C       2    End  17    7  35
12  D       1  Begin   1  NaN   5
13  D       1    End   1  NaN   5

data[cols] = data[cols].mask(data.duplicated(['ID','Series'], keep='last'), '').fillna('')
print (data)

   ID  Series Status   Q  R   T
0   A       1  Begin           
1   A       1  Begin           
2   A       1    End   9  8  12
3   B       1  Begin           
4   B       1    End  30     38
5   B       2  Begin           
6   B       2    End  14     21
7   C       1  Begin           
8   C       1    End   3      6
9   C       2  Begin           
10  C       2  Begin           
11  C       2    End  17  7  35
12  D       1  Begin           
13  D       1    End   1      5

推荐阅读