Move data from several rows into one row within each group of rows

Problem description

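I have a dataset that I want to transform. Below I have included only a sample that shows its layout. There is a column called Hospital whose arm rows repeat in the same order until the end of the DataFrame (five rows per group in the sample: prelim_arm_1 through discharge_informat_arm_1). I want to transform it so that all of a group's data is kept on its first row, prelim_arm_1, and the remaining arm rows of that group are dropped.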

import pandas as pd
import numpy as np

# initialize data of lists. 
data = {'Hospital':['prelim_arm_1' , '24_hour_review_arm_1','48_hour_review_arm_1',
                    '72_hour_review_arm_1','discharge_informat_arm_1','prelim_arm_1' , 
                    '24_hour_review_arm_1','48_hour_review_arm_1',
                    '72_hour_review_arm_1','discharge_informat_arm_1'],
        'Bug_Hosp':['133', 'NAN' , 'NAN', 'NAN', 'NAN','133', 'NAN' , 'NAN', 'NAN', 'NAN'], 
        'code':['G45','NAN' ,'NAN','NAN', 'NAN', 'G45','NAN' ,'NAN','NAN', 'NAN'],
        'cont':['T256','NAN' ,'NAN','NAN', 'NAN','T256','NAN' ,'NAN','NAN', 'NAN'],
        'IPC':['NAN','NAN' ,'NAN','567TY', 'NAN','NAN','NAN' ,'NAN','567Tu', 'NAN'],
        'NO_CT':['NAN','NAN' ,'NAN','NAN', '5667','NAN','NAN' ,'NAN','3456', 'NAN'],
        } 

# Create DataFrame 
df_final = pd.DataFrame(data) 

# Print the output. 
print(df_final)


The final dataset should look like this (shown for the first group only):

import pandas as pd
import numpy as np

# initialize data of lists. 
data = {'Hospital':['prelim_arm_1'],
        'Bug_Hosp':['133'], 'code':['G45'],
        'cont':['T256'],
        'IPC':['567TY'],
        'NO_CT':['5667']} 

# Create DataFrame 
df_final = pd.DataFrame(data) 

# Print the output. 
print(df_final)

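The real dataset is large and the arm rows repeat throughout it. For every group of arm rows, the data should end up only on the prelim_arm_1 row, and the other arm rows should be removed. The final table will therefore contain only prelim_arm_1 rows, one per group, each holding that group's data.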
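Read literally, the request is to move each group's later values up onto its prelim_arm_1 row and then drop the other arm rows. A minimal sketch of that literal reading follows, using a per-group back-fill (GroupBy.bfill), which is a different technique from the GroupBy.first approach in the solution. It assumes the "NAN" strings stand for missing values, and the helper name `group_id` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample from the question, rebuilt compactly: two groups of five arm rows
arms = ["prelim_arm_1", "24_hour_review_arm_1", "48_hour_review_arm_1",
        "72_hour_review_arm_1", "discharge_informat_arm_1"]
df = pd.DataFrame({
    "Hospital": arms * 2,
    "Bug_Hosp": ["133", "NAN", "NAN", "NAN", "NAN"] * 2,
    "code": ["G45", "NAN", "NAN", "NAN", "NAN"] * 2,
    "cont": ["T256", "NAN", "NAN", "NAN", "NAN"] * 2,
    "IPC": ["NAN", "NAN", "NAN", "567TY", "NAN", "NAN", "NAN", "NAN", "567Tu", "NAN"],
    "NO_CT": ["NAN", "NAN", "NAN", "NAN", "5667", "NAN", "NAN", "NAN", "3456", "NAN"],
}).replace("NAN", np.nan)  # assumption: "NAN" strings mean "missing"

# Hypothetical helper key: increments at every prelim_arm_1 row
group_id = df["Hospital"].eq("prelim_arm_1").cumsum().rename("group")

# Back-fill within each group, so later arm values move up onto earlier rows
filled = df.groupby(group_id).bfill()

# Keep only the prelim_arm_1 rows, which now carry the whole group's data
result = filled[df["Hospital"].eq("prelim_arm_1")].reset_index(drop=True)
print(result)
```

For each column, the prelim_arm_1 row ends up with the first non-missing value at or below it within its group, so on this sample the result matches the GroupBy.first output, but with a plain 0-based index instead of the group number.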

Tags: python pandas numpy dataframe

Solution


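If you want the first non-missing value of every group of 5 rows, first use DataFrame.replace to turn the string 'NAN' into real missing values (skip this step if the missing values are already NaN). Then use groupby with GroupBy.first, grouping by a helper Series built by comparing the Hospital column with the first value of each group, prelim_arm_1 (Series.eq), and taking Series.cumsum, so the group number increases at every prelim_arm_1 row.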

# Only needed if missing values are stored as the string 'NAN'
df_final = df_final.replace('NAN',np.nan)

df_final = df_final.groupby(df_final['Hospital'].eq('prelim_arm_1').cumsum()).first()
print(df_final)
              Hospital Bug_Hosp code  cont    IPC NO_CT
Hospital                                               
1         prelim_arm_1      133  G45  T256  567TY  5667
2         prelim_arm_1      133  G45  T256  567Tu  3456
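Why the replace step matters: GroupBy.first skips only real missing values, and to pandas the string "NAN" is an ordinary value. A minimal check on a single five-row group, keeping only the IPC column (the variable names `key`, `raw` and `clean` are illustrative):

```python
import numpy as np
import pandas as pd

# One five-row group where IPC is only filled on the 72-hour row
df = pd.DataFrame({
    "Hospital": ["prelim_arm_1", "24_hour_review_arm_1", "48_hour_review_arm_1",
                 "72_hour_review_arm_1", "discharge_informat_arm_1"],
    "IPC": ["NAN", "NAN", "NAN", "567TY", "NAN"],
})
key = df["Hospital"].eq("prelim_arm_1").cumsum().rename("group")

# The string "NAN" is not missing, so first() returns it unchanged
raw = df.groupby(key)["IPC"].first()

# After converting it to a real NaN, first() skips it
clean = df.replace("NAN", np.nan).groupby(key)["IPC"].first()

print(raw.tolist())    # ['NAN']
print(clean.tolist())  # ['567TY']
```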

Details (the helper Series used as the group key):

print(df_final['Hospital'].eq('prelim_arm_1').cumsum())
0    1
1    1
2    1
3    1
4    1
5    2
6    2
7    2
8    2
9    2
Name: Hospital, dtype: int32
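Because the sample has exactly five rows per group, the same key could also be built from row positions with integer division. The sketch below compares the two keys; `GROUP_SIZE` and `by_position` are hypothetical names, and the positional key assumes every group has exactly five rows in order.

```python
import numpy as np
import pandas as pd

data = {
    "Hospital": ["prelim_arm_1", "24_hour_review_arm_1", "48_hour_review_arm_1",
                 "72_hour_review_arm_1", "discharge_informat_arm_1"] * 2,
    "Bug_Hosp": ["133", "NAN", "NAN", "NAN", "NAN"] * 2,
    "code": ["G45", "NAN", "NAN", "NAN", "NAN"] * 2,
    "cont": ["T256", "NAN", "NAN", "NAN", "NAN"] * 2,
    "IPC": ["NAN", "NAN", "NAN", "567TY", "NAN", "NAN", "NAN", "NAN", "567Tu", "NAN"],
    "NO_CT": ["NAN", "NAN", "NAN", "NAN", "5667", "NAN", "NAN", "NAN", "3456", "NAN"],
}
df = pd.DataFrame(data).replace("NAN", np.nan)

# Key from the answer: a new group starts at every prelim_arm_1 row
by_marker = df["Hospital"].eq("prelim_arm_1").cumsum()

# Hypothetical positional key: fixed blocks of GROUP_SIZE rows, numbered from 1
GROUP_SIZE = 5  # assumption: every group has exactly five rows, in order
by_position = pd.Series(np.arange(len(df)) // GROUP_SIZE + 1, index=df.index)

# On the sample data both keys label the rows identically
same_groups = by_marker.tolist() == by_position.tolist()
print(same_groups)  # True

result = df.groupby(by_position).first()
print(result[["IPC", "NO_CT"]])
```

The marker-based key from the answer is the safer default: if a group is missing one of its arm rows, the positional key silently shifts every later row into the wrong group, while the cumsum key still starts a new group at each prelim_arm_1 row.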
