Fill NaN from other records in a DataFrame and filter unique records

Problem description

I have the following df:

import numpy as np
import pandas as pd

# Several partial rows per CustomerID, with np.nan where a field is missing
df4 = pd.DataFrame(np.array([
    ['cus1234', 'Sara@example.com', 'Mexico', np.nan],
    ['cus1234', np.nan, 'Mexico', '348384'],
    ['cus1234', 'Sara@example.com', np.nan, '348384'],
    ['cus1234', 'Sara@example.com', 'Mexico', np.nan],
    ['cus930', 'fernandino@example.com', 'US', np.nan],
    ['cus930', 'fernandino@example.com', np.nan, '38483483'],
    ['cus930', 'fernandino@example.com', np.nan, '38483483'],
    ['cus800033132', 'dumm@example.com', 'US', np.nan],
    ['cus800033132', 'dumm@example.com', 'US', '300309494994'],
]), columns=['CustomerID', 'email', 'Country', 'phone'])

df4:

    CustomerID    email                   Country  phone
0   cus1234       Sara@example.com        Mexico   nan
1   cus1234       nan                     Mexico   348384
2   cus1234       Sara@example.com        nan      348384
3   cus1234       Sara@example.com        Mexico   nan
4   cus930        fernandino@example.com  US       nan
5   cus930        fernandino@example.com  nan      38483483
6   cus930        fernandino@example.com  nan      38483483
7   cus800033132  dumm@example.com        US       nan
8   cus800033132  dumm@example.com        US       300309494994

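CustomerID is repeated, but each row holds only part of the customer's information. For each CustomerID, I want to fill the NaN values from the other rows with the same ID, and then remove the duplicate rows with drop_duplicates().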

What I have tried so far:

df4.groupby(['CustomerID']).fillna(method='ffill')

df4.groupby(['CustomerID'], sort=False).apply(lambda x: x.ffill().bfill())

But with no luck. The expected output is:

CustomerID    email                   Country  phone
cus1234       Sara@example.com        Mexico   348384
cus930        fernandino@example.com  US       38483483
cus800033132  dumm@example.com        US       300309494994

Tags: python, pandas, dataframe, lambda

Solution


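Try this.

First, replace the string 'nan' with a real NaN. Because df4 is built with np.array() over rows that mix strings and NaN, NumPy stores everything as strings, so each missing value is the text 'nan'. pandas does not treat that text as missing, which is why neither attempt above changes anything: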

df4=df4.replace('nan',float('NaN'))
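A minimal sketch of that coercion, using two made-up rows rather than the full data: np.array() turns rows that mix strings and np.nan into a single string array, so np.nan becomes the text 'nan'. If you build the frame yourself, one way to avoid the problem is to pass the nested list straight to pd.DataFrame, which keeps the missing values as real NaN and makes the replace() step unnecessary.

```python
import numpy as np
import pandas as pd

# Two hypothetical rows, each missing one field
rows = [
    ['cus1234', 'Sara@example.com', np.nan],
    ['cus1234', np.nan, '348384'],
]
cols = ['CustomerID', 'email', 'phone']

# Wrapping the rows in np.array forces one string dtype, so np.nan becomes 'nan'
via_array = pd.DataFrame(np.array(rows), columns=cols)
print(via_array.isna().sum().sum())   # nothing is detected as missing
print(repr(via_array.loc[0, 'phone']))

# Passing the list directly keeps np.nan as a real missing value
direct = pd.DataFrame(rows, columns=cols)
print(direct.isna().sum().sum())      # both gaps are detected
```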

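Finally, fill the gaps from the other rows of the same CustomerID, drop the duplicate rows, and join CustomerID back on (the grouped result does not include the key column):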

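# ffill().bfill() runs inside each CustomerID group, so values cannot leak from one customer into the next
out=df4.groupby('CustomerID').transform(lambda s: s.ffill().bfill()).join(df4['CustomerID']).drop_duplicates()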

Output of out:

    email                  Country      phone           CustomerID
0   Sara@example.com        Mexico      348384          cus1234
4   fernandino@example.com  US          38483483        cus930
7   dumm@example.com        US          300309494994    cus800033132
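A different route, not the one used in the answer above: GroupBy.first() returns the first non-missing value of each column within each group, so when a customer's rows never contradict each other it produces the expected one-row-per-customer table directly, with no fill step and no drop_duplicates(). The sketch below rebuilds the question's data with real NaN values (passing the list straight to pd.DataFrame) so it runs on its own; the name result is just for illustration.

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame(
    [
        ['cus1234', 'Sara@example.com', 'Mexico', np.nan],
        ['cus1234', np.nan, 'Mexico', '348384'],
        ['cus1234', 'Sara@example.com', np.nan, '348384'],
        ['cus1234', 'Sara@example.com', 'Mexico', np.nan],
        ['cus930', 'fernandino@example.com', 'US', np.nan],
        ['cus930', 'fernandino@example.com', np.nan, '38483483'],
        ['cus930', 'fernandino@example.com', np.nan, '38483483'],
        ['cus800033132', 'dumm@example.com', 'US', np.nan],
        ['cus800033132', 'dumm@example.com', 'US', '300309494994'],
    ],
    columns=['CustomerID', 'email', 'Country', 'phone'],
)

# first() skips NaN, so each column gets the first non-missing value per customer;
# sort=False keeps customers in order of first appearance
result = df4.groupby('CustomerID', sort=False).first().reset_index()
print(result)
```

The trade-off: if a customer's rows disagree (say, two different emails), first() quietly keeps only the first value, whereas the fill-and-deduplicate approach leaves both variants as separate rows. Running df4.groupby('CustomerID').nunique() beforehand and looking for counts above 1 is one way to spot such conflicts.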
