How to fill a df column with strings from other columns that sometimes contain np.nan, running through elifs to return the appropriate combination?

Problem description

The data in df is incomplete:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A Surname': ['Smith', 'Longshore', 'Jones'],
                   'A Title': ['Mr', 'Miss', np.nan],
                   'B Surname': ['Smith', np.nan, 'Nguyen'],
                   'B Title': ['Mrs', np.nan, np.nan]})

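I want to fill a column with a string suitable for addressing both A and B where possible. Plain concatenation returns np.nan whenever any of the fields is np.nan, and the result also has to be logical (e.g. don't use 'B Title' if 'B Surname' is np.nan), so I need to run through a series of rules to determine the most appropriate combination. My unsuccessful approach: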

def combined(x):
    full = df['A Title'] + ' ' & df['A Surname'] & ' & ' & df['B Title'] & ' ' & df['B Surname']
    no_title = df['A Surname'] & ' & ' & df['B Surname']
    # more combinations
    if full != np.nan:
        return full
    elif no_title != np.nan:
        return no_title
    # more elifs
    else:
        return df['A Surname']
        
df['combined string'] = np.nan
df['combined string'] = df['combined string'].apply(combined)
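
For reference, here is a minimal sketch of why the attempt above cannot work as written. Strings are concatenated with `+` rather than `&`, a single NaN operand turns the whole concatenation into NaN, and a comparison such as `full != np.nan` is never a reliable missing-value test (on a whole Series it cannot even be used in an `if`, because the truth value of a Series is ambiguous). The variable names `full_missing` and `no_title_missing` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A Surname': ['Smith', 'Longshore', 'Jones'],
                   'A Title': ['Mr', 'Miss', np.nan],
                   'B Surname': ['Smith', np.nan, 'Nguyen'],
                   'B Title': ['Mrs', np.nan, np.nan]})

# Concatenate with +; any NaN operand makes that row's result NaN.
full = (df['A Title'] + ' ' + df['A Surname'] + ' & '
        + df['B Title'] + ' ' + df['B Surname'])
no_title = df['A Surname'] + ' & ' + df['B Surname']

# NaN never compares equal to anything, so missing results are
# detected with isna() instead of `!= np.nan`.
full_missing = full.isna().tolist()
no_title_missing = no_title.isna().tolist()
```

Only the first row survives the full concatenation, which is why a chain of fallbacks (or a NaN-tolerant concatenation) is needed.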

The desired output looks like this:

desired_df = pd.DataFrame({'A Surname': ['Smith', 'Longshore', 'Jones'],
                           'A Title': ['Mr', 'Miss', np.nan],
                           'B Surname': ['Smith', np.nan, 'Nguyen'],
                           'B Title': ['Mrs', np.nan, np.nan],
                           'combined string': ['Mr Smith & Mrs Smith', 'Miss Longshore', 'Jones & Nguyen']})

What is a practical way to do this?

Tags: python, pandas

Solution


Use Series.str.cat together with Series.str.strip here:

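# Join title and surname per person; na_rep='' treats NaN as an empty string,
# and str.strip() removes the stray space left when one part is missing.
a = df['A Title'].str.cat(df['A Surname'], sep=' ', na_rep='').str.strip()
b = df['B Title'].str.cat(df['B Surname'], sep=' ', na_rep='').str.strip()
# Join both people, then strip a dangling ' & ' when one side is empty.
df['combined string'] = a.str.cat(b, sep=' & ').str.strip(' &')
print(df)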
   A Surname A Title B Surname B Title       combined string
0      Smith      Mr     Smith     Mrs  Mr Smith & Mrs Smith
1  Longshore    Miss       NaN     NaN        Miss Longshore
2      Jones     NaN    Nguyen     NaN        Jones & Nguyen
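
One caveat: the snippet above keeps a title even when its surname is missing, so a row with 'B Title' set but 'B Surname' NaN would end in "& Mrs", which the question asks to avoid. A minimal sketch of one way to guard against that is shown below. It assumes a hypothetical helper `person_label` (not part of pandas) and adds an illustrative fourth row ('Lee') that is not in the original data, purely to exercise that case.

```python
import numpy as np
import pandas as pd

def person_label(title, surname):
    """Hypothetical helper: 'Title Surname', surname only, or '' without a surname."""
    # Blank out the title wherever the surname is missing.
    title = title.where(surname.notna())
    return title.str.cat(surname, sep=' ', na_rep='').str.strip()

# Original three rows plus one illustrative row (title present, surname missing).
df = pd.DataFrame({'A Surname': ['Smith', 'Longshore', 'Jones', 'Lee'],
                   'A Title': ['Mr', 'Miss', np.nan, 'Dr'],
                   'B Surname': ['Smith', np.nan, 'Nguyen', np.nan],
                   'B Title': ['Mrs', np.nan, np.nan, 'Mrs']})

a = person_label(df['A Title'], df['A Surname'])
b = person_label(df['B Title'], df['B Surname'])
combined = a.str.cat(b, sep=' & ').str.strip(' &')

# Rows with no usable name at all become NaN rather than an empty string.
df['combined string'] = combined.where(combined.ne(''))
```

The same `where(...notna())` mask could be applied inline instead of through a helper; the function only keeps the rule for A and B in one place.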
