Copy a value to another column only when a substring (or symbol) appears in a specific column, otherwise leave the other column unchanged (DataFrame)

Problem description

I have a DataFrame:

cost      total     
null      $519
null      $78
xx24
($1500)   
          $51
0.00    
($924)
$33
          $78

Expected:

cost      total     
null      $519
null      $78
xx24
($1500)   $1500
          $51
0.00    
($924)    $924
$33       $33
          $78

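I tried defining a function and using apply(), but that also replaces the values that already exist in 'total'. I could put True/False values into a new column, but that does not seem like the right approach.

Tags: python, pandas

Solution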


You can extract the values between (), but only for the rows where Series.str.contains finds a $, by using Series.mask:

mask = df['cost'].str.contains('$', na=False, regex=False)

df['total'] = df['total'].mask(mask, df['cost'].str.extract(r"\((.*?)\)" , expand=False))

# Alternative: copy the value and strip the surrounding ()
#df['total'] = df['total'].mask(mask, df['cost'].str.strip('()'))
print (df)
      cost  total
0      NaN   $519
1      NaN    $78
2     xx24    NaN
3  ($1500)  $1500
4      NaN    $51
5     0.00    NaN
6   ($924)   $924
7      NaN    $78
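
For reference, here is a minimal self-contained sketch of the mask approach. The sample frame is reconstructed from the question, under the assumption that "null" and blank cells are NaN; the variable name has_dollar is only illustrative. It uses the strip variant because the question's expected output also copies the unparenthesized $33, which the extract pattern would turn into NaN.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: "null" and blanks are NaN).
df = pd.DataFrame({
    "cost":  [np.nan, np.nan, "xx24", "($1500)", np.nan, "0.00", "($924)", "$33", np.nan],
    "total": ["$519", "$78", np.nan, np.nan, "$51", np.nan, np.nan, np.nan, "$78"],
})

# True where cost contains a literal "$"; NaN costs are treated as False.
has_dollar = df["cost"].str.contains("$", na=False, regex=False)

# For those rows only, replace total with cost stripped of leading/trailing parentheses.
# All other rows keep their existing total.
df["total"] = df["total"].mask(has_dollar, df["cost"].str.strip("()"))

print(df)
```

Rows without a $ in cost (such as xx24 and 0.00) are left untouched, so their total stays NaN.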

Or, if it suits your data, replace only the missing values in total with the values extracted from between ():

df['total'] = df['total'].fillna(df['cost'].str.extract(r"\((.*?)\)" , expand=False))
print (df)
      cost  total
0      NaN   $519
1      NaN    $78
2     xx24    NaN
3  ($1500)  $1500
4      NaN    $51
5     0.00    NaN
6   ($924)   $924
7      NaN    $78
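
The two approaches differ when a row already has a total and also has a parenthesized cost. The sketch below uses a small made-up frame (not the question's data) to show the difference: fillna never overwrites an existing total, while mask does.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has both a parenthesized cost and an existing total.
df = pd.DataFrame({
    "cost":  ["($1500)", "($924)", np.nan, "xx24"],
    "total": [np.nan, "$10", "$51", np.nan],
})

# Text between the first pair of parentheses, NaN where there is none.
extracted = df["cost"].str.extract(r"\((.*?)\)", expand=False)

# fillna: only missing totals are filled, so "$10" survives.
filled = df["total"].fillna(extracted)

# mask: every row whose cost contains "$" is overwritten, so "$10" becomes "$924".
has_dollar = df["cost"].str.contains("$", na=False, regex=False)
masked = df["total"].mask(has_dollar, extracted)

print(pd.DataFrame({"filled": filled, "masked": masked}))
```

Which one is appropriate depends on whether an existing total should take precedence over the value in cost.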
