Performing an operation on columns based on the values of another column in pandas

Problem description

I have a DataFrame:

import pandas as pd

df = pd.DataFrame([["A",1,98,88,"",567,453,545,656,323,756], ["B",1,99,"","",231,232,234,943,474,345], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,895,237,678,452,545,453], ["A",1,45,"",58,334,778,234,983,858,657], ["C",1,23,55,"",183,565,953,565,234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])

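I need to set the `_num` and `_deno` columns to NaN/blank based on their corresponding base column. For example, if "col1" is blank in a given row, then "col1_num" and "col1_deno" in that row should be set to NaN/blank. The same rule applies to "col2_num" and "col2_deno" based on "col2", and to "col3_num" and "col3_deno" based on "col3".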

Expected output:

df_out = pd.DataFrame([["A",1,98,88,"",567,453,"","",323,756], ["B",1,99,"","",231,232,"","","",""], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,"","",678,452,545,453], ["A",1,45,"",58,334,778,234,983,"",""], ["C",1,23,55,"",183,565,"","",234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])

How can I do this?

Tags: python, python-3.x, pandas, python-2.7, dataframe

Solution


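Let's try using a boolean mask: build the mask from the blank cells in the base columns, then apply it to the matching `_num` and `_deno` columns.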

# select the columns
c = pd.Index(['col1', 'col2', 'col3'])

# create boolean mask
m = df[c].eq('').to_numpy()

# mask the values in `_num` and `_deno` like columns
df[c + '_num'] = df[c + '_num'].mask(m, '')
df[c + '_deno'] = df[c + '_deno'].mask(m, '')

>>> df

  id  date col1 col2 col3 col1_num col1_deno col3_num col3_deno col2_num col2_deno
0  A     1   98   88           567       453                         323       756
1  B     1   99                231       232                                      
2  C     1   97   67   23      543       458      456       876      935       876
3  B     1        79   84                         678       452      545       453
4  A     1   45        58      334       778      234       983                   
5  C     1   23   55           183       565                         234       234
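
Since the question asks for NaN/blank, the same boolean-mask idea can also fill with NaN instead of an empty string. The following is a minimal sketch rather than part of the original answer: the helper name `blank_dependents`, its parameters, and the small two-row frame are all hypothetical and chosen for illustration. It assumes blanks are stored as empty strings and that every base column has matching `<base>_num` and `<base>_deno` columns.

```python
import numpy as np
import pandas as pd


def blank_dependents(df, base_cols, suffixes=("_num", "_deno"), fill=np.nan):
    """Return a copy of df where <base><suffix> is set to `fill`
    in every row where <base> holds an empty string.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    base_cols = list(base_cols)
    out = df.copy()
    # Boolean array of shape (n_rows, n_base_cols): True where the base cell is ''.
    is_blank = out[base_cols].eq("").to_numpy()
    for suffix in suffixes:
        # Keep target columns in the same order as base_cols so the mask lines up.
        targets = [f"{col}{suffix}" for col in base_cols]
        out[targets] = out[targets].mask(is_blank, fill)
    return out


# Small illustrative frame (a subset of the columns in the question).
df = pd.DataFrame(
    [["A", 98, "", 567, 453, 545, 656],
     ["B", "", 84, 895, 237, 678, 452]],
    columns=["id", "col1", "col3", "col1_num", "col1_deno", "col3_num", "col3_deno"],
)

result = blank_dependents(df, ["col1", "col3"])
print(result)
```

Passing `fill=""` gives the blank-string behaviour of the answer above. Note that filling an integer column with NaN converts it to a float column, and that the helper works on a copy, so the original `df` is left unchanged.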
