python - Performing an operation on columns based on the value of another column in pandas
Problem description
I have a DataFrame:
import pandas as pd
df = pd.DataFrame([["A",1,98,88,"",567,453,545,656,323,756], ["B",1,99,"","",231,232,234,943,474,345], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,895,237,678,452,545,453], ["A",1,45,"",58,334,778,234,983,858,657], ["C",1,23,55,"",183,565,953,565,234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])
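I need to set the matching _num and _deno columns to NaN/blank. For example, if "col1" is blank in a given row, then "col1_num" and "col1_deno" should be set to NaN/blank in that row. The same rule applies to "col2_num" and "col2_deno" based on "col2", and to "col3_num" and "col3_deno" based on "col3".
Expected output: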
df_out = pd.DataFrame([["A",1,98,88,"",567,453,"","",323,756], ["B",1,99,"","",231,232,"","","",""], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,"","",678,452,545,453], ["A",1,45,"",58,334,778,234,983,"",""], ["C",1,23,55,"",183,565,"","",234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])
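How can I do this?
Solution
Let's try a boolean mask. The mask has one column per base column, and the `_num` and `_deno` columns are selected in the same order so that the mask lines up with them position by position: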
# select the columns
c = pd.Index(['col1', 'col2', 'col3'])
# create boolean mask
m = df[c].eq('').to_numpy()
# mask the values in `_num` and `_deno` like columns
df[c + '_num'] = df[c + '_num'].mask(m, '')
df[c + '_deno'] = df[c + '_deno'].mask(m, '')
>>> df
   id  date  col1  col2  col3  col1_num  col1_deno  col3_num  col3_deno  col2_num  col2_deno
0   A     1    98    88             567        453                            323        756
1   B     1    99                   231        232
2   C     1    97    67    23       543        458       456        876       935        876
3   B     1          79    84                            678        452       545        453
4   A     1    45          58       334        778       234        983
5   C     1    23    55             183        565                            234        234
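If NaN is preferred over an empty string (the question allows either), the same idea can be written as an explicit loop over the base columns. This is only a sketch under a few assumptions: the names `base_cols` and `out` are made up for illustration, blanks in the base columns are assumed to be exactly the empty string `''`, and the result is written to a copy so the original `df` stays untouched.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 1, 98, 88, "", 567, 453, 545, 656, 323, 756],
     ["B", 1, 99, "", "", 231, 232, 234, 943, 474, 345],
     ["C", 1, 97, 67, 23, 543, 458, 456, 876, 935, 876],
     ["B", 1, "", 79, 84, 895, 237, 678, 452, 545, 453],
     ["A", 1, 45, "", 58, 334, 778, 234, 983, 858, 657],
     ["C", 1, 23, 55, "", 183, 565, 953, 565, 234, 234]],
    columns=["id", "date", "col1", "col2", "col3",
             "col1_num", "col1_deno", "col3_num", "col3_deno",
             "col2_num", "col2_deno"],
)

# Hypothetical name: the base columns whose blanks drive the masking.
base_cols = ["col1", "col2", "col3"]

out = df.copy()  # work on a copy; `df` itself is left unchanged
for col in base_cols:
    # True in the rows where the base column holds an empty string
    blank = out[col].eq("")
    for suffix in ("_num", "_deno"):
        target = col + suffix
        # Series.mask replaces values where the condition is True;
        # with no replacement given, it fills in NaN.
        out[target] = out[target].mask(blank)
```

Because NaN is a floating-point value, the masked columns come back as float64 rather than int64. The loop pairs each base column with its own `_num` and `_deno` columns by name, so it does not depend on the order in which the columns appear in the frame.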