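python - How to change the value of a cell in one column when two columns contain duplicate cells
Problem description
I have a pandas DataFrame whose columns are address fields. My problem is that in some rows, two of the columns hold the same value. Does anyone know how I can conditionally change the value in one column when a duplicate is found across the two columns? Ideally, I would like to keep one value and set the other to np.nan.
Here is a test case: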
from io import StringIO
import pandas as pd
# Wrap the JSON literal in StringIO: recent pandas versions deprecate passing a raw JSON string to read_json.
test = pd.read_json(StringIO('{"housename":{"16":null,"17":null,"18":null},"name":{"16":"Shoecare","17":"33","18":"33A"},"house_number":{"16":"32","17":"33","18":"33A"},"street":{"16":"Carfax","17":"Carfax","18":"Carfax"},"city":{"16":"Horsham","17":"Horsham","18":"Horsham"},"postcode":{"16":"RH12 1EE","17":"RH12 1EE","18":"RH12 1EE"}}'))
       city house_number  housename      name  postcode  street
16  Horsham           32        NaN  Shoecare  RH12 1EE  Carfax
17  Horsham           33        NaN        33  RH12 1EE  Carfax
18  Horsham          33A        NaN       33A  RH12 1EE  Carfax
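In the test case, I have experimented with test.duplicated(subset=['house_number', 'name']), but it does not identify the duplicated values in the house_number and name columns.
Does anyone have suggestions on how to first identify the duplicated cells across the two columns and then set one of the values to np.nan?
Desired output: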
    housename      name house_number  street     city  postcode
16        NaN  Shoecare           32  Carfax  Horsham  RH12 1EE
17        NaN       NaN           33  Carfax  Horsham  RH12 1EE
18        NaN       NaN          33A  Carfax  Horsham  RH12 1EE
Solution
If the two columns are house_number and name, you can do this:
import numpy as np
test['name'] = np.where((test['house_number'] == test['name']), np.nan, test['name'])
Output:
       city house_number  housename      name  postcode  street
16  Horsham           32        NaN  Shoecare  RH12 1EE  Carfax
17  Horsham           33        NaN       NaN  RH12 1EE  Carfax
18  Horsham          33A        NaN       NaN  RH12 1EE  Carfax
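It may help to see why duplicated() did not flag anything in the question: it compares whole rows (restricted to the subset columns) against earlier rows, rather than comparing two columns within the same row. The sketch below illustrates that difference and shows an alternative to np.where using Series.mask. It rebuilds only the two relevant columns by hand, and the names row_dupes and same_value are illustrative, not part of the original answer.

```python
import pandas as pd

# Small frame mirroring the question's test case (only the two relevant columns).
test = pd.DataFrame(
    {
        "name": ["Shoecare", "33", "33A"],
        "house_number": ["32", "33", "33A"],
    },
    index=[16, 17, 18],
)

# duplicated() flags rows that repeat an earlier row on the subset columns.
# No (house_number, name) pair repeats here, so every entry is False.
row_dupes = test.duplicated(subset=["house_number", "name"])

# An element-wise comparison finds rows where the two columns hold the same value.
same_value = test["house_number"] == test["name"]

# Series.mask replaces entries where the condition is True (with NaN by default).
test["name"] = test["name"].mask(same_value)
print(test)
```

Compared with np.where, mask returns a pandas Series directly and keeps the original index, so it may be the more convenient choice when only one column needs to be blanked.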