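python - Drop duplicate values in a pandas column, but ignore one value
Question
I'm sure there is an elegant solution to this, but I can't find it. In a pandas DataFrame, how do I drop all duplicate values in a column while ignoring one particular value?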
   repost_of_post_id  title
0  7139471603         Man with an RV needs a place to park for a week
1  6688293563         Land for lease
2  None               2B/1.5B, Dishwasher, In Lancaster
3  None               Looking For Convenience? Check Out Cordova Par...
4  None               2/bd 2/ba, Three Sparkling Swimming Pools, Sit...
5  None               1 bedroom w/Closet is bathrooms in Select Unit...
6  None               Controlled Access/Gated, Availability 24 Hours...
7  None               Beautiful 3 Bdrm 2 & 1/2 Bth Home For Rent
8  7143099582         Need Help Getting Approved?
9  None               *MOVE IN READY APT* REQUEST TOUR TODAY!
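What I want is to keep every None value in repost_of_post_id, but drop any duplicates of the numeric values, for example if 7139471603 appears more than once in the DataFrame.
[Update] I got the desired result with the script below, but I would like to do this in a one-liner if possible.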
import pandas as pd

# remove duplicate repost id if present (i.e. don't remove rows where repost_of_post_id value is "None")
# ca_housing is the original dataframe that needs to be cleaned
ca_housing_repost_none = ca_housing.loc[ca_housing['repost_of_post_id'] == "None"]
ca_housing_repost_not_none = ca_housing.loc[ca_housing['repost_of_post_id'] != "None"]
ca_housing_repost_not_none_unique = ca_housing_repost_not_none.drop_duplicates(subset="repost_of_post_id")
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
ca_housing_unique = pd.concat([ca_housing_repost_none, ca_housing_repost_not_none_unique])
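Answer
You can drop the None values, detect the duplicates among what remains, and then filter those rows out of the original DataFrame by index.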
In [1]: import pandas as pd
...: from string import ascii_lowercase
...:
...: ids = [1,2,3,None,None, None, 2,3, None, None,4,5]
...: df = pd.DataFrame({'id': ids, 'title': list(ascii_lowercase[:len(ids)])})
...: print(df)
...:
...: print(df[~df.index.isin(df.id.dropna().duplicated().loc[lambda x: x].index)])
id title
0 1.0 a
1 2.0 b
2 3.0 c
3 NaN d
4 NaN e
5 NaN f
6 2.0 g
7 3.0 h
8 NaN i
9 NaN j
10 4.0 k
11 5.0 l
id title
0 1.0 a
1 2.0 b
2 3.0 c
3 NaN d
4 NaN e
5 NaN f
8 NaN i
9 NaN j
10 4.0 k
11 5.0 l
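The same result can also be written as a single boolean mask, which may be closer to the one-liner the question asks for. The sketch below is only an illustration: the helper name drop_duplicates_except and its keep_value parameter are hypothetical, not part of pandas. It keeps a row if it holds the protected value or if it is the first occurrence of its value. It assumes the protected value is either missing (None/NaN) or a literal such as the string "None" used in the question's script.

```python
import pandas as pd


def drop_duplicates_except(df, column, keep_value=None):
    """Drop repeated values in `column`, but keep every row holding `keep_value`.

    Hypothetical helper for illustration. With keep_value=None, rows whose
    value is missing (None/NaN) are the ones that are always kept.
    """
    col = df[column]
    # Rows that must survive regardless of duplication.
    protected = col.isna() if keep_value is None else col.eq(keep_value)
    # duplicated() is True for every repeat after the first occurrence.
    return df[protected | ~col.duplicated()]


# Same sample data as in the answer above.
ids = [1, 2, 3, None, None, None, 2, 3, None, None, 4, 5]
df = pd.DataFrame({"id": ids, "title": list("abcdefghijkl")})
result = drop_duplicates_except(df, "id")
print(result)

# Variant for a column that stores the literal string "None".
posts = pd.DataFrame({"repost_of_post_id": ["7139471603", "None", "None", "7139471603"]})
posts_unique = drop_duplicates_except(posts, "repost_of_post_id", keep_value="None")
print(posts_unique)
```

Inlined, the mask is df[df['id'].isna() | ~df['id'].duplicated()]. Unlike the script in the question, which places all the "None" rows first, this keeps the rows in their original order.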