Remove similar string duplicates from a DataFrame

Problem description

I have a df that currently looks like this:

Car Name      Number
Adam Leaf     9
Adamm Leaf    9
Adam Lea      NaN
Adam-Leaf     NaN
Adam/Leaf     9
Claire-Green  NaN
Cliare Green  3
Claire Green  3
Claire Gren   NaN
Claire/Green  3

I'm trying to remove the spelling variations to end up with something like this:

Car Name      Number
Adam Leaf     9
Claire Green  3

Tags: python pandas dataframe data-cleaning

Solution


Here is one way to do it, using jellyfish:

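import jellyfish

# Group rows by the Soundex code of 'Car Name': spelling variants such as
# "Adam Leaf", "Adamm Leaf" and "Adam-Leaf" all map to the same code (A354).
# .first() then takes the first non-null value of each column in each group.
s = df.groupby(df['Car Name'].apply(jellyfish.soundex)).first()
print(s)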
              Car Name  Number
Car Name                      
A354         Adam Leaf     9.0
C462      Claire-Green     3.0
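To see what the grouping key actually does, or to try the idea without jellyfish and pandas installed, here is a minimal pure-Python sketch. Assumptions: `soundex` below is a hand-written simplification of American Soundex meant to follow the usual rules, not jellyfish's own code, and `dedupe` and `rows` are hypothetical names. It also makes two choices the one-liner above does not: `-` and `/` are treated as spaces, and the name kept for each group is the most common spelling rather than the first row. That second change matters here, because `.first()` returns "Claire-Green" while the question asks for "Claire Green".

```python
from collections import Counter

# Hand-written American Soundex (simplified sketch, not the jellyfish code).
# Consonants map to digits; vowels and other characters map to nothing.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code, e.g. 'Adam Leaf' -> 'A354'."""
    s = name.upper()
    if not s:
        return ""
    result = [s[0]]             # the first letter is kept as-is
    last = CODES.get(s[0])      # ...but its digit still suppresses a repeat
    for ch in s[1:]:
        digit = CODES.get(ch)
        if digit is not None:
            if digit != last:   # collapse runs of the same digit
                result.append(digit)
            last = digit
        elif ch not in "HW":    # vowels, spaces, '-', '/' break a run; H and W do not
            last = None
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


def dedupe(rows):
    """Hypothetical helper: rows is a list of (name, number) pairs, None = missing.

    Returns one (name, number) pair per Soundex code, in order of first appearance.
    """
    groups = {}
    for name, number in rows:
        # Treat '-' and '/' like spaces so "Adam-Leaf" counts as "Adam Leaf".
        clean = name.replace("-", " ").replace("/", " ")
        names, numbers = groups.setdefault(soundex(clean), ([], []))
        names.append(clean)
        if number is not None:
            numbers.append(number)
    out = []
    for names, numbers in groups.values():
        # Keep the most frequent spelling; take the first non-missing number
        # (like pandas' .first(), which skips NaN).
        canonical = Counter(names).most_common(1)[0][0]
        out.append((canonical, numbers[0] if numbers else None))
    return out


rows = [("Adam Leaf", 9), ("Adamm Leaf", 9), ("Adam Lea", None),
        ("Adam-Leaf", None), ("Adam/Leaf", 9), ("Claire-Green", None),
        ("Cliare Green", 3), ("Claire Green", 3), ("Claire Gren", None),
        ("Claire/Green", 3)]
print(dedupe(rows))  # [('Adam Leaf', 9), ('Claire Green', 3)]
```

Keep in mind that Soundex only encodes the first letter plus three consonant digits, so it is a coarse key: a genuinely different name such as "Adam Lowe" also codes to A354 and would be merged with "Adam Leaf". If that is a risk in your data, an edit-distance comparison (jellyfish also offers edit-distance functions) may be a safer way to decide which rows are duplicates.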
