python - How do I clean a DataFrame column filled with names using Python?
Problem description
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame(columns=['Name'])
df['Name'] = ['Aadam', 'adam', 'AdAm', 'adammm', 'Adam.', 'Bethh', 'beth.', 'beht', 'Beeth', 'Beth']
I want to clean the column to get the following result:
df['Name Corrected'] = ['adam','adam','adam','adam','adam','beth','beth','beth','beth','beth']
df
The cleaned names are based on the following reference table:
ref = pd.DataFrame( columns = ['Cleaned Names'])
ref['Cleaned Names'] = ['adam','beth']
I know about fuzzy matching, but I'm not sure it is the most efficient way to solve this problem.
Solution
You can try:
lst = ['adam', 'beth']
out = pd.concat([df['Name'].str.contains(x, case=False).map({True: x}) for x in lst], axis=1)
df['Name corrected'] = out.bfill(axis=1).iloc[:, 0]
# Finally:
df['Name corrected'] = df['Name corrected'].ffill()
# Note: ffill() copies the value from the row above, so it gives a wrong value
# when an unmatched name directly follows rows that belong to a different name
Explanation:
lst = ['adam', 'beth']
# Create a list of the cleaned names
out = pd.concat([df['Name'].str.contains(x, case=False).map({True: x}) for x in lst], axis=1)
# For each word in the list, check (case-insensitively) whether the 'Name' column contains it.
# This gives a boolean Series, which is then mapped so that True becomes the word and False becomes NaN.
# The resulting Series are concatenated on axis=1 to form a DataFrame with one column per word.
df['Name corrected'] = out.bfill(axis=1).iloc[:, 0]
# Backward fill along axis=1 and take the first column, i.e. the first word that matched in each row
# Finally:
df['Name corrected'] = df['Name corrected'].ffill()
# Forward fill the rows where no word matched (here 'beht' and 'Beeth')
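Because the forward fill depends on row order, a fuzzy-matching approach (which the question mentions) may be more robust. The following is a minimal sketch using difflib.get_close_matches from the Python standard library. It is not part of the original answer: the helper name closest_name and the 0.6 similarity cutoff are assumptions chosen for illustration, and the cutoff would likely need tuning on real data.

```python
import difflib

import pandas as pd

df = pd.DataFrame({'Name': ['Aadam', 'adam', 'AdAm', 'adammm', 'Adam.',
                            'Bethh', 'beth.', 'beht', 'Beeth', 'Beth']})
ref = pd.DataFrame({'Cleaned Names': ['adam', 'beth']})


def closest_name(raw, choices, cutoff=0.6):
    """Return the reference name most similar to `raw`, or None if nothing is close enough.

    Hypothetical helper for illustration; `cutoff` is an assumed similarity threshold.
    """
    matches = difflib.get_close_matches(str(raw).lower(), choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


choices = ref['Cleaned Names'].tolist()
# Match each row independently, so the result does not depend on row order
df['Name Corrected'] = df['Name'].map(lambda name: closest_name(name, choices))
print(df)
```

Each row is compared against the reference list on its own, so misspellings such as 'beht' and 'Beeth' are resolved without relying on the row above. Names that are not similar to any reference name come back as None instead of silently inheriting a neighbor's value.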