Check whether a column value appears in other columns in pandas, ignoring special characters and case

Problem description

This question is a slight variant of "Check if column value is in other columns in pandas".

I have a dataframe called test:

name_0        name_1    overall_name
Asda          Nan       Tesco
Asda          Nan       ASDA
LIDL 1        Asda      Lidl
AAA           Asda      ASDA
AAA           Asda      ASDA
Sainsbury     Nan       Lidl

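How can I check whether test.overall_name appears in any of the other columns (['name_0', 'name_1', etc.]), ignoring case (lowercase/uppercase) and any special characters?

So my ideal dataframe would look like this: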

name_0        name_1    overall_name   namematch 
Asda          Nan       Tesco          no match 
Asda          Nan       ASDA           match
LIDL 1        Asda      Lidl           match
AAA           Asda      ASDA           match
AAA           Asda      ASDA           match
Sainsbury     Nan       Lidl           no match

Tags: python, pandas

Solution


Take a look at this.

This function normalizes the values and then compares them:

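import pandas as pd
import re

def match(first, second, overall):
    # Lowercase each value, replace every non-letter with a space, and trim the ends
    f = re.sub(r"[^a-zA-Z]", " ", first.lower()).strip()
    s = re.sub(r"[^a-zA-Z]", " ", second.lower()).strip()
    o = re.sub(r"[^a-zA-Z]", " ", overall.lower()).strip()
    # Return 1 if either name column equals overall_name after normalization, else 0
    if f == o:
        return 1
    elif s == o:
        return 1
    else:
        return 0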
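
To see what the normalization step does on its own, here is a minimal sketch. The helper name `normalize` is hypothetical (it is not part of the answer); it applies the same sequence of lowercasing, replacing non-letters with spaces, and stripping to a single value.

```python
import re

def normalize(value):
    # Hypothetical helper: lowercase, replace every non-letter with a space, trim the ends.
    # str() is an added assumption so that non-string cells do not raise an error.
    return re.sub(r"[^a-zA-Z]", " ", str(value).lower()).strip()

print(normalize("LIDL 1"))                     # lidl
print(normalize("Asda") == normalize("ASDA"))  # True
print(normalize("Co-op"))                      # co op
```

Note that punctuation inside a value becomes a space rather than disappearing, so "Co-op" normalizes to "co op" and would not compare equal to "Coop".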

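This line adds the match column by applying the function to each row (df here is the dataframe, called test in the question):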

df['match'] = df.apply(lambda x: match(x['name_0'],x['name_1'],x['overall_name']),axis=1)

The result looks like this:

    name_0     name_1  overall_name  match
  0 Asda       Nan     Tesco         0
  1 Asda       Nan     ASDA          1
  2 LIDL 1     Asda    Lidl          1
  3 AAA        Asda    ASDA          1
  4 AAA        Asda    ASDA          1
  5 Sainsbury  Nan     Lidl          0
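
The question also asks about an arbitrary number of name columns and about "match" / "no match" labels. The following is a rough sketch of one way to get there, not the answerer's code. It assumes the columns to check all start with the prefix `name_` and that the cells are strings; the helpers `normalize` and `label` are hypothetical names introduced for illustration.

```python
import re
import pandas as pd

# Sample data copied from the question
test = pd.DataFrame({
    "name_0": ["Asda", "Asda", "LIDL 1", "AAA", "AAA", "Sainsbury"],
    "name_1": ["Nan", "Nan", "Asda", "Asda", "Asda", "Nan"],
    "overall_name": ["Tesco", "ASDA", "Lidl", "ASDA", "ASDA", "Lidl"],
})

def normalize(value):
    # Hypothetical helper: lowercase, replace non-letters with spaces, trim the ends
    return re.sub(r"[^a-zA-Z]", " ", str(value).lower()).strip()

# Assumption: every column to compare against starts with "name_"
name_cols = [c for c in test.columns if c.startswith("name_")]

def label(row):
    # Compare the normalized overall_name against each normalized name column
    target = normalize(row["overall_name"])
    hit = any(normalize(row[c]) == target for c in name_cols)
    return "match" if hit else "no match"

test["namematch"] = test.apply(label, axis=1)
print(test)
```

On the sample data this labels rows 1 to 4 as "match" and rows 0 and 5 as "no match", which agrees with the desired output in the question.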

Please let me know whether this works for you.

