Finding the unique characters between two compared columns

Problem description

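I want to compare column1 with column2 and get the unique values that make column2 differ from column1. In this case, the answers I expect are "Residence - Location", "-12", "NAN", and "NA" for the empty value. The first column is compared against the second.

Also, can the result be created and stored in another column?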

Result
index   column1         column2                     diff
1.      Admission Date  Residence - Location        Residence - Location
2.      Malnutrition    Malnutrition-12             -12
3.      TB              NAN                         NAN
4.      Anaemia         NA                          NA

The code can be in R or Python; I don't mind.

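import numpy as np
import pandas as pd

def FindDifference(Row):
    x = Row['column1']
    y = Row['column2']

    Difference = ""
    # Missing values in column2 produce a missing result
    if pd.isnull(y) or y == "nan" or y == "NA":
        return np.nan
    # Collect each character of the longer string that does not occur in the other
    if len(x) <= len(y):
        for i in y:
            if i not in x:
                Difference += str(i)
    else:
        for i in x:
            if i not in y:
                Difference += str(i)
    return Difference

# Copy the two columns so that adding 'diff' does not modify a view of Final_df
ReadDataT = Final_df[['column1', 'column2']].copy()
ReadDataT['diff'] = ReadDataT.apply(FindDifference, axis=1)
ReadDataT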

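The problem with this code is that it compares the two values character by character and returns the characters that are not in both columns. For example, the first row gives 'RC-Lc' as the difference.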

Tags: python, r, dataframe

Solution


library(dplyr); library(stringr)
# fixed() makes column1 match literally instead of as a regular expression
df %>% mutate(diff = str_remove(column2, fixed(column1)))

  index        column1              column2                 diff
1     1 Admission Date Residence - Location Residence - Location
2     2   Malnutrition      Malnutrition-12                  -12
3     3             TB                  NAN                  NAN
4     4        Anaemia                 <NA>                 <NA>

Edit: the same without dplyr:

df$diff = stringr::str_remove(df$column2, stringr::fixed(df$column1))
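
Since the question also allows Python, the same idea (remove the first occurrence of column1 from column2, rather than comparing individual characters) can be sketched in plain Python. This is a minimal illustration, not part of the original answer: the helper name `remove_first` is hypothetical, the rows are copied from the table in the question, and the missing value is represented as `None` by assumption.

```python
def remove_first(text, pattern):
    """Return `text` with the first literal occurrence of `pattern` removed.

    Non-string inputs (for example None or a float NaN) are passed through
    unchanged, so missing values stay missing.
    """
    if not isinstance(text, str) or not isinstance(pattern, str):
        return text
    # str.replace matches literally (no regular expressions); count=1 limits
    # the removal to the first occurrence only.
    return text.replace(pattern, "", 1)


# (column1, column2) pairs taken from the question's table; None stands in
# for the missing value in the last row (an assumption for this sketch).
rows = [
    ("Admission Date", "Residence - Location"),
    ("Malnutrition", "Malnutrition-12"),
    ("TB", "NAN"),
    ("Anaemia", None),
]

diffs = [remove_first(col2, col1) for col1, col2 in rows]
print(diffs)  # ['Residence - Location', '-12', 'NAN', None]
```

With a pandas DataFrame, a helper like this could be applied row-wise, for example with `df.apply(lambda r: remove_first(r["column2"], r["column1"]), axis=1)`, and the result assigned to a new `diff` column.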

