R: removing duplicated values from one column

Problem description

I have data in the following form:

column1                    column2
milk,cheese,eggs         milk,cheese,sugar 
cheese, eggs             milk,water
eggs, milk               milk, water, juice

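I want to remove every instance of milk in column2 that duplicates column1. That is, if milk appears in both columns of the same row, it should be removed from column2. Ideally, the output should look like this: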

column1                    column2
milk,cheese,eggs         cheese,sugar 
cheese, eggs             milk,water
eggs, milk               water, juice

Tags: r, dataframe

Solution

We can use grepl to identify the rows where 'milk' is present in column1, and then remove it from column2 in those rows using gsub.

inds <- grepl('milk', df$column1)
df$column2[inds] <- gsub('milk', '', df$column2[inds])
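# Collapse any run of commas left behind into a single comma and strip leading/trailing commas
# (the `whitespace` argument of trimws() needs R >= 3.6.0)
df$column2 <- gsub(',{2,}', ',', trimws(df$column2, whitespace = ','))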
df

#           column1      column2
#1 milk,cheese,eggs cheese,sugar
#2      cheese,eggs   milk,water
#3        eggs,milk  water,juice

Data

df <- structure(list(column1 = c("milk,cheese,eggs", "cheese,eggs", 
"eggs,milk"), column2 = c("milk,cheese,sugar", "milk,water", 
"milk,water,juice")), class = "data.frame", row.names = c(NA, -3L))
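The pattern-based approach matches 'milk' as a substring, so it would also fire on an item such as "milkshake", and it assumes there are no spaces after the commas. As a hedged alternative, the sketch below splits each cell into items and compares whole items instead. The helper name `drop_shared_item` and its `item` parameter are hypothetical, introduced only for illustration; the data frame is rebuilt here so the block runs on its own.

```r
# Example data, same values as in the Data section
df <- data.frame(
  column1 = c("milk,cheese,eggs", "cheese,eggs", "eggs,milk"),
  column2 = c("milk,cheese,sugar", "milk,water", "milk,water,juice"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the original answer):
# drop `item` from col2 in every row where col1 also lists it
drop_shared_item <- function(col1, col2, item = "milk") {
  # Split on commas, tolerating optional spaces around them
  items1 <- strsplit(trimws(col1), "\\s*,\\s*")
  items2 <- strsplit(trimws(col2), "\\s*,\\s*")
  mapply(function(a, b) {
    # Compare whole items, so "milkshake" does not count as "milk"
    if (item %in% a) b <- b[b != item]
    paste(b, collapse = ",")
  }, items1, items2, USE.NAMES = FALSE)
}

df$column2 <- drop_shared_item(df$column1, df$column2)
# column2 is now: "cheese,sugar", "milk,water", "water,juice"
```

Because the items are re-joined with paste, no stray commas are left to clean up afterwards. Note that this version also normalises ", " separators to "," in column2, which may or may not be what you want.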
