r - R: remove duplicate values in one column
Problem description
I have data of the following form:
column1            column2
milk,cheese,eggs   milk,cheese,sugar
cheese, eggs       milk,water
eggs, milk         milk, water, juice
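I want to remove every duplicate instance of milk from column2. That is, if milk appears in both columns of the same row, remove milk from column2. Ideally, the output should look like this: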
column1            column2
milk,cheese,eggs   cheese,sugar
cheese, eggs       milk,water
eggs, milk         water, juice
Solution
We can use grepl to identify the rows where 'milk' is present in column1, and then remove it from column2 in those rows using gsub.
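# Rows where column1 contains 'milk'
inds <- grepl('milk', df$column1)
# In those rows, delete 'milk' from column2
df$column2[inds] <- gsub('milk', '', df$column2[inds])
# Remove the commas left behind: trim leading/trailing ones, and collapse
# runs of commas to a single comma so neighbouring items stay separated
df$column2 <- gsub(',{2,}', ',', trimws(df$column2, whitespace = ','))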
df
#           column1      column2
#1 milk,cheese,eggs cheese,sugar
#2      cheese,eggs   milk,water
#3        eggs,milk  water,juice
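The comma clean-up only matters once 'milk' is not the first item in column2. A minimal sketch on a few made-up strings (not taken from the question) shows what each step leaves behind; it assumes R 3.6.0 or later, where trimws() gained its whitespace argument.

```r
# Hypothetical column2 values, with milk at the start, middle and end
x <- c("milk,cheese,sugar", "cheese,milk,sugar", "water,milk")

x <- gsub("milk", "", x)          # now ",cheese,sugar" "cheese,,sugar" "water,"
x <- trimws(x, whitespace = ",")  # now "cheese,sugar" "cheese,,sugar" "water"
x <- gsub(",{2,}", ",", x)        # now "cheese,sugar" "cheese,sugar" "water"
print(x)
```

Replacing a run of commas with a single comma, rather than with an empty string, is what keeps "cheese" and "sugar" from being glued into "cheesesugar".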
Data
df <- structure(list(column1 = c("milk,cheese,eggs", "cheese,eggs",
"eggs,milk"), column2 = c("milk,cheese,sugar", "milk,water",
"milk,water,juice")), class = "data.frame", row.names = c(NA, -3L))
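Because grepl('milk', ...) and gsub('milk', ...) match substrings, they would also fire on an item such as "buttermilk", and with the spaced input from the question ("milk, water, juice") a leading space is left in column2. If that matters, a sketch that compares whole comma-separated items instead might look like this. remove_shared_item is a hypothetical helper, and the sketch assumes items are separated by commas with optional surrounding spaces.

```r
# Hypothetical helper: drop `item` from each column2 entry whenever the
# same row's column1 list contains exactly that item (not a substring).
remove_shared_item <- function(col1, col2, item = "milk") {
  mapply(function(a, b) {
    a_items <- trimws(strsplit(a, ",", fixed = TRUE)[[1]])
    if (!(item %in% a_items)) return(b)           # milk not in column1: keep as-is
    b_tokens <- strsplit(b, ",", fixed = TRUE)[[1]]
    keep <- trimws(b_tokens) != item              # exact item comparison
    trimws(paste(b_tokens[keep], collapse = ","))  # rejoin, drop leading space
  }, col1, col2, USE.NAMES = FALSE)
}

# The spaced input from the question
df <- data.frame(
  column1 = c("milk,cheese,eggs", "cheese, eggs", "eggs, milk"),
  column2 = c("milk,cheese,sugar", "milk,water", "milk, water, juice"),
  stringsAsFactors = FALSE
)
df$column2 <- remove_shared_item(df$column1, df$column2)
# column2 is now c("cheese,sugar", "milk,water", "water, juice")
```

Since the remaining tokens are rejoined with their original spacing, row 3 comes out as "water, juice", which matches the desired output in the question.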