Erase repeated lines in R with a conditional column

Problem description

I have this data: https://pastebin.com/x8HrT8qK

"sp";"mes";"ano";"code"
"56";"CM";7;2016;"CM52"
"57";"CM";2;2019;"CM52"
"58";"CM";11;2016;"CM53"
"59";"CM";9;2019;"CM53"
"60";"CM";5;2018;"CM53"
"61";"CM";5;2018;"CM53"
"374";"EI";8;2019;"EI26"
"375";"EI";8;2019;"EI26"
"376";"EI";3;2019;"EI26"
"377";"EI";7;2019;"EI26"
"378";"EI";11;2019;"EI26"
"379";"EI";2;2020;"EI26"
"380";"EI";10;2019;"EI27"
"381";"EI";11;2019;"EI27"
"382";"EI";11;2019;"EI27"

I would like to drop rows that share the same "code", but only when they also share the same "ano", keeping the first row of each such pair.

So that the data would look like this: https://pastebin.com/F7tkUZE1

"sp";"mes";"ano";"code"
"56";"CM";7;2016;"CM52"
"57";"CM";2;2019;"CM52"
"58";"CM";11;2016;"CM53"
"59";"CM";9;2019;"CM53"
"60";"CM";5;2018;"CM53"
"374";"EI";8;2019;"EI26"
"379";"EI";2;2020;"EI26"
"380";"EI";10;2019;"EI27"

Tags: r

Solution


In base R (no external packages), duplicated returns a logical index that is TRUE for every row whose combination of 'ano' and 'code' has already appeared in an earlier row. Negating that index drops the repeats and keeps the first row of each combination:

df1[!duplicated(df1[c('ano', 'code')]), ]
#    sp mes  ano code
#56  CM   7 2016 CM52
#57  CM   2 2019 CM52
#58  CM  11 2016 CM53
#59  CM   9 2019 CM53
#60  CM   5 2018 CM53
#374 EI   8 2019 EI26
#379 EI   2 2020 EI26
#380 EI  10 2019 EI27
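
To see why this works, it can help to look at the logical index on its own. The following is a minimal sketch in base R on a small made-up data frame (the name toy, its row names, and its values are assumptions for illustration, not the data from the question). It also shows the fromLast argument of duplicated, which keeps the last row of each combination instead of the first.

```r
# Toy data (hypothetical, for illustration only)
toy <- data.frame(
  ano  = c(2016L, 2019L, 2019L, 2019L, 2020L),
  code = c("A1", "A1", "A1", "B2", "A1"),
  row.names = c("r1", "r2", "r3", "r4", "r5")
)

# TRUE marks a row whose (ano, code) pair already appeared in an earlier row
dup <- duplicated(toy[c("ano", "code")])

# Keep the first occurrence of each (ano, code) pair
first_kept <- toy[!dup, ]

# Keep the last occurrence of each pair instead
last_kept <- toy[!duplicated(toy[c("ano", "code")], fromLast = TRUE), ]

print(dup)
print(first_kept)
print(last_kept)
```

Here only r3 is flagged, because r2 already carries the pair (2019, A1). Rows that share a code but differ in ano, such as r1 and r5, are all kept.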

data

# Read the semicolon-separated file shown in the question
df1 <- read.csv('file.csv', sep = ";")

# Or build the data frame directly
df1 <- structure(list(sp = c("CM", "CM", "CM", "CM", "CM", "CM", "EI", 
"EI", "EI", "EI", "EI", "EI", "EI", "EI", "EI"), mes = c(7L, 
2L, 11L, 9L, 5L, 5L, 8L, 8L, 3L, 7L, 11L, 2L, 10L, 11L, 11L), 
    ano = c(2016L, 2019L, 2016L, 2019L, 2018L, 2018L, 2019L, 
    2019L, 2019L, 2019L, 2019L, 2020L, 2019L, 2019L, 2019L), 
    code = c("CM52", "CM52", "CM53", "CM53", "CM53", "CM53", 
    "EI26", "EI26", "EI26", "EI26", "EI26", "EI26", "EI27", "EI27", 
    "EI27")), class = "data.frame", row.names = c("56", "57", 
"58", "59", "60", "61", "374", "375", "376", "377", "378", "379", 
"380", "381", "382"))
