R: removing rows per ID column by a list of values

Problem description

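I have a data frame with multiple rows per ID, but I only want to keep one row per ID. Unfortunately I can't share my real data frame, so I'll create a similar hypothetical one below.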

+----+--------+--------------------+
| ID | name   | org                |
+----+--------+--------------------+
| 1  | Apple  | Apple              |
| 1  | Apple  | Sour               |
| 1  | Apple  | Goldstar           |
| 2  | Banana | Banana             |
| 2  | Banana | banana             |
| 3  | Yogi   | yogi               |
| 3  | yogi   | strawberry yoghurt |
+----+--------+--------------------+

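I'm looking for a way to delete all rows for an ID except the one matching the first value found in a list of possible values; if nothing matches, all rows should be kept.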

In this hypothetical case, I'd like to give the function a list of values, for example:

appleNamesTokeep <- c("Goldstar", "Apple", "Sour")
bananaNamesTokeep <- c("Banana", "banana") # case-sensitive
yoghurtNamesTokeep <- c("strawberry yoghurt", "yogi")

The result would be:

+----+--------+--------------------+
| ID | name   | org                |
+----+--------+--------------------+
| 1  | Apple  | Goldstar           |
| 2  | Banana | Banana             |
| 3  | yogi   | strawberry yoghurt |
+----+--------+--------------------+

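If a row whose org value is "Goldstar" is found, all other rows for that ID should be removed; if "Goldstar" is not found but "Apple" is, that row should be kept and everything else removed, and so on. This should be evaluated per ID and per list, because each ID may be about a completely different subject (here, different kinds of food).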

Tags: r, dataframe, filtering

Solution


A possible solution in base R:

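# i1: index of the list whose name shares its first three letters with `name` (case-insensitive)
i1 <- match(tolower(substr(mydf$name, 1, 3)), substr(NamesTokeep$ind, 1, 3))
# i2: index of the list whose retained (first) value equals `org`; NA when there is none
i2 <- match(mydf$org, NamesTokeep$values)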

Now using:

# keep rows where name and org point to the same list (rows with NA are dropped by which())
mydf[which(i1 == i2), ]

gives:

  ID   name                org
3  1  Apple           Goldstar
4  2 Banana             Banana
7  3   yogi strawberry yoghurt

Data used:

mydf <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
                       name = c("Apple", "Apple", "Apple", "Banana", "Banana", "Yogi", "yogi"),
                       org = c("Apple", "Sour", "Goldstar", "Banana", "banana", "yogi", "strawberry yoghurt")),
                  .Names = c("ID", "name", "org"), class = "data.frame", row.names = c(NA, -7L))

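# Stack the priority lists into a two-column data frame (ind = list name, values = entry)
NamesTokeep <- stack(list(apple = c("Goldstar", "Apple", "Sour"),
                          banana = c("Banana", "banana"),
                          yoghurt = c("strawberry yoghurt", "yogi")))[2:1]
# Keep only the first (most preferred) value of each list
NamesTokeep <- aggregate(values ~ ind, data = NamesTokeep, '[', 1)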
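
Because the aggregate() step keeps only the first value of each list, the approach above returns no row for an ID whose top-priority value is missing. The question also asks for a fallback to the next value, and for all rows to be kept when nothing matches. The following is a minimal base R sketch of that behaviour, not part of the original answer. The helper name keep_preferred and the list name priorities are hypothetical, and the sketch assumes, as the answer does, that the first three letters of name identify the list to use.

```r
mydf <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  name = c("Apple", "Apple", "Apple", "Banana", "Banana", "Yogi", "yogi"),
  org  = c("Apple", "Sour", "Goldstar", "Banana", "banana", "yogi", "strawberry yoghurt"),
  stringsAsFactors = FALSE
)

# Priority lists from the question, most preferred value first.
priorities <- list(
  apple   = c("Goldstar", "Apple", "Sour"),
  banana  = c("Banana", "banana"),
  yoghurt = c("strawberry yoghurt", "yogi")
)

# Hypothetical helper: for each ID, keep the row whose `org` value ranks
# highest in that ID's priority list; keep all rows when nothing matches.
keep_preferred <- function(df, priorities) {
  prefix <- substr(names(priorities), 1, 3)
  pieces <- lapply(split(df, df$ID), function(rows) {
    # Assumption: the first three letters of `name` (lowercased) pick the list.
    grp <- match(tolower(substr(rows$name[1], 1, 3)), prefix)
    if (is.na(grp)) return(rows)               # no list for this ID: keep all rows
    hit <- match(priorities[[grp]], rows$org)  # row position of each preferred value
    hit <- hit[!is.na(hit)]                    # drop preferred values that are absent
    if (length(hit) == 0) rows else rows[hit[1], , drop = FALSE]
  })
  do.call(rbind, unname(pieces))
}

result <- keep_preferred(mydf, priorities)
print(result)

# Fallback check: without the "Goldstar" row, ID 1 falls back to "Apple".
fallback <- keep_preferred(mydf[mydf$org != "Goldstar", ], priorities)
print(fallback)
```

On the sample data, result holds the same three rows as the output shown earlier, while fallback keeps the "Apple" row for ID 1. Splitting by ID keeps the lookup local to each group, so a value that appears in another ID's list cannot select a row by accident.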
