r - How can I delete rows in one DF based on counts in another DF?
Question
Here are my two data frames (referred to below as DF1 and DF2):
structure(list(Author = c("Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Robinson et al.",
"Robinson et al.", "Robinson et al.", "Robinson et al.", "Louca et al.",
"Aquiloni, L., et al.", "Aquiloni, L., et al.", "Barbaresi, S., et al.",
"Barbaresi, S., et al.", "Barbaresi, S., et al.", "Gherardi, F., et al.",
"Gherardi, F., et al.", "Gherardi, F., et al.", "Loughman et al.",
"Loughman et al.", "Hall et al.", "Holsman et al. ", "Holsman et al. ",
"Smith B.D et al.", "Smith B.D et al."), Year = c(2006L, 2006L,
2006L, 2002L, 2002L, 2002L, 2002L, 2004L, 2004L, 2004L, 2000L,
2000L, 2000L, 2000L, 2014L, 2005L, 2005L, 2004L, 2004L, 2004L,
2002L, 2002L, 2002L, 2013L, 2013L, 1991L, 2006L, 2006L, 1991L,
1991L), Purpose = c("Invasive/Endangered Species", "Movement Metrics",
"Movement Metrics", "Invasive/Endangered Species", "Movement Metrics",
"Movement Metrics", "Movement Metrics", "Invasive/Endangered Species",
"Movement Metrics", "Movement Metrics", "Movement Metrics", "Movement Metrics",
"Movement Metrics", "Invasive/Endangered Species", "Human Interaction",
"Invasive/Endangered Species", "Habitat Use", "Invasive/Endangered Species",
"Feeding/Behavior", "Movement Metrics", "Movement Metrics", "Invasive/Endangered Species",
"Feeding/Behavior", "Movement Metrics", "Habitat Use", "Movement Metrics",
"Habitat Use", "Movement Metrics", "Movement Metrics", "Habitat Use"
)), row.names = c(NA, 30L), class = "data.frame")
structure(list(Author = c("Aquiloni, L., et al.", "Aquiloni, L., et al.",
"Barbaresi, S., et al.", "Barbaresi, S., et al.", "Barbaresi, S., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Gherardi, F., et al.", "Gherardi, F., et al.", "Gherardi, F., et al.",
"Hall et al.", "Holsman et al. ", "Holsman et al. ", "Louca et al.",
"Loughman et al.", "Loughman et al.", "Robinson et al.", "Robinson et al.",
"Smith B.D et al.", "Smith B.D et al."), Year = c(2005L, 2005L,
2004L, 2004L, 2004L, 2002L, 2002L, 2004L, 2004L, 2006L, 2006L,
2002L, 2002L, 2002L, 1991L, 2006L, 2006L, 2014L, 2013L, 2013L,
2000L, 2000L, 1991L, 1991L), Purpose = c("Habitat Use", "Invasive/Endangered Species",
"Feeding/Behavior", "Invasive/Endangered Species", "Movement Metrics",
"Invasive/Endangered Species", "Movement Metrics", "Invasive/Endangered Species",
"Movement Metrics", "Invasive/Endangered Species", "Movement Metrics",
"Feeding/Behavior", "Invasive/Endangered Species", "Movement Metrics",
"Movement Metrics", "Habitat Use", "Movement Metrics", "Human Interaction",
"Habitat Use", "Movement Metrics", "Invasive/Endangered Species",
"Movement Metrics", "Habitat Use", "Movement Metrics"), count = c(1L,
1L, 1L, 1L, 1L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 3L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-24L))
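The first data frame lists the author, the year, and the study purpose (a study can have more than one purpose). However, because of the way I built the data, it contains some duplicates (e.g. Robinson et al. 2000 lists "Movement Metrics" three times, and I only want it listed once).
I would use the duplicated() or unique() function, but my original DF has more columns, and those columns are not unique.
So I created a second data frame, grouped by Author/Year/Purpose, so that each combination of the three variables has a count. Is there a way for me to say:
If DF2$count > 1, find the matching rows in DF1 and delete count - 1 of them.
An example:
"SomeFunction" identifies the rows in DF2 where count > 1.
"SomeFunction" takes the Author and Year columns from DF2 and matches them against DF1.
"SomeFunction" deletes the repeated rows, leaving one row for each Author/Year/Purpose combination.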
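The three steps above can be followed quite literally. Below is a minimal base R sketch, not a definitive implementation: remove_extra_rows is a hypothetical name standing in for "SomeFunction", the two small data frames are cut-down stand-ins for DF1 and DF2, and the match is assumed to use all three key columns (Author, Year, Purpose) rather than only Author and Year, so that only true repeats are dropped.

```r
# Cut-down stand-ins for DF1 and DF2 (a few rows only, for illustration)
df1 <- data.frame(
  Author  = c("Robinson et al.", "Robinson et al.", "Robinson et al.",
              "Robinson et al.", "Louca et al."),
  Year    = c(2000L, 2000L, 2000L, 2000L, 2014L),
  Purpose = c("Movement Metrics", "Movement Metrics", "Movement Metrics",
              "Invasive/Endangered Species", "Human Interaction"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Author  = c("Louca et al.", "Robinson et al.", "Robinson et al."),
  Year    = c(2014L, 2000L, 2000L),
  Purpose = c("Human Interaction", "Invasive/Endangered Species",
              "Movement Metrics"),
  count   = c(1L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop count - 1 rows for every combination flagged in df2
remove_extra_rows <- function(df1, df2,
                              keys = c("Author", "Year", "Purpose")) {
  # Step 1: combinations whose count is greater than 1
  flagged <- df2[df2$count > 1, keys, drop = FALSE]
  # Step 2: build one string key per row so the two frames can be matched
  key_df1     <- do.call(paste, c(df1[keys], sep = "\r"))
  key_flagged <- do.call(paste, c(flagged, sep = "\r"))
  # Step 3: drop every repeat of a flagged key, keeping its first occurrence
  drop <- key_df1 %in% key_flagged & duplicated(key_df1)
  df1[!drop, , drop = FALSE]
}

result <- remove_extra_rows(df1, df2)
print(result)
```

Because the first occurrence of each flagged combination is kept, any extra columns in DF1 survive with the values from that first row.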
Solution
If your goal is simply to remove the duplicate rows, you can use the distinct function from the dplyr package directly on the first data frame:
library(dplyr)
df1 %>% distinct(Author, Year, Purpose, .keep_all = TRUE)
                  Author Year                     Purpose
1    Bubb, D. H., et al. 2006 Invasive/Endangered Species
2    Bubb, D. H., et al. 2006            Movement Metrics
3    Bubb, D. H., et al. 2002 Invasive/Endangered Species
4    Bubb, D. H., et al. 2002            Movement Metrics
5    Bubb, D. H., et al. 2004 Invasive/Endangered Species
6    Bubb, D. H., et al. 2004            Movement Metrics
7        Robinson et al. 2000            Movement Metrics
8        Robinson et al. 2000 Invasive/Endangered Species
9           Louca et al. 2014           Human Interaction
10  Aquiloni, L., et al. 2005 Invasive/Endangered Species
11  Aquiloni, L., et al. 2005                 Habitat Use
12 Barbaresi, S., et al. 2004 Invasive/Endangered Species
13 Barbaresi, S., et al. 2004            Feeding/Behavior
14 Barbaresi, S., et al. 2004            Movement Metrics
15  Gherardi, F., et al. 2002            Movement Metrics
16  Gherardi, F., et al. 2002 Invasive/Endangered Species
17  Gherardi, F., et al. 2002            Feeding/Behavior
18       Loughman et al. 2013            Movement Metrics
19       Loughman et al. 2013                 Habitat Use
20           Hall et al. 1991            Movement Metrics
21       Holsman et al.  2006                 Habitat Use
22       Holsman et al.  2006            Movement Metrics
23      Smith B.D et al. 1991            Movement Metrics
24      Smith B.D et al. 1991                 Habitat Use
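Adding the .keep_all = TRUE argument keeps all the other columns as well, so the extra columns in your original data frame are not dropped.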
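If you would rather not load dplyr, a similar result can probably be had in base R by calling duplicated() on just the key columns and using the result to subset the full data frame. This is a sketch under assumptions: the Notes column is hypothetical and only stands in for the extra non-unique columns mentioned in the question, and studies is a made-up toy data frame.

```r
# Toy data frame with one extra, non-unique column (Notes is hypothetical)
studies <- data.frame(
  Author  = c("Robinson et al.", "Robinson et al.", "Robinson et al.",
              "Louca et al."),
  Year    = c(2000L, 2000L, 2000L, 2014L),
  Purpose = c("Movement Metrics", "Movement Metrics",
              "Invasive/Endangered Species", "Human Interaction"),
  Notes   = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Columns that define a duplicate
key_cols <- c("Author", "Year", "Purpose")

# duplicated() flags every repeat of a key combination after its first
# occurrence; negating it keeps the first row of each combination,
# together with all of its other columns
deduped <- studies[!duplicated(studies[key_cols]), , drop = FALSE]
print(deduped)
```

Checking only the key columns is what lets duplicated() work even when the remaining columns differ from row to row.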