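r - How to use a for loop to test for matches between columns of different data frames, then save the result to a new data frame
Problem description
I'm trying to create a for loop in R that writes to a new data frame when a value in a column ("AreaName2") of one data frame (df2) matches a column ("ISLAND") of a different data frame (df1). If there is no match in that first column of df2, I want it to move on to the second pair of columns in df2 and df1 (df2: "AreaName1" and df1: "ARCHIP"). Again, if there is a match, it should be written to the new data frame. If there is still no match, I want it to move on to the third pair of columns (df2: "Country" and df1: "COUNTRY"). If all of these columns in df2 are blank, I want to skip the row. If one of the columns in df2 contains some information but it doesn't match df1, I'd like it to say so somehow, if possible.
I've made an example of df1, df2 and the expected result: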
ID <- c(1,2,3,4,5, 6)
COUNTRY <- c("country1", 'country2', 'country3','country4', 'country5', 'country6')
ARCHIP <- c('archipelago1', 'archipelago2', 'archipelgao3', 'archipelago4', 'archipelago5', 'archipelago6')
ISLAND <- c('someIsland1', 'someIsland2', 'someIsland3', 'someIsland4', 'someIsland5', 'someIsland6')
df1 <- data.frame(ID, COUNTRY, ARCHIP, ISLAND)
Sciname <- c("scientificName1", "scientificName2", "scientificName3", "scientificName4", "scientificName5", "scientificName6")
AreaName2 <- c("someIsland1", NA, "someIsland3", NA, NA, 'unrecognisableIsland')
AreaName1 <- c("archipelago1", "archipelago2", "archipelago3", NA, NA, 'archipelago6')
Country <- c("country1", "country2", "country3", 'country4', NA, 'country6')
df2 <- data.frame(Sciname, Country, AreaName1, AreaName2)
Species <- c("scientificName1","scientificName2", "scientificName3", "scientificName4", 'scientificName6')
Location <- c("someIsland1", "archipelago2", "someIsland3", 'country4', 'UNRECOGNISED')
results <- data.frame(Species, Location)
I was thinking I need to do something like this for each set of columns:
# Attempt for the first pair of columns only (AreaName2 vs ISLAND)
for (i in seq_len(nrow(df2))) {
  if (df2$AreaName2[i] %in% df1$ISLAND) {
    print(df2$AreaName2[i])
  }
}
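But I'm not sure how to make this work for each set, or how to make it run through several columns. Maybe I should create a separate for loop for each set of columns I want to match? Any ideas? Thanks!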
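For comparison with the join-based answer below, the cascade described in the question can be written as a single loop over the rows of df2, rather than one loop per column pair. This is a minimal sketch under a few assumptions: the lookup vector `column_pairs` is a hypothetical name introduced here, df1's first island is taken to be spelled "someIsland1" so it can match df2, and (like the `%in%` in the question) a value counts as matched if it appears anywhere in the corresponding df1 column, not necessarily in the same row. Note that under these rules scientificName6 resolves to "archipelago6", because its archipelago matches even though its island does not; flagging it as unrecognised, as the expected result does, would need a different rule.

```r
# Sample data from the question (df1's first island capitalised as "someIsland1" so it can match df2)
df1 <- data.frame(
  ID = 1:6,
  COUNTRY = paste0("country", 1:6),
  ARCHIP = c("archipelago1", "archipelago2", "archipelgao3",
             "archipelago4", "archipelago5", "archipelago6"),
  ISLAND = paste0("someIsland", 1:6),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Sciname = paste0("scientificName", 1:6),
  Country = c("country1", "country2", "country3", "country4", NA, "country6"),
  AreaName1 = c("archipelago1", "archipelago2", "archipelago3", NA, NA, "archipelago6"),
  AreaName2 = c("someIsland1", NA, "someIsland3", NA, NA, "unrecognisableIsland"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: df2 column -> df1 column, in the order they should be tried
column_pairs <- c(AreaName2 = "ISLAND", AreaName1 = "ARCHIP", Country = "COUNTRY")

species <- character(0)
locations <- character(0)

for (i in seq_len(nrow(df2))) {
  # Skip the row if every location column in df2 is blank
  if (all(is.na(df2[i, names(column_pairs)]))) next

  # Default for rows that have information but match nothing in df1
  location <- "UNRECOGNISED"
  for (col2 in names(column_pairs)) {
    value <- df2[[col2]][i]
    # The first pair whose df2 value appears anywhere in the matching df1 column wins
    if (!is.na(value) && value %in% df1[[column_pairs[[col2]]]]) {
      location <- value
      break
    }
  }

  species <- c(species, df2$Sciname[i])
  locations <- c(locations, location)
}

results <- data.frame(Species = species, Location = locations, stringsAsFactors = FALSE)
print(results)
```

Collecting the values in two vectors and building the data frame once at the end avoids growing a data frame inside the loop; for six rows either way is fine.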
Solution
# I like to use tidyverse :)
library(tidyverse)
# First, to create our datasets - (Thank you for providing sample data!)
# I've set this up in a slightly different way, in an attempt to keep our workspace clear.
# I've also used tibble in place of data.frame, to line up with the tidyverse approach.
df1 <- tibble(ID = 1:6,
              COUNTRY = c("country1", 'country2', 'country3','country4', 'country5', 'country6'),
              ARCHIP = c('archipelago1', 'archipelago2', 'archipelgao3', 'archipelago4', 'archipelago5', 'archipelago6'),
              ISLAND = c('someIsland1', 'someIsland2', 'someIsland3', 'someIsland4', 'someIsland5', 'someIsland6'))
df2 <- tibble(Sciname = c("scientificName1", "scientificName2", "scientificName3", "scientificName4", "scientificName5", "scientificName6"),
              Country = c("country1", "country2", "country3", 'country4', NA, 'country6'),
              AreaName1 = c("archipelago1", "archipelago2", "archipelago3", NA, NA, 'archipelago6'),
              AreaName2 = c("someIsland1", NA, "someIsland3", NA, NA, 'unrecognisableIsland'))
# Rather than use a for loop, I'll use full_join to match the two tables, then filter for the conditions you're looking for.
# Merge data
join_country <- full_join(df2, df1, by = c("Country" = "COUNTRY"))
# Identify scinames with matching island names
# I use _f to signify my goal here - filtering
island_f <- join_country %>%
  filter(AreaName2 == ISLAND) %>%
  # Keep only relevant columns
  select(Sciname, Location = AreaName2)
# Identify scinames with matching archip names
archip_f <- join_country %>%
  filter(
    # Exclude scinames we've identified with matching island names.
    !(Sciname %in% island_f$Sciname),
    AreaName1 == ARCHIP) %>%
  select(Sciname, Location = AreaName1)
# Identify scinames left over whose country matched a df1 row in the full_join
country_f <- join_country %>%
  filter(
    # Exclude scinames we've identified with matching island or archip names.
    !(Sciname %in% island_f$Sciname),
    !(Sciname %in% archip_f$Sciname),
    # Drop rows that came only from df1 (no Sciname) or whose country found no df1 partner (no ID)
    !is.na(Sciname),
    !is.na(ID)) %>%
  select(Sciname, Location = Country)
sciname_location <- bind_rows(island_f,
                              archip_f,
                              country_f) %>%
  arrange(Sciname)
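# Finally, to identify records that are populated but don't match at all, we can use anti_join:
# keep the df2 rows with no entry in sciname_location, then drop the ones that are entirely blank.
records_no_match <- anti_join(df2, sciname_location, by = "Sciname") %>%
  filter(!(is.na(Country) & is.na(AreaName1) & is.na(AreaName2)))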
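If you'd rather avoid the join altogether, the same island, then archipelago, then country priority can be expressed in one vectorised `mutate()`. This is a minimal sketch, not the answer's method: it swaps the `full_join` for plain `%in%` lookups, so a value counts as matched if it appears anywhere in the df1 column, whereas the join above only compares the island and archipelago against the df1 row with the same country. The "UNRECOGNISED" label is an assumed placeholder taken from the question's expected result.

```r
library(dplyr)

# Sample data from the question, as tibbles
df1 <- tibble(
  ID = 1:6,
  COUNTRY = paste0("country", 1:6),
  ARCHIP = c("archipelago1", "archipelago2", "archipelgao3",
             "archipelago4", "archipelago5", "archipelago6"),
  ISLAND = paste0("someIsland", 1:6)
)
df2 <- tibble(
  Sciname = paste0("scientificName", 1:6),
  Country = c("country1", "country2", "country3", "country4", NA, "country6"),
  AreaName1 = c("archipelago1", "archipelago2", "archipelago3", NA, NA, "archipelago6"),
  AreaName2 = c("someIsland1", NA, "someIsland3", NA, NA, "unrecognisableIsland")
)

sciname_location <- df2 %>%
  # Skip records where all three location columns are blank
  filter(!(is.na(AreaName2) & is.na(AreaName1) & is.na(Country))) %>%
  # case_when() returns the value for the first condition that is TRUE,
  # which gives the island -> archipelago -> country priority
  mutate(Location = case_when(
    AreaName2 %in% df1$ISLAND ~ AreaName2,
    AreaName1 %in% df1$ARCHIP ~ AreaName1,
    Country %in% df1$COUNTRY ~ Country,
    TRUE ~ "UNRECOGNISED"  # populated, but nothing matched df1
  )) %>%
  select(Species = Sciname, Location)

print(sciname_location)
```

Because `NA %in% x` is `FALSE` when `x` contains no `NA`, blank cells simply fall through to the next condition, so no explicit `is.na()` checks are needed inside `case_when()`.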
You can learn more about relational data in chapter 13 of R for Data Science.
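One relational-data detail that matters when looking for unmatched records: `anti_join()` is not symmetric. `anti_join(x, y)` keeps the rows of `x` that have no match in `y`, so swapping the arguments answers a different question. The small key tables below are hypothetical values chosen only to illustrate this.

```r
library(dplyr)

# Two small key tables (hypothetical values, for illustration only)
keys_x <- tibble(COUNTRY = c("country1", "country4", "country5"))
keys_y <- tibble(Country = c("country1", "country4", "country9"))

# Rows of the first argument with no partner in the second
only_in_x <- anti_join(keys_x, keys_y, by = c("COUNTRY" = "Country"))
only_in_y <- anti_join(keys_y, keys_x, by = c("Country" = "COUNTRY"))

print(only_in_x)  # one row: country5
print(only_in_y)  # one row: country9
```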
Let me know if you have any questions!