r - Handling spaces when subsetting by excluding a set of strings
Problem description
I have a data frame that looks like this:
Author ID Country Year
A 12345 US 2011
B 13254 Germany 2018
C 54952 Belgium 2005
D 58774 UK 2009
E 88569 Lebanon 2015
...
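I want to exclude all countries that belong to the EU, as well as the US. However, I am running into problems with countries whose names contain a space, such as Czech Republic and United Kingdom.
So far I have tried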
non_other_countries <- c("Belgium", "Bulgaria", "Denmark", "Germany", "Estonia", "Finland",
"France", "Greece", "Ireland", "Italy", "Croatia", "Latvia", "Lithuania", "Luxembourg",
"Malta", "Netherlands", "Austria", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
"Spain", "Sweden", "Czech Republic", "Hungary", "United Kingdom", "Cyprus", "United States")
other_post_2011 <- other_post_2011_with_id[, setdiff(names(other_post_2011_with_id), non_other_countries)]
and
other_post_2011 <- subset(other_post_2011_with_id, !Country %in% c("Belgium", "Bulgaria",
"Denmark", "Germany", "Estonia", "Finland", "France", "Greece", "Ireland", "Italy", "Croatia",
"Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Austria", "Poland", "Portugal",
"Romania", "Slovakia", "Slovenia", "Spain", "Sweden", "Czech Republic", "Hungary",
"United Kingdom", "Cyprus", "United States", "USA"))
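However, neither approach excludes the countries whose names contain a space.
For now I have come up with an (imo) very ugly workaround: replacing every "Czech Republic" with "Czechia" and every "United Kingdom" with "UK"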
other_post_2011_with_id$Country[other_post_2011_with_id$Country == "Czech Republic"] <- "Czechia"
other_post_2011_with_id$Country[other_post_2011_with_id$Country == "United Kingdom"] <- "UK"
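But I keep wondering whether there is a more elegant, more general solution. Many thanks!
Solution
Since the data you provided is incomplete, I can't tell exactly what went wrong with your code, but try the following. Spaces inside the strings are not a problem for %in%; in the example below, "Czech Republic" and "United Kingdom" are excluded just like the single-word names.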
head(dat)
# a id country year
# 1 a 1 United Kingdom 2006
# 2 b 5 Bouvet Island 2010
# 3 c 8 Hungary 2010
# 4 d 10 Czech Republic 2004
# 5 e 12 Bouvet Island 2001
# 6 f 19 United Kingdom 2004
excl <- c("Czech Republic", "Hungary", "United Kingdom", "Cyprus",
"United States")
dat[!dat$country %in% excl, ]
# a id country year
# 2 b 5 Bouvet Island 2010
# 5 e 12 Bouvet Island 2001
# 7 g 20 Dominica 2004
# 9 i 32 Namibia 2000
# 10 j 34 Bouvet Island 2011
# 11 k 35 Bouvet Island 2001
# 12 l 52 Bouvet Island 2010
# 13 m 54 Dominica 2005
# 14 n 56 Namibia 2000
# 17 q 77 Bouvet Island 2001
# 18 r 79 Qatar 2011
# 19 s 82 Bouvet Island 2002
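If the same filter still leaves rows behind in your own data, the stored strings probably differ from the entries in the exclusion vector. The sample in the question, for instance, shows "UK" and "US", while the list contains "United Kingdom", "United States" and "USA". Stray leading or trailing whitespace has the same effect. Here is a minimal sketch for checking this; the data frame `df` and the variable names `unmatched` and `kept` are made up for illustration:

```r
# Hypothetical data: note the trailing space and the abbreviations.
df <- data.frame(
  Country = c("Czech Republic ", "UK", "Germany", "Lebanon", "US"),
  Year    = c(2004L, 2009L, 2018L, 2015L, 2011L),
  stringsAsFactors = FALSE
)
excl <- c("Czech Republic", "United Kingdom", "Germany", "United States")

# Strip leading/trailing whitespace before comparing.
df$Country <- trimws(df$Country)

# Entries of the exclusion list that never occur in the data:
# candidates for a spelling or naming mismatch.
unmatched <- setdiff(excl, unique(df$Country))
print(unmatched)

# Values that survive the filter, to eyeball for variants such as "UK".
kept <- df[!df$Country %in% excl, ]
print(sort(unique(kept$Country)))
```

Note that the first attempt in the question cannot work regardless of spelling: setdiff(names(...), ...) compares column names, not the values in Country, so it selects columns rather than filtering rows.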
Data
dat <- structure(list(a = structure(1:20, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p",
"q", "r", "s", "t"), class = "factor"), id = c(1L, 5L, 8L, 10L,
12L, 19L, 20L, 31L, 32L, 34L, 35L, 52L, 54L, 56L, 61L, 67L, 77L,
79L, 82L, 90L), country = structure(c(8L, 1L, 5L, 3L, 1L, 8L,
4L, 2L, 6L, 1L, 1L, 1L, 4L, 6L, 5L, 2L, 1L, 7L, 1L, 3L), .Label = c("Bouvet Island",
"Cyprus", "Czech Republic", "Dominica", "Hungary", "Namibia",
"Qatar", "United Kingdom"), class = "factor"), year = c(2006L,
2010L, 2010L, 2004L, 2001L, 2004L, 2004L, 2009L, 2000L, 2011L,
2001L, 2010L, 2005L, 2000L, 2001L, 2006L, 2001L, 2011L, 2002L,
2003L)), class = "data.frame", row.names = c(NA, -20L))
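As a more general alternative to overwriting values one by one, variant spellings can be mapped to a canonical name through a lookup vector before filtering. This is only a sketch under assumptions: the alias table, the helper name `normalize_country` and the small data frame `df` are all hypothetical, and the aliases you actually need depend on your data.

```r
# Hypothetical alias table: names are the variants, values the canonical form.
aliases <- c("UK" = "United Kingdom", "US" = "United States",
             "USA" = "United States", "Czechia" = "Czech Republic")

# Hypothetical helper: trims whitespace and replaces known variants.
normalize_country <- function(x, aliases) {
  x <- trimws(as.character(x))      # as.character() also handles factors
  hit <- x %in% names(aliases)      # which values have a known alias
  x[hit] <- unname(aliases[x[hit]]) # look them up by name
  x
}

df <- data.frame(Country = c("UK", "Czech Republic", "Qatar", "USA"),
                 stringsAsFactors = FALSE)
excl <- c("Czech Republic", "United Kingdom", "United States")

# Compare on the normalized names without modifying the stored column.
res <- df[!normalize_country(df$Country, aliases) %in% excl, , drop = FALSE]
print(res)
```

Keeping the aliases in one named vector means a new variant only needs one extra entry, and the original Country column stays untouched.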