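r - How to gsub matching strings and remove non-matching strings at the same time?
Problem description
I have a data frame with a column of strings that I want to label further with the following categories: city, country, and continent. I am using gsub to replace every city with "City", every country with "Country", and every continent with "Continent".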
#This is what I have
dataframe
Color Letter Words
red   A      Paris,Asia,parrot,Antarctica,North America,cat,lizard
blue  A      Panama,New York,Africa,dog,Tokyo,Washington DC,fish
red   B      Copenhagen,bird,USA,Japan,Chicago,Mexico,insect
blue  B      Israel,Antarctica,horse,South America,North America,turtle,Brazil
#This is what I want
dataframe
Color Letter New
red   A      City,Continent
blue  A      Country,City,Continent
red   B      City,Country
blue  B      Country,Continent
#This is the code I have so far
dataframe$New <- NA
#groups all the cities
dataframe$New <- lapply(dataframe$Words, function(x) {
  gsub("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", "City", x)})
#groups all the countries
dataframe$New <- lapply(dataframe$Words, function(x) {
  gsub("Panama|USA|Japan|Mexico|Israel|Brazil", "Country", x)})
#groups all the continents
dataframe$New <- lapply(dataframe$Words, function(x) {
  gsub("Asia|Antarctica|Africa|North America|South America", "Continent", x)})
dataframe$Words <- NULL
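How can I stop dataframe$New from being overwritten each time, and how can I remove the extra words (i.e. fish, horse, cat)?
The data above is an example based on a very large dataset. In the dataset, the Words column contains many repeats. See below for some example rows from dataframe$Words: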
Words
Panama,Paris
Panama,Israel,cat
Panama,Paris,horse,
Panama,Asia
Panama
Panama,Chicago
Israel,Chicago
Israel,lizard,Paris
Israel,Panama,horse,Africa
Solution
Consider pasting together several ifelse calls that check for the specific strings:
dataframe$New <- paste(ifelse(grepl("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", dataframe$Words), "City", "N/A"),
                       ifelse(grepl("Panama|USA|Japan|Mexico|Israel|Brazil", dataframe$Words), "Country", "N/A"),
                       ifelse(grepl("Asia|Antarctica|Africa|North America|South America", dataframe$Words), "Continent", "N/A"),
                       sep=",")
dataframe$New <- gsub("N/A,|,N/A", "", dataframe$New)
dataframe
#   Color Letter Words                                                             New
# 1 red   A      Paris,Asia,parrot,Antarctica,North America,cat,lizard             City,Continent
# 2 blue  A      Panama,New York,Africa,dog,Tokyo,Washington DC,fish               City,Country,Continent
# 3 red   B      Copenhagen,bird,USA,Japan,Chicago,Mexico,insect                   City,Country
# 4 blue  B      Israel,Antarctica,horse,South America,North America,turtle,Brazil Country,Continent
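One edge case of the placeholder approach may be worth checking: a row that matches none of the patterns collapses to a bare "N/A" rather than an empty string, because the last placeholder has no neighbouring comma left to remove. The sketch below runs the same paste/ifelse/gsub steps on a few sample rows from the question. The "cat,dog" row and the flag helper are hypothetical additions for illustration only.

```r
# Sample rows from the question, plus one hypothetical row ("cat,dog")
# that matches none of the patterns.
words <- c("Panama,Paris", "Panama,Israel,cat", "Panama",
           "Israel,lizard,Paris", "cat,dog")

# Hypothetical helper: label rows matching `pattern`, placeholder otherwise.
flag <- function(pattern, label) ifelse(grepl(pattern, words), label, "N/A")

new <- paste(flag("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", "City"),
             flag("Panama|USA|Japan|Mexico|Israel|Brazil", "Country"),
             flag("Asia|Antarctica|Africa|North America|South America", "Continent"),
             sep = ",")
new <- gsub("N/A,|,N/A", "", new)
new
# [1] "City,Country" "Country"      "Country"      "City,Country" "N/A"

# The all-placeholder row is left as a bare "N/A"; blank it if that is unwanted.
new[new == "N/A"] <- ""
```

Whether an unmatched row should end up empty, NA, or "N/A" depends on how the column is used later, so the last line is optional.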
Or a DRY-er version using do.call + lapply:
# each element pairs a regex pattern with the label to assign on a match
strs <- list(c("Paris|New York|Tokyo|Washington DC|Copenhagen|Chicago", "City"),
             c("Panama|USA|Japan|Mexico|Israel|Brazil", "Country"),
             c("Asia|Antarctica|Africa|North America|South America", "Continent"))
dataframe$New2 <- do.call(paste,
                          c(lapply(strs, function(s) ifelse(grepl(s[1], dataframe$Words), s[2], "N/A")),
                            list(sep=",")))
# strip the "N/A" placeholders left by non-matching categories
dataframe$New2 <- gsub("N/A,|,N/A", "", dataframe$New2)
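Both versions above always list the labels in the fixed order City, Country, Continent, whereas the desired output in the question lists them in order of first appearance (for example Country,City,Continent for the second row). If that order matters, one alternative is to split each string on commas and look each word up in a named vector. This is a different technique from the grepl approach, offered only as a minimal sketch. The names lookup and categorize are hypothetical, and it assumes every word must match a known name exactly.

```r
# Hypothetical lookup table: each known word maps to its category.
lookup <- c(
  setNames(rep("City", 6),
           c("Paris", "New York", "Tokyo", "Washington DC", "Copenhagen", "Chicago")),
  setNames(rep("Country", 6),
           c("Panama", "USA", "Japan", "Mexico", "Israel", "Brazil")),
  setNames(rep("Continent", 5),
           c("Asia", "Antarctica", "Africa", "North America", "South America"))
)

# Hypothetical helper: split on commas, map words to categories,
# drop unknown words, and keep each category once in order of appearance.
categorize <- function(words) {
  tokens <- trimws(strsplit(as.character(words), ",", fixed = TRUE)[[1]])
  cats <- unname(lookup[tokens])        # NA for words not in the lookup
  cats <- unique(cats[!is.na(cats)])
  paste(cats, collapse = ",")           # "" when nothing matched
}

dataframe <- data.frame(
  Color  = c("red", "blue", "red", "blue"),
  Letter = c("A", "A", "B", "B"),
  Words  = c("Paris,Asia,parrot,Antarctica,North America,cat,lizard",
             "Panama,New York,Africa,dog,Tokyo,Washington DC,fish",
             "Copenhagen,bird,USA,Japan,Chicago,Mexico,insect",
             "Israel,Antarctica,horse,South America,North America,turtle,Brazil"),
  stringsAsFactors = FALSE
)

dataframe$New <- vapply(dataframe$Words, categorize, character(1), USE.NAMES = FALSE)
dataframe$New
# [1] "City,Continent"         "Country,City,Continent" "City,Country"
# [4] "Country,Continent"
```

Because the question notes that the Words column contains many repeats in the full dataset, it may be cheaper to run categorize once per unique(dataframe$Words) value and map the results back, though that is an optimization to measure rather than assume.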