How to remove anything starting with a specific character (@) at any position in a string in R?

Problem description

I have a column containing multiple tweets:

ID | Tweet
1    @ChipotleTweets @ChipotleTweets Becky is very nice
2    Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets
3    Considering walking to @.ChipotleTweets in my llama onesie.

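The goal is to remove each '@___', that is, the @ and anything attached to it, but not the text outside that string.

I am currently playing with this code to detect the '@', but if it is not in the first position of the sentence, I pick up nothing: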

tweet_pattern <- " @\\w+"

Customer <- Customer %>% 
           mutate(clean_Tweet = ifelse(str_detect(Tweet, tweet_pattern), 
                                       str_remove(Tweet, tweet_pattern), 
                                       NA_character_))

Expected output:

ID | Tweet                                                                  | cleaned_tweet 
1    @ChipotleTweets @ChipotleTweets Becky is very nice                       Becky is very nice
2    Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets    Happy Halloween! I now look forward to $3 booritos at
3    Considering walking to @.ChipotleTweets in my llama onesie.              Considering walking to in my llama onesie.

Tags: r, regex, dplyr

Solution


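We can change the pattern to match zero or more whitespace characters (\\s*), followed by @ and one or more non-whitespace characters (\\S+), and use str_remove_all to remove those substrings: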

library(stringr)
library(dplyr)
Customer %>%
     mutate(Cleaned_Tweet = str_remove_all(Tweet, "\\s*@\\S+"))

Output:

 ID                                                                 Tweet                                         Cleaned_Tweet
1  1                    @ChipotleTweets @ChipotleTweets Becky is very nice                                    Becky is very nice
2  2 Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets Happy Halloween! I now look forward to $3 booritos at
3  3           Considering walking to @.ChipotleTweets in my llama onesie.            Considering walking to in my llama onesie.
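
One detail the right-aligned printout hides: in the first row, the space between the second mention and "Becky" is not part of any match, so it stays at the start of the result. A minimal sketch of trimming it with stringr's str_trim follows. The vector name tweets and the intermediate names removed and trimmed are only for illustration and are not part of the original answer.

```r
library(stringr)

# The three tweets from the question, as a plain character vector
tweets <- c(
  "@ChipotleTweets @ChipotleTweets Becky is very nice",
  "Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets",
  "Considering walking to @.ChipotleTweets in my llama onesie."
)

# Remove every mention together with the whitespace in front of it
removed <- str_remove_all(tweets, "\\s*@\\S+")

# Strip whitespace left over at the start or end of each string
trimmed <- str_trim(removed)

print(removed[1])
print(trimmed)
```

If doubled spaces inside a string are also a concern, str_squish could be used in place of str_trim.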

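Note: str_remove removes only the first match in each string; if a single string contains several matches, the remaining ones are left in place. We need str_remove_all to remove every match of the pattern.

Data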
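
The difference between the two functions can be seen in a small sketch. The input string below is a made-up example (it is not from the question's data), chosen so that both mentions sit in the middle of the sentence.

```r
library(stringr)

# Hypothetical tweet with two mentions
x <- "Thanks @alice and @bob for lunch"

# str_remove drops only the first match
first_only <- str_remove(x, "\\s*@\\S+")

# str_remove_all drops every match
all_matches <- str_remove_all(x, "\\s*@\\S+")

print(first_only)   # "Thanks and @bob for lunch"
print(all_matches)  # "Thanks and for lunch"
```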

Customer <- structure(list(ID = 1:3, Tweet = c("@ChipotleTweets @ChipotleTweets Becky is very nice", 
"Happy Halloween! I now look forward to $3 booritos at @ChipotleTweets", 
"Considering walking to @.ChipotleTweets in my llama onesie."
)), class = "data.frame", row.names = c(NA, -3L))
