Replacing occurrences of one dataset's values in another dataset

Problem description

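I have a dataset named Messages that contains C# errors, and a second dataset named Usernames that contains a list of usernames. I want to remove every occurrence of a username from the messages. No message should contain more than one username. I thought I could do this with gsubfn, but it outputs all NULLs. Can someone tell me the best way to do this?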

usrNm <- c(dataset2$username)
stripUsername <- function(x) {gsubfn(usrNm,'',x)}
noUsernames <- within(dataset,{Message=stripUsername(dataset$Message)})
+----------------------------------+----------------------------------+    +--------------+
| Message                          | Expected output                  |    | Username     |
+----------------------------------+----------------------------------+    +--------------+
| User: Mary.Jane sent bad data    | User:  sent bad data             |    | Mary.Jane    |
+----------------------------------+----------------------------------+    +--------------+
| Error occurred in System.Module. | Error occurred in System.Module. |    | Robert.Frost |
+----------------------------------+----------------------------------+    +--------------+
| Hello, world!                    | Hello, world!                    |    | BB.Wolf      |
+----------------------------------+----------------------------------+    +--------------+
| Tracing request by Robert.Frost! | Tracing request by !             |
+----------------------------------+----------------------------------+

Tags: r, powerbi

Solution


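Here is one approach. With vectorize_all = FALSE, stri_replace_all_fixed applies every username to every message in turn, and because the match is fixed (not a regular expression), the "." in each username is treated as a literal dot: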

library(stringi)

stri_replace_all_fixed(dataset$Message, dataset2$Username, '', vectorize_all = FALSE)

Output

[1] "User:  sent bad data"             "Error occurred in System.Module."
[3] "Hello, world!"                    "Tracing request by !" 

Data

dataset <- data.frame(
  Message = c("User: Mary.Jane sent bad data", "Error occurred in System.Module.", "Hello, world!", "Tracing request by Robert.Frost!"),
  stringsAsFactors = FALSE
)

dataset2 <- data.frame(
  Username = c("Mary.Jane", "Robert.Frost", "BB.Wolf"),
  stringsAsFactors = FALSE  # keep usernames as character, matching dataset above
)
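
If adding stringi as a dependency is not desirable, the same idea can be sketched in base R. This is only an illustrative alternative, not part of the original answer: the names messages, usernames, and strip_usernames are hypothetical, and the sample values are copied from the data above so the snippet runs on its own.

```r
# Sample values copied from the data above so the snippet is self-contained.
messages <- c("User: Mary.Jane sent bad data",
              "Error occurred in System.Module.",
              "Hello, world!",
              "Tracing request by Robert.Frost!")
usernames <- c("Mary.Jane", "Robert.Frost", "BB.Wolf")

# Hypothetical helper: remove each username from every message, one username at a time.
# fixed = TRUE makes gsub match the username literally, so "." is not a regex wildcard.
strip_usernames <- function(msgs, names) {
  Reduce(function(acc, nm) gsub(nm, "", acc, fixed = TRUE), names, msgs)
}

cleaned <- strip_usernames(messages, usernames)
print(cleaned)
```

Reduce is used here because gsub accepts only a single pattern per call; folding over the usernames applies them one after another to the whole message vector. In the question's setting this would be called as strip_usernames(dataset$Message, dataset2$Username).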
