How to create a for loop based on unique user IDs and specific event types

Problem description

I have two data frames: users and events.

Both data frames contain a field that links events to users.

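How can I create a for loop that matches each user's unique ID against events of a specific type, and then stores the number of occurrences in a new column in users (users$conversation_started, users$conversation_missed, etc.)?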

In short, it is a conditional for loop.

So far I have this, but it is wrong:

for(i in users$id){
  users$conversation_started <- nrow(event[event$type = "conversation-started"])
}

An example of how to do this would be ideal.

The idea is:

for(each user)
    find the matching user ID in events
    count the number of event types == "conversation-started"
    assign count value to user$conversation_started
end for

Important note:

The type field can contain one of five values, so I need to be able to filter efficiently on each value of type for each employee:

> events$type %>% table %>% as.matrix
                           [,1]
conversation-accepted          3120
conversation-already-accepted 19673
conversation-declined            27
conversation-missed             831
conversation-request          23427

The data frames (note that these are trimmed-down versions, since confidential information has been removed):

users <- structure(list(`_id` = c("JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD", 
"mNN986oQv9", "9NI71KBMX9", "x1jH7t0Cmy"), language = c("en", 
"en", "en", "en", "en", "en"), registering = c(TRUE, TRUE, FALSE, 
FALSE, FALSE, NA), `_created_at` = structure(c(1485995043.131, 
1488898839.838, 1480461193.146, 1481407887.979, 1489942757.189, 
1491311381.916), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `_updated_at` = structure(c(1521039527.236, 1488898864.834, 
    1527618624.877, 1481407959.116, 1490043838.561, 1491320333.09
    ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), lastOnlineTimestamp = c(1521039526.90314, 
    NA, 1480461472, 1481407959, 1490043838, NA), isAgent = c(FALSE, 
    NA, FALSE, FALSE, FALSE, NA), lastAvailableTime = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct", 
    "POSIXt"), tzone = ""), available = c(NA, NA, NA, NA, NA, 
    NA), busy = c(NA, NA, NA, NA, NA, NA), joinedTeam = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct", 
    "POSIXt"), tzone = ""), timezone = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    )), row.names = c("list.1", "list.2", "list.3", "list.4", 
"list.5", "list.6"), class = "data.frame")

events <- structure(list(`_id` = c("JKY8ZwkM1S", "CG7Xj8dAsA", "pUkFFxoahy", 
"yJVJ34rUCl", "XxXelkIFh7", "GCOsENVSz6"), expirationTime = structure(c(1527261147.873, 
NA, 1527262121.332, NA, 1527263411.619, 1527263411.619), class = c("POSIXct", 
"POSIXt"), tzone = ""), partId = c("d22bfddc-cd51-489f-aec8-5ab9225c0dd5", 
"d22bfddc-cd51-489f-aec8-5ab9225c0dd5", "cf4356da-b63e-4e4d-8e7b-fb63035801d8", 
"cf4356da-b63e-4e4d-8e7b-fb63035801d8", "a720185e-c300-47c0-b30d-64e1f272d482", 
"a720185e-c300-47c0-b30d-64e1f272d482"), type = c("conversation-request", 
"conversation-accepted", "conversation-request", "conversation-accepted", 
"conversation-request", "conversation-request"), `_p_conversation` = c("Conversation$6nSaLeWqs7", 
"Conversation$6nSaLeWqs7", "Conversation$6nSaLeWqs7", "Conversation$6nSaLeWqs7", 
"Conversation$bDuAYSZgen", "Conversation$bDuAYSZgen"), `_p_merchant` = c("Merchant$0A2UYADe5x", 
"Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x", 
"Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x"), `_p_associate` = c("D9ihQOWrXC", 
"D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC"
), `_wperm` = list(list(), list(), list(), list(), list(), list()), 
    `_rperm` = list("*", "*", "*", "*", "*", "*"), `_created_at` = structure(c(1527264657.998, 
    1527264662.043, 1527265661.846, 1527265669.435, 1527266922.056, 
    1527266922.059), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `_updated_at` = structure(c(1527264657.998, 1527264662.043, 
    1527265661.846, 1527265669.435, 1527266922.056, 1527266922.059
    ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), read = c(TRUE, 
    NA, TRUE, NA, NA, NA), data.customerName = c("Shopper 109339", 
    NA, "Shopper 109339", NA, "Shopper 109364", "Shopper 109364"
    ), data.departmentName = c("Personal advisors", NA, "Personal advisors", 
    NA, "Personal advisors", "Personal advisors"), data.recurring = c(FALSE, 
    NA, TRUE, NA, FALSE, FALSE), data.new = c(TRUE, NA, FALSE, 
    NA, TRUE, TRUE), data.missed = c(0L, NA, 0L, NA, 0L, 0L), 
    data.customerId = c("84uOFRLmLd", "84uOFRLmLd", "84uOFRLmLd", 
    "84uOFRLmLd", "5Dw4iax3Tj", "5Dw4iax3Tj"), data.claimingTime = c(NA, 
    4L, NA, 7L, NA, NA), data.lead = c(NA, NA, FALSE, NA, NA, 
    NA), data.maxMissed = c(NA, NA, NA, NA, NA, NA), data.associateName = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), data.maxDecline = c(NA, NA, NA, NA, NA, NA
    ), data.goUnavailable = c(NA, NA, NA, NA, NA, NA)), row.names = c("list.1", 
"list.2", "list.3", "list.4", "list.5", "list.6"), class = "data.frame")

Update: September 21, 2018

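The solution now produces a data frame containing only NA values at the end of the function. When written to .csv, every value comes out as NA (naturally, Excel displays NA values as blank cells).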

My data source has not changed, and neither has my script.

What could be causing this?

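My guess is that this is an unforeseen situation in which 0 hits can occur at each step. If so, is there a way to store 0 instead of NA / blank values for the cases without any hits?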

Is there a way to avoid this?

Tags: r, conditional

Solution


A new solution based on the data provided.

Note: since your data has no overlapping _id values, I changed events$_id to be the same as in users.

Simplified example data:

users <- structure(list(`_id` = structure(c(4L, 3L, 1L, 5L, 2L, 6L), 
                                                     .Label = c("6XFtOJh0bD", "9NI71KBMX9", "iGIeCEXyVE", 
                                                                           "JTuXhdI4Ai", "mNN986oQv9", "x1jH7t0Cmy"), 
                                                     class = "factor")), .Names = "_id", 
                   row.names = c(NA, -6L), class = "data.frame")
events <- structure(list(`_id` = c("JKY8ZwkM1S", "CG7Xj8dAsA", "pUkFFxoahy", 
                                   "yJVJ34rUCl", "XxXelkIFh7", "GCOsENVSz6"), 
                         type = c("conversation-request", "conversation-accepted", 
                                  "conversation-request", "conversation-accepted", 
                                  "conversation-request", "conversation-request")), 
                    .Names = c("_id", "type"), class = "data.frame", 
                    row.names = c("list.1", "list.2", "list.3", "list.4", "list.5", "list.6"))
events$`_id` <- users$`_id`

> users
         _id
1 JTuXhdI4Ai
2 iGIeCEXyVE
3 6XFtOJh0bD
4 mNN986oQv9
5 9NI71KBMX9
6 x1jH7t0Cmy

> events
              _id                  type
list.1 JTuXhdI4Ai  conversation-request
list.2 iGIeCEXyVE conversation-accepted
list.3 6XFtOJh0bD  conversation-request
list.4 mNN986oQv9 conversation-accepted
list.5 9NI71KBMX9  conversation-request
list.6 x1jH7t0Cmy  conversation-request

We can use the same approach I suggested earlier, only slightly enhanced.

First, we loop over unique(events$type) and store a table() of each event type per id in a list:

test <- lapply(unique(events$type), function(x) table(events$`_id`, events$type == x))

Then we store each type as the name of the corresponding table in the list:

names(test) <- unique(events$type)
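To see what each list entry holds, it can help to inspect one of the tables. The following is a minimal sketch, assuming the simplified events data above; it rebuilds that data as plain character vectors so the block runs on its own, and the name tab is introduced here purely for illustration.

```r
# Simplified events data from above, rebuilt so this block is self-contained
ids <- c("JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD",
         "mNN986oQv9", "9NI71KBMX9", "x1jH7t0Cmy")
events <- data.frame(
  `_id` = ids,
  type = c("conversation-request", "conversation-accepted",
           "conversation-request", "conversation-accepted",
           "conversation-request", "conversation-request"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

test <- lapply(unique(events$type),
               function(x) table(events$`_id`, events$type == x))
names(test) <- unique(events$type)

# Each entry is a two-way table: rows are ids, columns are FALSE / TRUE
tab <- test[["conversation-request"]]   # 'tab' is an illustrative name
colnames(tab)
tab["JTuXhdI4Ai", "TRUE"]   # number of conversation-request events for this id
```

The TRUE column holds the per-id counts for that type. Selecting it by name rather than by position may be somewhat safer, since the table would have only one column if every event happened to share the same type.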

Now we use a simple for loop, match() users$_id against the rownames of each table, and store the information in a new variable named after the event type:

for(i in names(test)){
  users[, i] <- test[[i]][, 2][match(users$`_id`, rownames(test[[i]]))]
}

Result:

> users
         _id conversation-request conversation-accepted
1 JTuXhdI4Ai                    1                     0
2 iGIeCEXyVE                    0                     1
3 6XFtOJh0bD                    1                     0
4 mNN986oQv9                    0                     1
5 9NI71KBMX9                    1                     0
6 x1jH7t0Cmy                    1                     0
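Regarding the NA values mentioned in the update: match() returns NA for any user whose id never appears in events, so such users would end up with NA rather than 0. Whether that explains the all-NA output cannot be confirmed from the data shown here. The sketch below is one possible variation, not the method used above: it builds a single id-by-type cross-tabulation and replaces missing counts with 0. The user "noEventsUser" and the names counts and hits are hypothetical, added only to demonstrate the zero fill.

```r
# Simplified data from above, plus one hypothetical user without any events
ids <- c("JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD",
         "mNN986oQv9", "9NI71KBMX9", "x1jH7t0Cmy")
users <- data.frame(`_id` = c(ids, "noEventsUser"),   # hypothetical extra id
                    check.names = FALSE, stringsAsFactors = FALSE)
events <- data.frame(
  `_id` = ids,
  type = c("conversation-request", "conversation-accepted",
           "conversation-request", "conversation-accepted",
           "conversation-request", "conversation-request"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One cross-tabulation: rows are ids, columns are event types.
# unclass() turns the table into a plain integer matrix with dimnames.
counts <- unclass(table(events$`_id`, events$type))

for (type in colnames(counts)) {
  # match() gives NA for users that never appear in events
  hits <- unname(counts[match(users$`_id`, rownames(counts)), type])
  hits[is.na(hits)] <- 0L   # treat "no events" as a count of zero
  users[[type]] <- hits
}

users
```

With this variation, a user without events gets 0 in every type column instead of NA, and a type that covers every event needs no special handling because columns are selected by name.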

Hope this helps!

