Subsetting data in R where one column must take all 3 possible values

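Problem description

I have a data.frame with 3 columns: agent name, classification (A, B, C or D) and week, i.e. Week1, Week2 and so on.

An agent can have one of the 4 classifications in each of several weeks. I currently have 10 weeks of data.

I want to make a subset of the agents whose classification is "A" and who appear in all of Week 8, Week 9 and Week 10 (the most recent 3 weeks).

So far I have built this function to get the desired result: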

cautionAgentsLocator = function(classification){

  cautionAgents = NA

  # "Bad" selects the rows classified as "D" in the last 3 weeks
  if(classification == "Bad"){
    cautionAgents = combData[combData$ABCD.Categorization == "D", ]
    cautionAgents = cautionAgents[cautionAgents$Weeks == "Week8" | cautionAgents$Weeks == "Week9" | cautionAgents$Weeks == "Week10", ]
    cautionAgents = cautionAgents[, c("Agent.Name", "SPD", "Normalized.Distribution", "ABCD.Categorization", "Weeks")]
  }

  # "Good" selects the rows classified as "A" in the last 3 weeks
  if(classification == "Good"){
    cautionAgents = combData[combData$ABCD.Categorization == "A", ]
    cautionAgents = cautionAgents[cautionAgents$Weeks == "Week8" | cautionAgents$Weeks == "Week9" | cautionAgents$Weeks == "Week10", ]
    cautionAgents = cautionAgents[, c("Agent.Name", "SPD", "Normalized.Distribution", "ABCD.Categorization", "Weeks")]
  }

  uniqueName = unique(cautionAgents$Agent.Name)

  # Drop every agent that has fewer than 3 matching rows (one per week)
  for(i in uniqueName){
    count = nrow(cautionAgents[cautionAgents$Agent.Name == i, ])
    # Weeks in which this agent is absent (computed but not used below)
    missingWeeks = setdiff(c("Week8", "Week9", "Week10"), cautionAgents$Weeks[cautionAgents$Agent.Name == i])
    if(count < 3){
      cautionAgents = cautionAgents[-which(cautionAgents$Agent.Name == i), ]
    }
  }

  return(cautionAgents)
}

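Can this be achieved in one line of code, i.e. a single subset statement using dplyr or some better technique?

To create a chunk of the data, here is the code: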

structure(list(Agent.Name = c("Christy Deruise", "Allen Voorhees", 
"Daniel Gonzalez Gaviria", "Denise Bradley", "Shimron Larose", 
"Tiana Morman", "James Cagle Jr", "Vicki Smith", "Donna Paskett", 
"Joan Balde"), ABCD.Categorization = c("D", "D", "D", "D", "D", 
"D", "D", "D", "D", "D"), Weeks = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Week1", "Week2", "Week3", 
"Week4", "Week5", "Week6", "Week7", "Week8", "Week9", "Week10"
), class = "factor")), row.names = c(NA, 10L), class = "data.frame")

Of course, the actual data has about 4000 rows, where each agent appears in multiple weeks with a possibly different classification each week.

Tags: r, dplyr, subset

Solution

Like this?

library(dplyr)
combData %>%
    filter(ABCD.Categorization == "A", Weeks %in% c("Week8", "Week9", "Week10")) %>%
    select(Agent.Name, ABCD.Categorization, Weeks)
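
Note that the filter above keeps an agent who is rated "A" in only one or two of the three weeks. If the goal is to keep only agents rated "A" in all of Week8, Week9 and Week10, a grouped filter might be one way to do it. This is a minimal sketch, not a tested solution for the real data: the toy combData and the helper name lastWeeks are made up for illustration, and only the column names come from the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names are taken from the question.
combData <- data.frame(
  Agent.Name = c("Agent 1", "Agent 1", "Agent 1",
                 "Agent 2", "Agent 2", "Agent 2",
                 "Agent 3"),
  ABCD.Categorization = c("A", "A", "A",
                          "A", "B", "A",
                          "A"),
  Weeks = c("Week8", "Week9", "Week10",
            "Week8", "Week9", "Week10",
            "Week10"),
  stringsAsFactors = FALSE
)

# The three most recent weeks (assumed name, not from the question).
lastWeeks <- c("Week8", "Week9", "Week10")

result <- combData %>%
  # Keep only "A" rows that fall in the last three weeks.
  filter(ABCD.Categorization == "A", Weeks %in% lastWeeks) %>%
  group_by(Agent.Name) %>%
  # Keep an agent only if all three weeks are still present.
  filter(n_distinct(Weeks) == length(lastWeeks)) %>%
  ungroup() %>%
  select(Agent.Name, ABCD.Categorization, Weeks)

print(result)
```

In this toy data only "Agent 1" should survive: "Agent 2" is rated "B" in Week9 and "Agent 3" has a single week. Counting distinct weeks rather than rows means duplicated rows for the same week would not let an agent through by accident.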
