R - Create new columns based on multiple conditions and event time

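Problem description

I need to create new columns based on multiple conditions and time points in existing columns. I have the following data frame: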

table <- data.frame(RowID=c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"), Machine=c("Ace", "Ace", "Ace", "Ame", "Ame", "Cay", "Cay", "Cay", "Cay", "Cay", "Gap", "Gap", "Dex", "Dex", "Dex"), Time=c(1,2,3,1,2,1,2,3,4,5,1,2,1,2,3), Status=c("Good", "Good", "Bad", "Bad", "Good", "Good", "Bad", "Good", "Good", "Bad", "Good", "Good", "Bad", "Bad", "Good"))

print(table)
   RowID Machine Time Status
1     A1     Ace    1   Good
2     A2     Ace    2   Good
3     A3     Ace    3    Bad
4     A4     Ame    1    Bad
5     A5     Ame    2   Good
6     A6     Cay    1   Good
7     A7     Cay    2    Bad
8     A8     Cay    3   Good
9     A9     Cay    4   Good
10   A10     Cay    5    Bad
11   A11     Gap    1   Good
12   A12     Gap    2   Good
13   A13     Dex    1    Bad
14   A14     Dex    2    Bad
15   A15     Dex    3   Good

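For each machine, Time gives the time of the reading. I want to create two new columns, Verdict and Outcome. For the Verdict column, I want to mark "YES" for any machine that has a "Good" status before a "Bad" one (for example Ace and Cay), and "NO" otherwise. For the Outcome column, I want to mark "Event" the first time a "Bad" status appears for a machine, and "BeforeEvent" for the "Good" status immediately before that "Bad". Any other "Good" status that is not directly before the "Bad" is marked "Before", and any status after the first "Bad" is marked "After". Machines with a "NO" verdict get "None", as in the expected result.

The final data frame I hope to get looks like this: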

table_new <- data.frame(RowID=c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"), Machine=c("Ace", "Ace", "Ace", "Ame", "Ame", "Cay", "Cay", "Cay", "Cay", "Cay", "Gap", "Gap", "Dex", "Dex", "Dex"), Time=c(1,2,3,1,2,1,2,3,4,5,1,2,1,2,3), Status=c("Good", "Good", "Bad", "Bad", "Good", "Good", "Bad", "Good", "Good", "Bad", "Good", "Good", "Bad", "Bad", "Good"), Verdict=c("YES", "YES", "YES", "NO", "NO", "YES", "YES", "YES", "YES", "YES", "NO", "NO", "NO", "NO", "NO"), Outcome=c("Before", "BeforeEvent", "Event", "None", "None", "BeforeEvent", "Event", "After", "After", "After", "None", "None", "None", "None", "None"))

print(table_new)
   RowID Machine Time Status Verdict     Outcome
1     A1     Ace    1   Good     YES      Before
2     A2     Ace    2   Good     YES BeforeEvent
3     A3     Ace    3    Bad     YES       Event
4     A4     Ame    1    Bad      NO        None
5     A5     Ame    2   Good      NO        None
6     A6     Cay    1   Good     YES BeforeEvent
7     A7     Cay    2    Bad     YES       Event
8     A8     Cay    3   Good     YES       After
9     A9     Cay    4   Good     YES       After
10   A10     Cay    5    Bad     YES       After
11   A11     Gap    1   Good      NO        None
12   A12     Gap    2   Good      NO        None
13   A13     Dex    1    Bad      NO        None
14   A14     Dex    2    Bad      NO        None
15   A15     Dex    3   Good      NO        None

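Any help with this is much appreciated. I need to repeat this operation many times, so it would be great if it could be automated. Thank you!

Tags: r, if-statement, dplyr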

Solution


Here is one attempt (using my_table). It is not clear whether you might have multiple Good-to-Bad transitions, or how you would want to handle them.

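First, group_by Machine. I would treat an event as a row where the previous row is "Good" and the current row is "Bad". When that happens, the row can be flagged with a Boolean.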
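To make the event flag concrete, here is a minimal sketch on a single vector rather than a grouped data frame. The `status` vector is made-up illustration data for one machine, and `previous` and `event` are names I chose here, not part of the answer's code.

```r
library(dplyr)

# Statuses for one hypothetical machine, already in time order
status <- c("Good", "Bad", "Good", "Good", "Bad")

# dplyr::lag() shifts the vector by one position. default = "Bad" fills the
# first slot, so a machine whose first reading is "Bad" is never flagged.
previous <- lag(status, default = "Bad")

# TRUE only where the previous reading was "Good" and the current one is "Bad"
event <- previous == "Good" & status == "Bad"

print(event)
```

Inside `mutate()` on a data frame grouped by Machine, the same expression is evaluated separately per machine, so a transition is never detected across two different machines.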

If any value of Event in the group is TRUE, Verdict is set to "YES"; otherwise it is set to "NO".

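With case_when you can set the outcome by comparing each row number within the machine group to the row where the event first occurs (min is used in case there are multiple transitions within the group).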
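The row-position comparison can be sketched on a plain vector as well. This is only an illustration: `event` is made-up data for one machine, and `position` stands in for what `row_number()` returns inside a grouped `mutate()`.

```r
library(dplyr)

# Event flags for one hypothetical machine with two Good-to-Bad transitions
event <- c(FALSE, FALSE, TRUE, FALSE, TRUE)

# which() gives the positions of all events; min() keeps only the first one
first_event <- min(which(event))

# Stand-in for row_number() within the machine group
position <- seq_along(event)

# case_when() uses the first condition that is TRUE, so the stricter
# "Before" test must come ahead of the "BeforeEvent" test
outcome <- case_when(
  position + 1 < first_event ~ "Before",
  position < first_event ~ "BeforeEvent",
  position == first_event ~ "Event",
  position > first_event ~ "After"
)

print(outcome)
```

Because only the first event position is used, the second transition at position 5 is labelled "After" rather than "Event".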

I hope this helps.

library(dplyr)

my_table %>%
  group_by(Machine) %>%
  mutate(
    # Event: the previous row is "Good" and the current row is "Bad"
    Event = lag(Status, default = "Bad") == "Good" & Status == "Bad",
    # Verdict: "YES" if the machine has at least one event
    Verdict = ifelse(any(Event), "YES", "NO"),
    # Outcome: compare each row's position with the first event in the group
    Outcome = ifelse(Verdict == "NO", "None",
      case_when(
        row_number() + 1 < min(which(Event)) ~ "Before",
        row_number() < min(which(Event)) ~ "BeforeEvent",
        row_number() == min(which(Event)) ~ "Event",
        row_number() > min(which(Event)) ~ "After"
      )))

Output

   RowID Machine  Time Status Event Verdict Outcome    
   <chr> <chr>   <dbl> <chr>  <lgl> <chr>   <chr>      
 1 A1    Ace         1 Good   FALSE YES     Before     
 2 A2    Ace         2 Good   FALSE YES     BeforeEvent
 3 A3    Ace         3 Bad    TRUE  YES     Event      
 4 A4    Ame         1 Bad    FALSE NO      None       
 5 A5    Ame         2 Good   FALSE NO      None       
 6 A6    Cay         1 Good   FALSE YES     BeforeEvent
 7 A7    Cay         2 Bad    TRUE  YES     Event      
 8 A8    Cay         3 Good   FALSE YES     After      
 9 A9    Cay         4 Good   FALSE YES     After      
10 A10   Cay         5 Bad    TRUE  YES     After      
11 A11   Gap         1 Good   FALSE NO      None       
12 A12   Gap         2 Good   FALSE NO      None       
13 A13   Dex         1 Bad    FALSE NO      None       
14 A14   Dex         2 Bad    FALSE NO      None       
15 A15   Dex         3 Good   FALSE NO      None 
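
Since the question asks for something that can be repeated many times, the same logic could be wrapped in a function. This is a sketch under a few assumptions: `label_events` and the temporary `FirstEvent` column are hypothetical names, the input has Machine and Status columns, and rows are already in time order within each machine. It uses `match(TRUE, Event)` instead of `min(which(Event))`, which returns NA for machines with no event rather than calling `min()` on an empty vector.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): adds Verdict and Outcome columns.
# Assumes rows are already sorted by time within each Machine.
label_events <- function(df) {
  df %>%
    group_by(Machine) %>%
    mutate(
      # Flag rows where a "Good" reading is followed by a "Bad" one
      Event = lag(Status, default = "Bad") == "Good" & Status == "Bad",
      # Position of the first event in the group, or NA if there is none
      FirstEvent = match(TRUE, Event),
      Verdict = if_else(is.na(FirstEvent), "NO", "YES"),
      Outcome = case_when(
        is.na(FirstEvent) ~ "None",
        row_number() + 1 < FirstEvent ~ "Before",
        row_number() < FirstEvent ~ "BeforeEvent",
        row_number() == FirstEvent ~ "Event",
        TRUE ~ "After"
      )
    ) %>%
    ungroup() %>%
    select(-Event, -FirstEvent)
}

# Small made-up example with one machine of each verdict
demo <- data.frame(
  Machine = c("Ace", "Ace", "Ace", "Ame", "Ame"),
  Time = c(1, 2, 3, 1, 2),
  Status = c("Good", "Good", "Bad", "Bad", "Good")
)

result <- label_events(demo)
print(result)
```

Calling `label_events(my_table)` should then apply the same rules to the full data set, provided it is sorted by Time within each Machine.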

Data

my_table <- structure(list(RowID = c("A1", "A2", "A3", "A4", "A5", "A6", 
"A7", "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"), 
    Machine = c("Ace", "Ace", "Ace", "Ame", "Ame", "Cay", "Cay", 
    "Cay", "Cay", "Cay", "Gap", "Gap", "Dex", "Dex", "Dex"), 
    Time = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3), Status = c("Good", 
    "Good", "Bad", "Bad", "Good", "Good", "Bad", "Good", "Good", 
    "Bad", "Good", "Good", "Bad", "Bad", "Good")), class = "data.frame", row.names = c(NA, 
-15L))
