How do I check whether a sequence in a data frame contains a temporal pattern?



> df
   machine    error      time
   <fct>      <fct>      <int>    
 1 M_000001   2          10    
 2 M_000001   50         120     
 3 M_000002   109        30    
 4 M_000002   40         30    
 5 M_000002   2          30 
 6 M_000002   65         34
 7 M_000002   50         36
 8 M_000002   3          39
 9 M_000002   99         39
10 M_000003   50         426

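A temporal pattern consists of a sequence of events (errors) in the format (a)(b)(c). Sometimes two or more errors occur at the same time: (ab)(c)(def). The elements of such an error sequence are stored in a list within a list within a list: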

> pattern_list[[1]]
[1] "(40" "109"  "2" 

[1] "65"

[1] "3"  "99)"
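The nesting can be reproduced with a small sketch. The construction of pattern_list below is an assumption inferred from the printed output, and clean_pattern is a hypothetical helper (not part of the question) that strips the stored parentheses so that the elements can be compared with the values in df$error.

```r
# Assumed reconstruction of the first pattern: a list of subsequences,
# each subsequence being a character vector of simultaneous errors.
pattern_list <- list(
  list(c("(40", "109", "2"), "65", c("3", "99)"))
)

# Hypothetical helper: drop the opening and closing parentheses that are
# stored with the first and last element of the pattern.
clean_pattern <- function(pattern) {
  lapply(pattern, function(itemset) gsub("[()]", "", itemset))
}

clean1 <- clean_pattern(pattern_list[[1]])
```

After this step clean1[[1]] holds "40", "109" and "2", so a plain %in% comparison against the error column is possible.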


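In addition, each pattern has a time annotation. It states the time spans within which the following subsequences must occur in order to belong to this particular temporal pattern. The format is [0, 5][2, 4]. For the sequence (a)(b)(c) this means: the pattern starts with 'a'; error 'b' must occur 0 to 5 time units later, and 'c' must occur 2 to 4 time units after 'b'.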

> temp[[1]]
[1] "0"  "5"  "2"  "4"

Note that temp[[1]][1] and temp[[1]][2] form the time period [0, 5], just as temp[[1]][3] and temp[[1]][4] form [2, 4].
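The pairing of bounds is easier to work with as a two-column matrix, one row per gap between consecutive subsequences. This is only a sketch: as_windows and the column names lower and upper are hypothetical and do not appear in the question.

```r
# The annotation of the first pattern, as printed above.
temp <- list(c("0", "5", "2", "4"))

# Hypothetical helper: turn the flat character vector into a numeric matrix
# with one row per time window and the columns "lower" and "upper".
as_windows <- function(annotation) {
  matrix(as.numeric(annotation), ncol = 2, byrow = TRUE,
         dimnames = list(NULL, c("lower", "upper")))
}

windows1 <- as_windows(temp[[1]])
# Row 1 holds the window [0, 5], row 2 the window [2, 4].
```

With this layout, windows1[j, "lower"] and windows1[j, "upper"] bound the gap between subsequence j and subsequence j + 1.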

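Finally, given a data frame 'output' with the columns 'machine' and 'error_patterns' (a list column), I want an algorithm that goes through the data frame and, for each machine, appends the IDs of the matching temporal patterns to the 'error_patterns' column.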

> output
   machine     error_patterns
   <fct>       <list>        
 1 M_000001  <list [0]>    
 2 M_000002  <list [0]>    
 3 M_000003  <list [0]>    
 4 M_000004  <list [0]>    
 5 M_000005  <list [0]>    
 6 M_000006  <list [0]>    
 7 M_000007  <list [0]>    
 8 M_000008  <list [0]>    
 9 M_000009  <list [0]>    
10 M_000010  <list [0]>
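For reference, a frame of this shape can be built and updated in base R as sketched below. The machine names follow the printout above; using a plain data.frame instead of a tibble and appending pattern ID 1 to M_000002 are assumptions made purely for illustration.

```r
# Ten machines, each starting with an empty list of pattern IDs.
output <- data.frame(machine = factor(sprintf("M_%06d", 1:10)))
output$error_patterns <- replicate(nrow(output), list(), simplify = FALSE)

# Append a pattern ID to the list cell of one machine (illustrative values).
row <- which(output$machine == "M_000002")
output$error_patterns[[row]] <- append(output$error_patterns[[row]], 1L)
```

Each cell of error_patterns stays a list, so further IDs can be appended to the same machine later.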


library(dplyr)

# Checks only whether all errors of a pattern occur for a machine;
# the time windows in `temp` are not checked here.
for (mach in unique(as.character(df$machine))) { # go through every machine
  errs <- as.character(filter(df, machine == mach)$error) # errors seen on this machine
  row <- which(output$machine == mach) # matching row in the output frame
  for (m in seq_along(pattern_list)) { # go through all patterns
    help2 <- 0 # counts the subsequences that are fully contained
    for (n in seq_along(pattern_list[[m]])) { # go through all subsequences of a pattern
      help1 <- 0 # counts the contained elements of this subsequence
      for (k in seq_along(pattern_list[[m]][[n]])) { # go through each element of the subsequence
        element <- gsub("[()]", "", pattern_list[[m]][[n]][k]) # drop the stored parentheses
        if (element %in% errs) help1 <- help1 + 1
      }
      # all elements of the subsequence were found for this machine
      if (help1 == length(pattern_list[[m]][[n]])) help2 <- help2 + 1
    }
    # all subsequences of the pattern were found: store the ID/index of the pattern
    if (length(row) == 1 && help2 == length(pattern_list[[m]])) {
      output$error_patterns[[row]] <- append(output$error_patterns[[row]], m)
    }
  }
}
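One possible way to take the time windows into account is sketched below. It rests on assumptions that the question does not confirm: a subsequence counts as occurring at time t only if all of its errors are recorded for the machine at exactly that time, each window bounds the difference between the time stamps of two consecutive subsequences, and the parentheses have already been stripped from the pattern elements. The function name matches_pattern is hypothetical.

```r
# Sample rows from the question, rebuilt as a plain data frame.
df <- data.frame(
  machine = c("M_000001", "M_000001", rep("M_000002", 7), "M_000003"),
  error   = c("2", "50", "109", "40", "2", "65", "50", "3", "99", "50"),
  time    = c(10, 120, 30, 30, 30, 34, 36, 39, 39, 426),
  stringsAsFactors = FALSE
)

# Pattern (40 109 2)(65)(3 99) with the windows [0, 5][2, 4].
pattern <- list(c("40", "109", "2"), "65", c("3", "99"))
windows <- matrix(c(0, 5, 2, 4), ncol = 2, byrow = TRUE,
                  dimnames = list(NULL, c("lower", "upper")))

# Hypothetical matcher for the events of a single machine.
matches_pattern <- function(errors, times, pattern, windows) {
  # Times at which every error of an itemset occurs simultaneously.
  itemset_times <- function(itemset) {
    candidate <- unique(times)
    keep <- vapply(candidate,
                   function(t) all(itemset %in% errors[times == t]),
                   logical(1))
    candidate[keep]
  }
  # `feasible` holds the times at which the pattern prefix can end.
  feasible <- itemset_times(pattern[[1]])
  for (j in seq_along(pattern)[-1]) {
    if (length(feasible) == 0) return(FALSE)
    nxt <- itemset_times(pattern[[j]])
    gap_ok <- function(t) {
      gap <- t - feasible
      any(gap >= windows[j - 1, "lower"] & gap <= windows[j - 1, "upper"])
    }
    feasible <- nxt[vapply(nxt, gap_ok, logical(1))]
  }
  length(feasible) > 0
}

# Apply the matcher to every machine separately.
by_machine <- split(df, df$machine)
hits <- vapply(by_machine,
               function(d) matches_pattern(d$error, d$time, pattern, windows),
               logical(1))
```

Under these assumptions none of the sample machines matches: for M_000002 the gap between 65 (time 34) and 3/99 (time 39) is 5 time units, which lies outside [2, 4]. Widening the second window to [2, 5] would make M_000002 match. The names of the TRUE entries in hits identify the machines whose error_patterns cell should receive the pattern ID.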




