Finding the order and sequence of events in R

Problem description

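I have some data containing names and dates, along with tasks that I want to put in order, so that I can work out the order in which each person performed their tasks and the flow from one task to the next. So, very simply, here is some sample data.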

    Name    Date        Food
    Fred    01/01/2018  Peanuts
    Jim     03/02/2018  Banana
    Barney  02/02/2018  Rice
    Fred    06/03/2018  Rice
    Barry   12/02/2018  Peanuts
    John    04/04/2018  Rice
    Jim     03/03/2018  Rice
    Fred    20/04/2018  Rice
    Den     12/02/2018  Banana
    Barney  04/05/2018  Banana
    Jim     05/06/2018  Rice
    John    06/07/2018  Peanuts
    Jim     30/06/2018  Banana
    Fred    05/05/2018  Rice

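This gives me the date on which each named person ate the given food. What I want to know is, for each person, the complete list of foods they have eaten and the order in which they ate them.

I used the order function in R and created a seq from 1 to nrow to get the ordering, but I can't work out how to get this for each person.

My second step is to create a flow table that records how many times each flow occurs, so the end result would be a table like this.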

  Flow                 count
  Peanuts to rice      1
  Peanuts to banana    0
  Peanuts to peanuts   0
  Rice to peanuts      1
  Rice to banana       2
  Rice to rice         3
  Banana to rice       1
  Banana to peanuts    0
  Banana to banana     0

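Thanks

Update:

As always, the more I dig into something, the more changes I want to make to the data!

So, the answer provided below gives me the flow table I wanted, thanks. What I would like to do now is edit my original data frame to remove instances of flows that I am not interested in or do not want to analyse.

So, for example, how would I remove from the data frame all flows from Rice to Peanuts or from Banana to Rice (regardless of who the person is)?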

Tags: r

Solution


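Let your data frame be dat, and assume that:

  1. it is already sorted by the Date column in ascending order (or sorted by Date within each Name);
  2. Name and Food are factor columns.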
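The data as posted does not quite meet these two assumptions: the Date column is day/month/year text, so sorting it directly would be alphabetical rather than chronological, and since R 4.0.0 read.table() and data.frame() no longer convert strings to factors by default. A minimal sketch of getting the question's sample into the assumed shape, assuming the dates are day/month/year (which values such as 20/04/2018 suggest):

```r
## Read the question's sample data (Date is day/month/year text)
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Name    Date        Food
Fred    01/01/2018  Peanuts
Jim     03/02/2018  Banana
Barney  02/02/2018  Rice
Fred    06/03/2018  Rice
Barry   12/02/2018  Peanuts
John    04/04/2018  Rice
Jim     03/03/2018  Rice
Fred    20/04/2018  Rice
Den     12/02/2018  Banana
Barney  04/05/2018  Banana
Jim     05/06/2018  Rice
John    06/07/2018  Peanuts
Jim     30/06/2018  Banana
Fred    05/05/2018  Rice
")

## Parse the text dates so that sorting is chronological, not alphabetical
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")

## Sort by Date within Name, then make Name and Food factor columns
dat <- dat[order(dat$Name, dat$Date), ]
rownames(dat) <- NULL
dat$Name <- factor(dat$Name)
dat$Food <- factor(dat$Food)
```

After this, levels(dat$Food) is Banana, Peanuts, Rice, which is the level order seen in the outputs below.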

## split by person; not to be messed up by "between person" flow
x <- split(levels(dat$Food)[dat$Food], dat$Name)

#$Barney
#[1] "Rice"   "Banana"
#
#$Barry
#[1] "Peanuts"
#
#$Den
#[1] "Banana"
#
#$Fred
#[1] "Peanuts" "Rice"    "Rice"    "Rice"   
#
#$Jim
#[1] "Banana" "Rice"   "Rice"   "Banana"
#
#$John
#[1] "Rice"    "Peanuts"
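For the first part of the question (a running order per person rather than one sequence from 1 to nrow overall), base R's ave() applies a function within each group and puts the results back in the original row positions. A sketch, assuming the same day/month/year dates; the column name Step and the object name paths are illustrative choices, not part of the answer:

```r
## Question's sample data, with dates parsed and rows sorted by Date within Name
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Name    Date        Food
Fred    01/01/2018  Peanuts
Jim     03/02/2018  Banana
Barney  02/02/2018  Rice
Fred    06/03/2018  Rice
Barry   12/02/2018  Peanuts
John    04/04/2018  Rice
Jim     03/03/2018  Rice
Fred    20/04/2018  Rice
Den     12/02/2018  Banana
Barney  04/05/2018  Banana
Jim     05/06/2018  Rice
John    06/07/2018  Peanuts
Jim     30/06/2018  Banana
Fred    05/05/2018  Rice
")
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")
dat <- dat[order(dat$Name, dat$Date), ]

## Number each person's meals 1, 2, 3, ... in date order
dat$Step <- ave(seq_along(dat$Name), dat$Name, FUN = seq_along)

## Each person's foods in order, collapsed into one string per person
paths <- sapply(split(dat$Food, dat$Name), paste, collapse = " -> ")
paths
```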

Method 1

getFlow1 <- function (u) {
  if (length(u) == 1L) NULL
  else paste(u[-length(u)], u[-1], sep = " to ")
  }

Flow1 <- unlist(lapply(x, getFlow1), use.names = FALSE)
#[1] "Rice to Banana"  "Peanuts to Rice" "Rice to Rice"    "Rice to Rice"   
#[5] "Banana to Rice"  "Rice to Rice"    "Rice to Banana"  "Rice to Peanuts"

## maybe you can control the order of factor levels here
All_Flow <- outer(levels(dat$Food), levels(dat$Food), paste, sep = " to ")
Flow1 <- table("Flow" = factor(Flow1, levels = All_Flow))
#Flow
#  Banana to Banana  Peanuts to Banana     Rice to Banana  Banana to Peanuts 
#                 0                  0                  2                  0 
#Peanuts to Peanuts    Rice to Peanuts     Banana to Rice    Peanuts to Rice 
#                 0                  1                  1                  1 
#      Rice to Rice 
#                 3 

as.data.frame(Flow1)

#                Flow Freq
#1   Banana to Banana    0
#2  Peanuts to Banana    0
#3     Rice to Banana    2
#4  Banana to Peanuts    0
#5 Peanuts to Peanuts    0
#6    Rice to Peanuts    1
#7     Banana to Rice    1
#8    Peanuts to Rice    1
#9       Rice to Rice    3
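On the comment about controlling the order: the labels above come out grouped by the "to" food because the outer() matrix is flattened column by column. A sketch of one way to group them by the "from" food instead, in a hand-picked food order (here the order used in the question's table; food_levels, Flow_tab and flow_df are hypothetical names), reusing the eight flows printed above:

```r
## The eight flows produced by getFlow1() in Method 1
Flow1 <- c("Rice to Banana", "Peanuts to Rice", "Rice to Rice", "Rice to Rice",
           "Banana to Rice", "Rice to Rice", "Rice to Banana", "Rice to Peanuts")

## Food order to use for the labels (assumed: the question's order)
food_levels <- c("Peanuts", "Rice", "Banana")

## outer() builds a matrix whose [i, j] entry is "food_i to food_j";
## transposing before flattening groups the labels by the "from" food
All_Flow <- c(t(outer(food_levels, food_levels, paste, sep = " to ")))

Flow_tab <- table("Flow" = factor(Flow1, levels = All_Flow))
flow_df <- as.data.frame(Flow_tab)
flow_df
```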

Method 2 (I prefer this one)

getFlow2 <- function (u) {
  if (length(u) == 1L) NULL
  else cbind(u[-length(u)], u[-1])
  }

Flow2 <- do.call("rbind", lapply(x, getFlow2))
#     [,1]      [,2]     
#[1,] "Rice"    "Banana" 
#[2,] "Peanuts" "Rice"   
#[3,] "Rice"    "Rice"   
#[4,] "Rice"    "Rice"   
#[5,] "Banana"  "Rice"   
#[6,] "Rice"    "Rice"   
#[7,] "Rice"    "Banana" 
#[8,] "Rice"    "Peanuts"

Flow2 <- table("From" = Flow2[, 1], "To" = Flow2[, 2])
#         To
#From      Banana Peanuts Rice
#  Banana       0       0    1
#  Peanuts      0       0    1
#  Rice         2       1    3

as.data.frame(Flow2)
#     From      To Freq
#1  Banana  Banana    0
#2 Peanuts  Banana    0
#3    Rice  Banana    2
#4  Banana Peanuts    0
#5 Peanuts Peanuts    0
#6    Rice Peanuts    1
#7  Banana    Rice    1
#8 Peanuts    Rice    1
#9    Rice    Rice    3
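For the update (dropping flows such as Rice to Peanuts or Banana to Rice, whoever the person is): a flow is a pair of consecutive meals rather than a single row, and deleting rows from dat would create new consecutive pairs (for Jim, removing the Rice that follows Banana leaves Banana followed by Rice again). One possible approach, sketched here under the assumption that "removing a flow" means excluding that pair from the analysis, is to build a flow-level data frame with one row per flow and filter that instead; flows, unwanted and flows_kept are illustrative names.

```r
## Question's sample data, with dates parsed and rows sorted by Date within Name
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Name    Date        Food
Fred    01/01/2018  Peanuts
Jim     03/02/2018  Banana
Barney  02/02/2018  Rice
Fred    06/03/2018  Rice
Barry   12/02/2018  Peanuts
John    04/04/2018  Rice
Jim     03/03/2018  Rice
Fred    20/04/2018  Rice
Den     12/02/2018  Banana
Barney  04/05/2018  Banana
Jim     05/06/2018  Rice
John    06/07/2018  Peanuts
Jim     30/06/2018  Banana
Fred    05/05/2018  Rice
")
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")
dat <- dat[order(dat$Name, dat$Date), ]

## One row per flow: pair each meal with the next row when it is the same person
n <- nrow(dat)
same_person <- dat$Name[-n] == dat$Name[-1]
flows <- data.frame(
  Name = dat$Name[-n][same_person],
  From = dat$Food[-n][same_person],
  To   = dat$Food[-1][same_person],
  stringsAsFactors = FALSE
)

## Flows to discard, regardless of who the person is
unwanted <- data.frame(From = c("Rice", "Banana"), To = c("Peanuts", "Rice"),
                       stringsAsFactors = FALSE)

## Keep only the flows whose From/To pair is not in the unwanted list
drop <- paste(flows$From, flows$To) %in% paste(unwanted$From, unwanted$To)
flows_kept <- flows[!drop, ]

table(From = flows_kept$From, To = flows_kept$To)
```

The final table() call mirrors Method 2, applied only to the flows that were kept.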
