Iterate over rows and count the rows that match multiple conditions in R

Problem description

I have a dataset that looks like this:

        city period_day       date 
1  barcelona    morning 2017-01-15         
2  sao_paulo  afternoon 2016-12-07         
3  sao_paulo    morning 2016-11-16         
4  barcelona    morning 2016-11-06         
5  barcelona  afternoon 2016-12-31         
6  sao_paulo  afternoon 2016-11-30         
7  barcelona    morning 2016-10-15         
8  barcelona  afternoon 2016-11-30         
9  sao_paulo  afternoon 2016-12-24         
10 sao_paulo  afternoon 2017-02-02         

For each row, I want to count how many rows have a date earlier than that row's date and the same city and period_day. In this case, I want this result:

        city period_day       date row_count
1  barcelona    morning 2017-01-15         2
2  sao_paulo  afternoon 2016-12-07         1
3  sao_paulo    morning 2016-11-16         0
4  barcelona    morning 2016-11-06         1
5  barcelona  afternoon 2016-12-31         1
6  sao_paulo  afternoon 2016-11-30         0
7  barcelona    morning 2016-10-15         0
8  barcelona  afternoon 2016-11-30         0
9  sao_paulo  afternoon 2016-12-24         2
10 sao_paulo  afternoon 2017-02-02         3

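When row_count equals 0, that row has the oldest date for its combination of city and period_day.

I came up with a solution, but it takes too long on larger data. Here is the code: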

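library(dplyr)

# For each row, count the other rows with the same city and period_day
# whose date is strictly earlier than that row's date.
get_count_function <- function(df) {
  idx <- seq_len(nrow(df))

  count <- sapply(idx, function(x) {
    # city and period_day of the current row
    name_city <-
      df %>% select(city) %>% filter(row_number() == x) %>% pull()
    name_period <-
      df %>% select(period_day) %>% filter(row_number() == x) %>% pull()

    # date of the current row
    date_row <- df %>%
      select(date) %>%
      filter(row_number() == x) %>%
      pull()

    # dates of all other rows in the same city and period_day
    date_any_row <- df %>%
      filter(dplyr::row_number() != x,
             city == name_city,
             period_day == name_period) %>%
      select(date) %>%
      pull()

    # number of those dates that are earlier than the current date
    how_many <- sum(date_row > date_any_row)

    return(how_many)

  })

  return(count)

}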

How can I make this function more efficient?

Tags: r, loops, dplyr, multiple-conditions

Solution


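Try this:

library(tidyverse)

dat %>%
  group_by(city, period_day) %>%
  mutate(row_count = min_rank(date) - 1) %>%
  ungroup()

Within each group of city and period_day, min_rank(date) returns the rank of each date: 1 for the earliest, 2 for the next one, and so on. Subtracting 1 from the rank gives the number of dates in that group that come before the current one. For example, the minimum date in a group has rank 1, so nothing comes before it (1 - 1 = 0); a date with rank 2 has exactly one older date before it, and so on. Tied dates share the same rank, so only strictly earlier dates are counted, which matches the comparison in the question. Note that order(date) - 1 gives the same numbers on this sample, but order() returns the positions that would sort the vector rather than the rank of each element, so it is not correct in general.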
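To see how the rank of a date relates to the number of earlier dates, and why the positions returned by order() are not ranks, here is a minimal base-R sketch. The vectors `d` and `d_ties` are made-up illustrations, not part of the sample data, and `rank(..., ties.method = "min")` is used as the base-R counterpart of dplyr's `min_rank()`.

```r
# Hypothetical example vector: three dates arranged so that
# order() and rank() give different answers.
d <- as.Date(c("2016-03-01", "2016-01-01", "2016-02-01"))

# order() returns the positions that would sort d.
ord <- order(d)

# rank() returns the rank of each element of d.
rnk <- rank(as.numeric(d), ties.method = "min")

# Brute-force count of strictly earlier dates, as in the question.
brute <- sapply(seq_along(d), function(i) sum(d < d[i]))

rank_minus_1 <- rnk - 1   # agrees with the brute-force count
order_minus_1 <- ord - 1  # does not agree for this vector

# Hypothetical vector with a tie: tied dates get the same count.
d_ties <- as.Date(c("2016-01-01", "2016-01-01", "2016-02-01"))
tie_counts <- rank(as.numeric(d_ties), ties.method = "min") - 1

print(rank_minus_1)
print(order_minus_1)
print(tie_counts)
```

The two results coincide only when the sorting permutation happens to be its own inverse, which is the case for every group in the sample data.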

Data:

dat <- read.table(
  text = "        city period_day       date
  barcelona    morning 2017-01-15
  sao_paulo  afternoon 2016-12-07
  sao_paulo    morning 2016-11-16
  barcelona    morning 2016-11-06
  barcelona  afternoon 2016-12-31
  sao_paulo  afternoon 2016-11-30
  barcelona    morning 2016-10-15
  barcelona  afternoon 2016-11-30
  sao_paulo  afternoon 2016-12-24
  sao_paulo  afternoon 2017-02-02",
  header = TRUE,
  colClasses = c("character", "character", "Date")
)
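As a cross-check that does not depend on dplyr, the grouped count can also be sketched in base R with `ave()`. This is an alternative formulation under the same assumption as above (count of strictly earlier dates within each city and period_day), not the answer's own code; the data frame is rebuilt here only so the block runs on its own, and `expected` is copied from the result table in the question.

```r
# Sample data from the question, rebuilt so this block is self-contained.
dat <- data.frame(
  city = c("barcelona", "sao_paulo", "sao_paulo", "barcelona", "barcelona",
           "sao_paulo", "barcelona", "barcelona", "sao_paulo", "sao_paulo"),
  period_day = c("morning", "afternoon", "morning", "morning", "afternoon",
                 "afternoon", "morning", "afternoon", "afternoon", "afternoon"),
  date = as.Date(c("2017-01-15", "2016-12-07", "2016-11-16", "2016-11-06",
                   "2016-12-31", "2016-11-30", "2016-10-15", "2016-11-30",
                   "2016-12-24", "2017-02-02")),
  stringsAsFactors = FALSE
)

# Within each (city, period_day) group, rank the dates and subtract 1
# to get the number of strictly earlier dates in that group.
dat$row_count <- ave(
  as.numeric(dat$date), dat$city, dat$period_day,
  FUN = function(d) rank(d, ties.method = "min") - 1
)

# Expected row_count column from the question.
expected <- c(2, 1, 0, 1, 1, 0, 0, 0, 2, 3)
print(all(dat$row_count == expected))
```

Because the ranking runs once per group instead of filtering the whole data frame once per row, this avoids the repeated scans that make the row-by-row function in the question slow.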
