Filter days of the week using two different conditions

Problem description

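I am trying to organize my daily activity data (accelerometer data). I want to sum and average my data over repeated days, but only for days with A2.Working > 6 hours. Another condition is that a day must have a full 24 hours to be included as a valid day. A valid day contains these 3 variables, A1.NonWorking, A2.Working and A4.SleepWeek, and they sum to 24 hours. For example, in the data below Weekday 2 does not have 24 hours, because the accelerometer was attached on that day (a Tuesday). Here is a reproducible example: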

library(dplyr)  # provides tibble() and the %>% pipe used below
df <- tibble(
 LbNr = c(22002,22002,22002,22002,
 22002,22002,22002,22002,22002,22002,22002,22002,22002,
 22002,22002,22002,22002,22002,22002,22002,22002,22002,
 22002,22002,22002,22002),
 Type = c("A2.Working","A1.NonWorking",
"A4.SleepWeek","A4.SleepWeek","A1.NonWorking","A2.Working",
"A1.NonWorking","A1.NonWorking","A4.SleepWeek","A1.NonWorking",
"A2.Working","A1.NonWorking","A4.SleepWeek","A4.SleepWeek",
"A1.NonWorking","A2.Working","A1.NonWorking","C0.Leisure",
"C4.SleepWeekend","C0.Leisure","C0.Leisure","C4.SleepWeekend",
"C0.Leisure","C4.SleepWeekend","A4.SleepWeek","A1.NonWorking"),
Weekday = c(2,2,2,3,3,3,3,4,4,4,4,4,4,5,5,5,5,6,6,6,7,7,7,7,1,1),
Time = c(9.83333,6.05,0.11667,6.83333,1.33333,
9.83333,6,0.03333,7.2,6.43333,5,5.23333,0.1,6.41667,0.96667,11.01667,
5.6,0.43333,7.9,15.66667,0.03333,7.91667,15.61667,0.43333,6.33333,0.66667))

I tried this code without selecting specific days:

df %>% 
group_by(LbNr, Type, Weekday) %>%
summarise_all(.,sum) %>%
group_by(LbNr, Weekday) %>%
filter(any((Time >= 6 & Type == "A2.Working") | Weekday == 6 | Weekday == 7)) %>%
group_by(LbNr, Type) %>%
select(-Weekday) %>%
summarise_all(., mean, na.rm = TRUE)

However, when I run the code, I get the following:

LbNr   Type             Time
  <dbl> <chr>           <dbl>
1 22002 A1.NonWorking    6.65
2 22002 A2.Working      10.2 
3 22002 A4.SleepWeek     4.46
4 22002 C0.Leisure      15.9 
5 22002 C4.SleepWeekend  8.12

If I add up the weekday types (6.65 + 10.20 + 4.46 = 21.31), I get the wrong result, because it is summing Weekday 2, 3, 4, 5 for A1.NonWorking and A4.SleepWeek.

I want code that returns this result:

LbNr   Type             Time
  <dbl> <chr>           <dbl>
1 22002 A1.NonWorking    6.95
2 22002 A2.Working      10.4 
3 22002 A4.SleepWeek     6.62
4 22002 C0.Leisure      15.9 
5 22002 C4.SleepWeekend  8.12

If I add up the weekday types (6.95 + 10.40 + 6.62 = 23.97, almost 24 hours), I get the correct result using the following code:

df %>%
group_by(LbNr, Type, Weekday) %>%
summarise_all(.,sum) %>%
filter(Weekday %in% c('3':'7')) %>% 
group_by(LbNr, Weekday) %>%
filter(any((Time >= 6 & Type == "A2.Working") | Weekday == 6 | Weekday == 7)) %>%
group_by(LbNr, Type) %>%
select(-Weekday) %>%
summarise_all(., mean, na.rm = TRUE)

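I used filter(Weekday %in% c('3':'7')) for this volunteer because I know that Weekday 2 does not have 24 hours. I want code that returns the correct values without a hard-coded day filter like mine, one that selects only the days with 24 hours.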

Tags: r, dplyr, tidyverse

Solution


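You can try filtering with any() for weekdays that have Time > 6 for Type A2.Working. This works after grouping by Weekday (it keeps all rows of a Weekday if the condition is met). It also assumes that you want to include all rows for Weekdays 6 and 7 (which appear to be the weekend). Is this what you had in mind?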

library(dplyr)
df %>%
  group_by(LbNr, Type, Weekday) %>%
  summarise_all(.,sum) %>%
  group_by(LbNr, Weekday) %>%
  filter(any((Time > 6 & Type == "A2.Working") | Weekday == 6 | Weekday == 7)) %>%
  group_by(LbNr, Type) %>%
  select(-Weekday) %>%
  summarise_all(., mean, na.rm = TRUE)


   LbNr Type             Time
  <dbl> <chr>           <dbl>
1 22002 A1.NonWorking    7.27
2 22002 A2.Working      10.2 
3 22002 A4.SleepWeek     6.51
4 22002 C0.Leisure      15.9 
5 22002 C4.SleepWeekend  8.12
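
To illustrate why any() keeps or drops whole days rather than single rows, here is a minimal sketch on a made-up table. The toy tibble and its values are hypothetical and are not taken from the question's data. Within each Weekday group, any() collapses the row-wise condition to a single TRUE or FALSE, and filter() applies that value to every row of the group.

```r
library(dplyr)

# Hypothetical toy data: three days, already summed per Type
toy <- tibble(
  Weekday = c(3, 3, 4, 4, 6),
  Type    = c("A2.Working", "A1.NonWorking",
              "A2.Working", "A1.NonWorking", "C0.Leisure"),
  Time    = c(9, 15, 5, 19, 24)
)

kept <- toy %>%
  group_by(Weekday) %>%
  # any() yields one logical per group, so a day is kept or dropped as a whole
  filter(any((Time > 6 & Type == "A2.Working") | Weekday == 6)) %>%
  ungroup()

kept
```

Both rows of Weekday 3 survive (Working is 9 hours), both rows of Weekday 4 are dropped (Working is only 5 hours, and the 19 non-working hours do not count because the Type does not match), and Weekday 6 is kept by the weekend clause.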

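Edit: Based on the comments, if you want to make sure days are included only when their total time is about 24 hours, you can add the condition sum(Time) > 23.9 (a threshold close to 24 hours) to the filter statement after grouping by LbNr and Weekday.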

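Here is the resulting code, again assuming that days 6 and 7 are included (whether or not 24 hours of data were collected on the weekend). I included the logic that made sense to me, although it could be simplified further (for example, if the A2 time is < 6 hours, the 24-hour total criterion is not needed). Hopefully this is closer to what you need.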

library(dplyr)
# Edited to check for Weekdays 1:5 as having 24 hours (A1 + A2 + A4)
df %>%
  group_by(LbNr, Type, Weekday) %>%
  summarise_all(.,sum) %>%
  group_by(LbNr, Weekday) %>%
  filter(any(((Time >= 6 & Type == "A2.Working") & sum(Time) > 23.9)
              | Weekday == 6 | Weekday == 7)) %>%
  group_by(LbNr, Type) %>%
  select(-Weekday) %>%
  summarise_all(., mean, na.rm = TRUE)
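
As a way to check which days the combined condition selects, the per-day totals can be inspected directly. This is a sketch rather than part of the original answer: the column names Total, Working, Full24 and Valid are introduced here for illustration, and df repeats the question's data in compact form so the block runs on its own.

```r
library(dplyr)

# The question's data, repeated so this block is self-contained
df <- tibble(
  LbNr = 22002,
  Type = c("A2.Working","A1.NonWorking",
           "A4.SleepWeek","A4.SleepWeek","A1.NonWorking","A2.Working",
           "A1.NonWorking","A1.NonWorking","A4.SleepWeek","A1.NonWorking",
           "A2.Working","A1.NonWorking","A4.SleepWeek","A4.SleepWeek",
           "A1.NonWorking","A2.Working","A1.NonWorking","C0.Leisure",
           "C4.SleepWeekend","C0.Leisure","C0.Leisure","C4.SleepWeekend",
           "C0.Leisure","C4.SleepWeekend","A4.SleepWeek","A1.NonWorking"),
  Weekday = c(2,2,2,3,3,3,3,4,4,4,4,4,4,5,5,5,5,6,6,6,7,7,7,7,1,1),
  Time = c(9.83333,6.05,0.11667,6.83333,1.33333,
           9.83333,6,0.03333,7.2,6.43333,5,5.23333,0.1,6.41667,0.96667,11.01667,
           5.6,0.43333,7.9,15.66667,0.03333,7.91667,15.61667,0.43333,6.33333,0.66667)
)

day_check <- df %>%
  group_by(LbNr, Weekday) %>%
  summarise(
    Total   = sum(Time),                        # hours recorded on that day
    Working = sum(Time[Type == "A2.Working"]),  # 0 when the day has no A2 rows
    .groups = "drop"
  ) %>%
  mutate(
    Full24 = Total > 23.9,                      # same threshold as in the answer
    Valid  = (Working >= 6 & Full24) | Weekday %in% c(6, 7)
  )

day_check
```

On the question's data, Weekday 2 totals 16 hours and Weekday 1 totals 7 hours, so both fail the 24-hour test. Weekday 4 covers 24 hours but has only 5 Working hours. That leaves Weekdays 3 and 5 plus the weekend days 6 and 7. Because the recorded times are rounded, a full day sums to values such as 23.99999 rather than exactly 24, which is why a threshold is used instead of an equality test.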
