Find the number of days by condition from another column in R

Problem description

I have a data frame like this:

Ptt      Date                      Area        
88734    2016-10-23 05:39:18       BA             
88734    2016-10-23 06:53:13       BA           
88734    2016-11-09 08:32:18       MI            
88734    2016-11-19 06:45:27       MI           
88734    2016-12-20 12:30:43       MI           
88734    2016-12-29 02:45:35       FA             
129041   2017-10-05 04:55:24       BA            
129041   2016-10-23 06:45:30       MI            
129041   2016-11-16 07:10:32       FA            
129041   2016-11-29 03:43:54       FA           
120941   2017-01-02 14:54:39       FA           
...

I want to calculate how many days each Ptt has in each Area, but I don't know how to do it. Does anyone know how?

My expected output is this:

Ptt      Date                      Area        Days
88734    2016-10-23 05:39:18       BA           1  
88734    2016-10-23 06:53:13       BA           1
88734    2016-11-09 08:32:18       MI           1 
88734    2016-11-19 06:45:27       MI           2
88734    2016-12-20 12:30:43       MI           3
88734    2016-12-29 02:45:35       FA           1  
129041   2017-10-05 04:55:24       BA           1 
129041   2016-10-23 06:45:30       MI           1 
129041   2016-11-16 07:10:32       FA           1 
129041   2016-11-29 03:43:54       FA           2
120941   2017-01-02 14:54:39       FA           3
...

The data in reproducible form:

library(data.table)

dt <- data.table(Ptt = c("88734", "88734", "88734", "88734", "88734", "88734", "120941", "120941", "120941", "120941", "120941"),
                 date = c("2016-10-23 05:39:18",
                          "2016-10-23 06:53:13",
                          "2016-11-09 08:32:18",
                          "2016-11-19 06:45:27",
                          "2016-12-20 12:30:43",
                          "2016-12-29 02:45:35",
                          "2017-10-05 04:55:24",
                          "2016-10-23 06:45:30",
                          "2016-11-16 07:10:32",
                          "2016-11-29 03:43:54",
                          "2017-01-02 14:54:39"),
                 Area = c("BA", "BA", "MI", "MI", "MI", "FA", "BA", "MI", "FA", "FA", "FA"))

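Edit

I didn't explain this very well.

What I want to know is how many days each Ptt has in each Area. For example, 88734 has 1 day in BA, 3 days in MI, 1 day in FA, and so on.

This is what I want: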

Ptt      Area        Days
88734    BA           1
88734    MI           3 
88734    FA           1  
129041   BA           1 
129041   MI           1 
120941   FA           3

Thanks!

Tags: r, count, conditional-statements, days

Solution


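You can convert the date column to POSIXct and extract the calendar date from it. Then, within each combination of Ptt and Area, you can assign a unique number to each distinct date.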

library(dplyr)

dt %>%
  mutate(date = lubridate::ymd_hms(date), 
         date1 = as.Date(date)) %>%
  group_by(Ptt, Area) %>%
  mutate(Days = dense_rank(date1)) %>%
  ungroup() %>%
  select(-date1)

#  Ptt    date                Area   Days
#   <chr>  <dttm>              <chr> <int>
# 1 88734  2016-10-23 05:39:18 BA        1
# 2 88734  2016-10-23 06:53:13 BA        1
# 3 88734  2016-11-09 08:32:18 MI        1
# 4 88734  2016-11-19 06:45:27 MI        2
# 5 88734  2016-12-20 12:30:43 MI        3
# 6 88734  2016-12-29 02:45:35 FA        1
# 7 120941 2017-10-05 04:55:24 BA        1
# 8 120941 2016-10-23 06:45:30 MI        1
# 9 120941 2016-11-16 07:10:32 FA        1
#10 120941 2016-11-29 03:43:54 FA        2
#11 120941 2017-01-02 14:54:39 FA        3
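
The output above numbers the days row by row. The edit to the question asks instead for one row per Ptt and Area with the total number of distinct days. A minimal sketch of that summary, assuming dplyr 1.0.0 or later (for the .groups argument) and lubridate are installed; the names df, day, and summary_days are illustrative and not part of the original answer:

```r
library(dplyr)

# Example data from the question, rebuilt as a plain data frame
# so that this block runs on its own.
df <- data.frame(
  Ptt  = c(rep("88734", 6), rep("120941", 5)),
  date = c("2016-10-23 05:39:18", "2016-10-23 06:53:13",
           "2016-11-09 08:32:18", "2016-11-19 06:45:27",
           "2016-12-20 12:30:43", "2016-12-29 02:45:35",
           "2017-10-05 04:55:24", "2016-10-23 06:45:30",
           "2016-11-16 07:10:32", "2016-11-29 03:43:54",
           "2017-01-02 14:54:39"),
  Area = c("BA", "BA", "MI", "MI", "MI", "FA", "BA", "MI", "FA", "FA", "FA"),
  stringsAsFactors = FALSE
)

summary_days <- df %>%
  # Parse the timestamp, then keep only the calendar date
  mutate(day = as.Date(lubridate::ymd_hms(date))) %>%
  group_by(Ptt, Area) %>%
  # Count distinct calendar days within each Ptt/Area group
  summarise(Days = n_distinct(day), .groups = "drop")

summary_days
```

Two timestamps on the same calendar day, such as the two BA rows for 88734, count as a single day here because only the date part is compared.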

Since you already have a data.table, you can also do this with data.table syntax:

library(data.table)

dt[, date := lubridate::ymd_hms(date)]
dt[, date1 := as.Date(date)]
dt[, Days := match(date1, unique(date1)), .(Ptt, Area)]
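
One detail that may matter on other data: match(date1, unique(date1)) numbers the dates in order of first appearance, whereas dense_rank numbers them by value. The two agree on the example data because the dates within each Ptt and Area group are already in ascending order. The base R sketch below illustrates the difference on a small, made-up vector of unsorted dates; the vector d is hypothetical and not taken from the question:

```r
# Hypothetical, deliberately unsorted dates (not from the question's data)
d <- as.Date(c("2016-11-29", "2016-11-16", "2016-11-29", "2017-01-02"))

# Numbering by order of first appearance, as in the data.table code above
by_appearance <- match(d, unique(d))

# Numbering by value: the earliest date gets 1, the next distinct date 2, ...
by_value <- match(d, sort(unique(d)))

by_appearance  # 1 2 1 3
by_value       # 2 1 2 3
```

If the rows might not be in chronological order, sorting first (for example with setorder(dt, Ptt, Area, date)) or using the sorted variant should keep the day numbers chronological.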
