Group an R data.table by condition and get counts based on the condition

Problem description

I have a data table like this:

timestamp           type    status
05-01-2020 12:07:08    A      1
05-01-2020 12:36:05    A      1 
05-01-2020 13:34:25    A      1 
05-01-2020 23:45:02    A      1
05-01-2020 23:55:02    B      1
05-01-2020 13:44:33    B      2
06-01-2020 01:07:08    A      1 
06-01-2020 10:23:05    A      1
06-01-2020 12:11:08    A      2
06-01-2020 22:06:12    B      2
07-01-2020 00:01:05    A      2
07-01-2020 02:17:09    A      1
07-01-2020 12:36:05    B      1
07-01-2020 12:07:08    B      1
07-01-2020 12:36:05    A      1
07-01-2020 12:36:05    A      1
08-01-2020 12:36:05    B      2
08-01-2020 12:36:05    B      1
08-01-2020 12:36:05    B      1
09-01-2020 12:36:05    B      1 
09-01-2020 12:07:08    B      2
09-01-2020 12:36:05    B      1
11-01-2020 12:07:08    A      1
11-01-2020 12:36:05    A      1

I am trying to group it by date and use rleid():

dt <- dt[, group_id := rleid(as.IDate(timestamp),type,status = 1)][]
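Note that rleid() does not filter rows. It numbers runs of consecutive rows, starting a new id whenever any of its arguments changes from one row to the next, so a named argument such as status = 1 is not read as a condition. A minimal illustration on two made-up vectors (not the question's data):

```r
library(data.table)

# Hypothetical vectors for illustration only
type   <- c("A", "A", "B", "B", "A")
status <- c(1, 1, 1, 2, 1)

# A new id starts each time the (type, status) pair changes
ids <- rleid(type, status)
print(ids)
#> [1] 1 1 2 3 4
```

The last row gets a new id even though (A, 1) appeared earlier, because rleid only compares each row with the one before it.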

Now I want to get two counts.

The first counts, for each day, the instances within each group (type) that meet the condition (status = 1).

date         type  count
05-01-2020    A      4
05-01-2020    B      1
06-01-2020    A      2
07-01-2020    A      3
07-01-2020    B      2
08-01-2020    B      2
09-01-2020    B      2
11-01-2020    A      2

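The second is the number of groups that meet the condition on each day, where a group is a run of consecutive rows with the same type and status = 1.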

date         type  count
05-01-2020    A      1
05-01-2020    B      1
06-01-2020    A      1
07-01-2020    A      2
07-01-2020    B      1
08-01-2020    B      1
09-01-2020    B      2
11-01-2020    A      1

Tags: r, data.table

Solution


1) Count the instances per day within each group that meet the condition.

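library(data.table)
setDT(df)  # convert df to a data.table in place
# count rows with status == 1 per timestamp (here the date only) and type
df[, .(count = sum(status == 1)), .(timestamp, type)]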

#    timestamp type count
#1: 05-01-2020    A     4
#2: 05-01-2020    B     1
#3: 06-01-2020    A     2
#4: 06-01-2020    B     0
#5: 07-01-2020    A     3
#6: 07-01-2020    B     2
#7: 08-01-2020    B     2
#8: 09-01-2020    B     2
#9: 11-01-2020    A     2

You can drop the rows with a 0 count if you don't need them.
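The code above groups by timestamp, which only works because timestamp there holds just the date. If timestamp still carries the time of day, as in the question, a date column can be derived first. A sketch on the question's 05-01 and 06-01 rows, assuming the day-month-year format shown; the column names date and res1 are illustrative choices:

```r
library(data.table)

# The question's rows for 05-01-2020 and 06-01-2020 (timestamps include the time)
dt <- data.table(
  timestamp = c("05-01-2020 12:07:08", "05-01-2020 12:36:05", "05-01-2020 13:34:25",
                "05-01-2020 23:45:02", "05-01-2020 23:55:02", "05-01-2020 13:44:33",
                "06-01-2020 01:07:08", "06-01-2020 10:23:05", "06-01-2020 12:11:08",
                "06-01-2020 22:06:12"),
  type   = c("A", "A", "A", "A", "B", "B", "A", "A", "A", "B"),
  status = c(1, 1, 1, 1, 1, 2, 1, 1, 2, 2)
)

# Keep the first 10 characters (dd-mm-yyyy) and parse them as an IDate
dt[, date := as.IDate(substr(timestamp, 1, 10), format = "%d-%m-%Y")]

# Count rows with status == 1 per day and type, then drop the zero counts
res1 <- dt[, .(count = sum(status == 1)), by = .(date, type)][count > 0]
print(res1)
```

Chaining [count > 0] removes the 06-01-2020 B row, whose count is 0.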


2) Find the number of groups per day that meet the condition.

Create a new column (count_N) using rleid of type and status within each timestamp, then count the unique values of count_N for each timestamp and type where status = 1.

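# number runs of (type, status) separately within each timestamp (day)
df[, count_N := rleid(type, status), timestamp]
# keep status == 1 rows and count the distinct run ids per timestamp and type
df[status == 1, .(count = uniqueN(count_N)), .(timestamp, type)]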


#    timestamp type count
#1: 05-01-2020    A     1
#2: 05-01-2020    B     1
#3: 06-01-2020    A     1
#4: 07-01-2020    A     2
#5: 07-01-2020    B     1
#6: 08-01-2020    B     1
#7: 09-01-2020    B     2
#8: 11-01-2020    A     1
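Both counts can also be produced in one grouped call once the runs are numbered within each day. A sketch on the question's 07-01 and 09-01 rows, deriving the date from the full timestamp; the column names date, run, n_rows and n_runs are illustrative, not from the answer:

```r
library(data.table)

# The question's rows for 07-01-2020 and 09-01-2020
dt <- data.table(
  timestamp = c("07-01-2020 00:01:05", "07-01-2020 02:17:09", "07-01-2020 12:36:05",
                "07-01-2020 12:07:08", "07-01-2020 12:36:05", "07-01-2020 12:36:05",
                "09-01-2020 12:36:05", "09-01-2020 12:07:08", "09-01-2020 12:36:05"),
  type   = c("A", "A", "B", "B", "A", "A", "B", "B", "B"),
  status = c(2, 1, 1, 1, 1, 1, 1, 2, 1)
)

dt[, date := as.IDate(substr(timestamp, 1, 10), format = "%d-%m-%Y")]

# Number runs of identical (type, status) within each day, in row order
dt[, run := rleid(type, status), by = date]

# n_rows: rows with status == 1; n_runs: distinct run ids among those rows
res <- dt[, .(n_rows = sum(status == 1),
              n_runs = uniqueN(run[status == 1])),
          by = .(date, type)]
print(res)
```

The runs are numbered per day across both types, so a type interrupted by another type (07-01 A) or by a status 2 row (09-01 B) is counted as two groups, matching the expected output.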
