Is there an R function I can use inside summarise() to count where two variables coincide?

Question

I'm not sure how best to phrase this.

Say I have a data frame of juvenile arrests. One column describes the statute that was violated, and that is the column I'm passing to group_by():

date       sex statute_description
2015-01-01 M   ROBBERY
2015-01-01 M   ROBBERY
2015-01-01 F   ROBBERY
2015-01-01 F   ASSAULT - SIMPLE
2015-01-01 F   ASSAULT - SIMPLE
2015-01-01 M   ASSAULT - SIMPLE
2015-01-01 M   DRUG POSSESSION
2015-01-01 M   ISSUANCE OF WARRANT
2015-01-01 M   ISSUANCE OF WARRANT
2015-01-01 M   ISSUANCE OF WARRANT

arrest_reasons <- group_by(df, statute_description) %>%
   summarize(
      num_arrests = n()
   )

This returns:

statute_description  num_arrests
ROBBERY              3
DRUG POSSESSION      1
ASSAULT - SIMPLE     3
ISSUANCE OF WARRANT  3

What I want to do is add further columns to this data frame that count how many arrests for each offense involved each sex. Like this:

statute_description  num_arrests  males  females
ROBBERY              3            2      1
DRUG POSSESSION      1            1      0
ASSAULT - SIMPLE     3            1      2
ISSUANCE OF WARRANT  3            3      0

I'm not sure which function is suitable for this.

Tags: r, dataframe, dplyr

Solution


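We can use sum() on a logical expression. sex == 'M' evaluates to TRUE or FALSE for each row, and sum() treats TRUE as 1 and FALSE as 0, so within each group it counts the rows where the condition holds.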

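library(dplyr)
arrest_reasons <- group_by(df, statute_description) %>%
  summarize(
    num_arrests = n(),                # rows in each group
    males = sum(sex == 'M'),          # TRUE counts as 1, so this counts the 'M' rows
    females = num_arrests - males,    # assumes sex is only 'M' or 'F', with no NA
    .groups = 'drop'
  )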

arrest_reasons
# A tibble: 4 x 4
#  statute_description num_arrests males females
#  <chr>                     <int> <int>   <int>
#1 ASSAULT - SIMPLE              3     1       2
#2 DRUG POSSESSION               1     1       0
#3 ISSUANCE OF WARRANT           3     3       0
#4 ROBBERY                       3     2       1
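
The subtraction above relies on sex containing only 'M' and 'F'. As a minimal sketch of a variant that counts each value explicitly, the example below uses a small hypothetical data frame (arrests, not part of the original question) that includes one missing sex value. The unknown column and the use of na.rm = TRUE are assumptions added for illustration, not something the question asked for.

```r
library(dplyr)

# Hypothetical data: same columns as the question's df, plus one NA in sex
arrests <- data.frame(
  sex = c("M", "M", "F", NA, "M"),
  statute_description = c("ROBBERY", "ROBBERY", "ROBBERY", "ROBBERY",
                          "DRUG POSSESSION")
)

arrest_counts <- arrests %>%
  group_by(statute_description) %>%
  summarize(
    num_arrests = n(),
    # NA == "M" is NA, so na.rm = TRUE keeps one missing value
    # from turning the whole sum into NA
    males   = sum(sex == "M", na.rm = TRUE),
    females = sum(sex == "F", na.rm = TRUE),
    # count rows where sex is missing instead of silently folding them in
    unknown = sum(is.na(sex)),
    .groups = "drop"
  )

arrest_counts
```

With this data, the ROBBERY row should report 4 arrests split into 2 males, 1 female and 1 unknown, whereas num_arrests - males would have reported 2 females.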

Data

df <- structure(list(date = c("2015-01-01", "2015-01-01", "2015-01-01", 
"2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01", 
"2015-01-01", "2015-01-01"), sex = c("M", "M", "F", "F", "F", 
"M", "M", "M", "M", "M"), statute_description = c("ROBBERY", 
"ROBBERY", "ROBBERY", "ASSAULT - SIMPLE", "ASSAULT - SIMPLE", 
"ASSAULT - SIMPLE", "DRUG POSSESSION", "ISSUANCE OF WARRANT", 
"ISSUANCE OF WARRANT", "ISSUANCE OF WARRANT")),
class = "data.frame", row.names = c(NA, 
-10L))
