r - 计算同时满足两个条件的行数
问题描述
我正在进行一项调查,参与者回答第一个问题是或否,然后是第二个开放式问题“如果是,为什么?”
我需要找出回答“是”后回答第二个问题的人的百分比。或者,我需要在他们回答“是”后找到“NA”的数量。
这是一个外观相似的数据集:
#> helpful helpfulhow
#> 1 n NA
#> 2 y Because this study cannot be put online. Thus I have to create a random wall of text
#> 3 n NA
#> 4 y This is a confidential study. Thus the data must be changed.
#> 5 n NA
#> 6 n NA
#> 7 y This is a confidential study. Thus the data must be changed every time.
#> 8 y NA
#> 9 y Qualitative studies are difficult to assess. Here is a random wall of text.
> str(b)
'data.frame': 9 obs. of 2 variables:
$ helpful : Factor w/ 2 levels "n","y": 1 2 1 2 1 1 2 2 2
$ helpfulhow: Factor w/ 4 levels "Because this study cannot be put online. Thus I have to create a random wall of text.",..: NA 1 NA 4 NA NA 3 NA 2
> dput(head(b))
structure(list(helpful = structure(c(1L, 2L, 1L, 2L, 1L, 1L), .Label = c("n",
"y"), class = "factor"), helpfulhow = structure(c(NA, 1L, NA,
4L, NA, NA), .Label = c("Because this study cannot be put online. Thus I have to create a random wall of text.",
"Qualitative studies are difficult to assess. Here is a random wall of text.",
"This is a confidential study. Thus the data must be changed every time.",
"This is a confidential study. Thus the data must be changed."
), class = "factor")), row.names = c(NA, 6L), class = "data.frame")
例如,我想知道有多少人把 'y's underhelpful
也把 'NA' under helpfulhow
。提前致谢。
解决方案
我制作了一个示例数据集,如下所示;在这里,我正在计算 Question-1 回答为“是”且 Question-2 为空(trimws
用于去除空格)或为NA
. 然后,除以总行数,我们得到分数。使用percent
from packagescales
我将其转换为百分比。
#> Name Q1 Q2
#> 1 Jerry Yes <NA>
#> 2 Beth No
#> 3 Jessica Yes
#> 4 Morty Yes Aww,Babola
#> 5 Summer No
#> 6 Rick Yes Wubbalubbadubdub
## percentage of people who answered yes to Q1 and also answered Q2
scales::percent(nrow(with(df,
df[Q1=="Yes" &
(trimws(Q2) != "" & !is.na(Q2)),]))/nrow(with(df,
df[Q1=="Yes",])))
#> [1] "50.0%"
数据:
df <- structure(list(Name = structure(c(2L, 1L, 3L, 4L, 6L, 5L),
.Label = c("Beth", "Jerry", "Jessica", "Morty", "Rick", "Summer"), class = "factor"),
Q1 = structure(c(2L, 1L, 2L, 2L, 1L, 2L),
.Label = c("No", "Yes"), class = "factor"),
Q2 = structure(c(NA, 1L, 2L, 3L, 1L, 4L),
.Label = c("", " ", "Aww,Babola", "Wubbalubbadubdub"), class = "factor")),
class = "data.frame", row.names = c(NA, -6L))
对于您的数据集,它将是这样的:
scales::percent(nrow(with(b, b[helpful=="y" & (trimws(helpfulhow) != "" & !is.na(helpfulhow)),]))/nrow(with(b, b[helpful=="y",])))
#> [1] "100%"
为了让它更干净,我们可以使用dplyr
包:
library(dplyr)
library(scales)
percent(
b %>%
filter(helpful == "y", !is.na(helpfulhow), trimws(helpfulhow) != "") %>%
nrow(.) / {b %>% filter(helpful == "y") %>% nrow(.)})
#> [1] "100%"
或者
b %>%
group_by(helpful) %>%
summarise(percent_helpfulhow = percent(sum(trimws(helpfulhow) != "" & !is.na(helpfulhow)) / n())) %>%
filter(helpful == "y") %>%
pull(2)
#> [1] "100%"
推荐阅读
- html - 当我必须排除导航栏时如何使 div 显示高度的 100%
- video - 如何根据 FFMpeg 编码命令或输出确定 AV1 文件的正确 HTML5 视频编解码器属性?
- python - ffmpeg 使用 python 将专辑封面艺术添加到 mp3
- apache - .htaccess - www 到非 www & http 到 https & 包括斜杠
- python - 为什么 pygame 没有降低这个实体的生命值?
- prolog - 我有prolog程序的maplist问题
- rest - 使用 REST 架构实现面向服务的架构
- python - 'fieldsets[4][1]' 的值必须是字典
- android - Google Play 控制台 API 问题
- python - Discord.py 赠品命令从反应错误中随机选择