首页 > 解决方案 > 计算同时满足两个条件的行数

问题描述

我正在进行一项调查,参与者回答第一个问题是或否,然后是第二个开放式问题“如果是,为什么?”

我需要找出回答“是”后回答第二个问题的人的百分比。或者,我需要在他们回答“是”后找到“NA”的数量。

这是一个外观相似的数据集:

#>      helpful     helpfulhow               
#> 1    n           NA
#> 2    y           Because this study cannot be put online. Thus I have to create a random wall of text    
#> 3    n           NA         
#> 4    y           This is a confidential study. Thus the data must be changed.
#> 5    n           NA   
#> 6    n           NA
#> 7    y           This is a confidential study. Thus the data must be changed every time. 
#> 8    y           NA
#> 9    y           Qualitative studies are difficult to assess. Here is a random wall of text.
> str(b)
'data.frame':   9 obs. of  2 variables:
 $ helpful   : Factor w/ 2 levels "n","y": 1 2 1 2 1 1 2 2 2
 $ helpfulhow: Factor w/ 4 levels "Because this study cannot be put online. Thus I have to create a random wall of text.",..: NA 1 NA 4 NA NA 3 NA 2
> dput(head(b))
structure(list(helpful = structure(c(1L, 2L, 1L, 2L, 1L, 1L), .Label = c("n", 
"y"), class = "factor"), helpfulhow = structure(c(NA, 1L, NA, 
4L, NA, NA), .Label = c("Because this study cannot be put online. Thus I have to create a random wall of text.", 
"Qualitative studies are difficult to assess. Here is a random wall of text.", 
"This is a confidential study. Thus the data must be changed every time.", 
"This is a confidential study. Thus the data must be changed."
), class = "factor")), row.names = c(NA, 6L), class = "data.frame")

例如,我想知道有多少人把 'y's underhelpful也把 'NA' under helpfulhow。提前致谢。

标签: rexceldataframesubset

解决方案


我制作了一个示例数据集,如下所示;在这里,我正在计算 Question-1 回答为“是”且 Question-2 为空(trimws用于去除空格)或为NA. 然后,除以总行数,我们得到分数。使用percentfrom packagescales我将其转换为百分比。

#>      Name  Q1               Q2
#> 1   Jerry Yes             <NA>
#> 2    Beth  No                 
#> 3 Jessica Yes                 
#> 4   Morty Yes       Aww,Babola
#> 5  Summer  No                 
#> 6    Rick Yes Wubbalubbadubdub


## percentage of people who answered yes to Q1 and also answered Q2
scales::percent(nrow(with(df, 
                          df[Q1=="Yes" & 
                            (trimws(Q2) != "" & !is.na(Q2)),]))/nrow(with(df, 
                                                                          df[Q1=="Yes",])))

#> [1] "50.0%"

数据:

df <- structure(list(Name = structure(c(2L, 1L, 3L, 4L, 6L, 5L), 
                                      .Label = c("Beth", "Jerry", "Jessica", "Morty", "Rick", "Summer"), class = "factor"), 
                     Q1 = structure(c(2L, 1L, 2L, 2L, 1L, 2L), 
                                    .Label = c("No", "Yes"), class = "factor"), 
                     Q2 = structure(c(NA, 1L, 2L, 3L, 1L, 4L), 
                                    .Label = c("", "       ", "Aww,Babola", "Wubbalubbadubdub"), class = "factor")), 
                class = "data.frame", row.names = c(NA, -6L))

对于您的数据集,它将是这样的:

scales::percent(nrow(with(b, b[helpful=="y" & (trimws(helpfulhow) != "" & !is.na(helpfulhow)),]))/nrow(with(b, b[helpful=="y",])))

#> [1] "100%"

为了让它更干净,我们可以使用dplyr包:

library(dplyr)
library(scales)

percent(
  b %>% 
    filter(helpful == "y", !is.na(helpfulhow), trimws(helpfulhow) != "") %>% 
    nrow(.) / {b %>% filter(helpful == "y") %>% nrow(.)})

#> [1] "100%"

或者

b %>% 
  group_by(helpful) %>% 
  summarise(percent_helpfulhow = percent(sum(trimws(helpfulhow) != "" & !is.na(helpfulhow)) / n())) %>% 
  filter(helpful == "y") %>% 
  pull(2)

#> [1] "100%"

推荐阅读