首页 > 解决方案 > How to count number of rows in data.table using same condition on multiple columns in R

问题描述

If I have the below data table in R

dt <- data.table(x = c("No", "No", "Yes")
              ,  y = c("No", "Yes", "Yes")
              ,  z = c("No", "No", "No")
              ,  q = c("Hi", "Where, "When")
              ,  AA = c(1, 1,1)
              ,  ID = c(999, 789, 567)
              , bop = c(2345, 2222,2222))

and I want to get the count of "No" in each variable(x, y, z) in below format

Variable Type Count

 x     No      2

 y     No     1

 z     No     3

I am not able to pass the same filter to the columns I want. I can only pass filter to one column

dt[, .(.N), by = .(x,y,z)]


标签: rdata.table

解决方案


We can use melt to reshape into 'long' format, and then do a group by 'variable' (i.e. the column name column), get the number of 'No' by summing the logical vector (value == 'No')

library(data.table)
melt(dt, id.var = 'q')[, .(Type = 'No', Count = sum(value == 'No')), .(variable)]
#   variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

For the new example

melt(dt[, .(x, y, z)], measure = c('x', 'y', 'z'))[, 
   .(Type = 'No', Count = sum(value == 'No')), .(variable)]
#    variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

Or specify the logical condition in i and use .N

melt(dt[, .(x, y, z)], measure = c('x', 'y', 'z'))[value == 'No',
       .(Type = first(value), Count = .N), variable]
#   variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

Or without melting we can loop over the Subset of Data.table (.SD), create a logical vector and reduce the list to a single vector with Reduce

dt[, .(variable = names(.SD), Type = 'No', 
     Count = Reduce(`+`, lapply(.SD, `==`, 'No'))), .SDcols = x:z]
#  variable Type Count
#1:        x   No     3
#2:        y   No     2
#3:        z   No     1

Or using rowSums

dt[, .(Variable = names(.SD), Type = 'No',
        Count = rowSums(.SD == 'No')), .SDcols = x:z]

In the OP's code, the grouping columns were taken as 'x', 'y', 'z'. Then, if we do .N, it is looking for the number of rows for each unique set of values across those columns and this will be 1 for each case because there is only 1 row for No No No or No Yes No or Yes Yes No


推荐阅读