Rendering NA counts in a data frame

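Problem description

I want to write a function that returns the type of an n value according to the following rules (the n value is the 6th column of the data frame):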

# n-value types
missing : NA
n > 0.05 : 'n.s.'
0.05 >= n > 0.01 : '*'
0.01 >= n > 0.001 : '**'
0.001 >= n > 0.0001 : '***'
0.0001 >= n : '****'

The first row of the data looks like this:

     n.name    bMean      log2FoldChange  lfcSE      stat        pn        padj
     <fct>     <dbl>      <dbl>           <dbl>      <dbl>       <dbl>     <dbl>
469  TNFRSF1B  542.82545  -3.406411       0.2267235  -15.024517  5.07e-51  3.25e-48

I tried the following:

c.1 <- function(x){
  breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1)
  stars <- c("****", "***", "**", "*", "n.s.")
  bins <- cut(x, breaks = breaks, labels = stars, include.lowest = TRUE)
  bins <- as.character(bins)
  list(p = x, stars = bins)
}
tab.1<-table(c.1(nav$pvalue))
apply(tab.1, 2, sum)

This gives almost what I want:

*: 24  **: 102  ***: 15  ****: 45  n.s.: 32

Some of my values are NA rather than numbers, and they do not show up in the output, so I tried:

a1<-as.numeric("NA")
c.1 <- function(x){
  breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1, a1)
  stars <- c("****", "***", "**", "*", "n.s.", "NA")
  bins <- cut(x, breaks = breaks, labels = stars, include.lowest = FALSE)
  bins <- as.character(bins)
  list(p = x, stars = bins)
}
tab.1<-table(c.1(nav$pvalue))
apply(tab.1, 2, sum)

I get an error. How can I include the NA count in the output?

Tags: r, dataframe

Solution


You can use case_when() from dplyr:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

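# Conditions are tested in order; the first one that is TRUE wins.
# A comparison with NA yields NA, so missing values fall through to is.na(n).
c.1 <- function(n) case_when(n > 0.05   ~ 'ns',
                             n > 0.01   ~ '*',
                             n > 0.001  ~ '**',
                             n > 0.0001 ~ '***',
                             n >= 0     ~ '****',
                             is.na(n)   ~ 'missing')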

set.seed(1)
n <- rgeom(10,.1)
n <- n / max(n) / 100
n[sample(1:10,2)]<-NA 
n
#>  [1] 0.0025000000 0.0012500000 0.0095833333 0.0012500000 0.0100000000
#>  [6]           NA 0.0062500000 0.0008333333 0.0083333333           NA
c.1(n)
#>  [1] "**"      "**"      "**"      "**"      "**"      "missing" "**"     
#>  [8] "***"     "**"      "missing"

df <- data.frame(n)

df %>% mutate(signif = c.1(n)) %>%
       select(signif,n) %>%
       group_by(signif) %>%
       summarize(nb = n()) %>%
       ungroup() 
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 2
#>   signif     nb
#>   <chr>   <int>
#> 1 **          7
#> 2 ***         1
#> 3 missing     2
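
If you would rather stay in base R, the original cut() approach can probably be kept with two small changes. NA is not a usable break point: as far as I can tell, cut() sorts its breaks and the NA is dropped, so the six labels no longer match the five intervals. Instead, leave the breaks alone and ask table() to report missing values with useNA. The sketch below assumes the thresholds from the question; the helper name stars_for and the sample vector p are made up for illustration.

```r
# Hypothetical helper; thresholds and labels follow the rules in the question.
stars_for <- function(p) {
  breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1)
  labels <- c("****", "***", "**", "*", "n.s.")
  # right = TRUE builds intervals of the form (a, b], so 0.05 is labelled "*";
  # include.lowest = TRUE keeps p == 0 in the first bin.
  # NA inputs stay NA in the result.
  cut(p, breaks = breaks, labels = labels, right = TRUE, include.lowest = TRUE)
}

# Illustrative data only, not the poster's data frame.
p <- c(0.2, 0.05, 0.03, 0.01, 0.0005, 0.0001, 0, NA, NA)
bins <- stars_for(p)

# useNA = "ifany" adds an <NA> column to the counts when NAs are present.
counts <- table(bins, useNA = "ifany")
print(counts)
```

Because cut() returns a factor, the table keeps the labels in threshold order and also lists levels whose count is zero, which the group_by()/summarize() version above does not.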

Created on 2020-10-01 by the reprex package (v0.3.0)

