Get the cumulative count of rows in each column, excluding NA, in R

Question

I have a data frame structured as follows:

structure(list(CT_CW.QA.RWL.H1A1Y = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A1Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
CT_CW.QA.RWL.H1A2Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A2Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A3Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A3Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A4Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A4Z = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A5Y = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A5Z = c(1.07, 0.41, 0.87, 1.21, 0.99, 0.77, 
0.73, 0.77, 0.61, 0.89), CT_CW.QA.RWL.H1A6Y = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A6Z = c(NA, 
NA, 0.92, 0.64, 0.63, 0.48, 0.17, 0.28, 0.32, 0.64), CT_CW.QA.RWL.H1A7Y = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A7Z = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A8Y = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_)), row.names = c("1812", "1813", 
"1814", "1815", "1816", "1817", "1818", "1819", "1820", "1821"
), class = "data.frame")

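What I want to do is, for each column, get a cumulative count of the rows that are not NA (the NAs need to be kept at this point).

I tried the following (where test is the data frame above):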

test_count = cumsum(colSums(!is.na(test)))

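But this seems to keep counting across columns, whereas I need a separate cumulative count of rows within each column, so that the result is a data frame that looks like this (for visual reference only; the numbers are hypothetical):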

Row.Name    CT_CW.QA.RWL.H1A1Y    CT_CW.QA.RWL.H1A2Y
  1812              0                      0
  1813              0                      1
  1814              1                      2

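...indicating that the original data frame (test) has NA in the first two rows of column CT_CW.QA.RWL.H1A1Y and NA in the first row of column CT_CW.QA.RWL.H1A2Y (hypothetical, not representative of the values in the data above, just to illustrate the structure I am looking for).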

Tags: r

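Answer

We can loop over the columns with lapply, convert each column to a logical vector with !is.na, apply cumsum, and assign the output back to the original object or to a copy of it. Make sure to use [] on the left-hand side so that the data frame attributes (class, column names, row names) are preserved.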

test1 <- test
test1[] <- lapply(test, function(x) cumsum(!is.na(x)))

-output

head(test1)
     CT_CW.QA.RWL.H1A1Y CT_CW.QA.RWL.H1A1Z CT_CW.QA.RWL.H1A2Y CT_CW.QA.RWL.H1A2Z CT_CW.QA.RWL.H1A3Y CT_CW.QA.RWL.H1A3Z CT_CW.QA.RWL.H1A4Y
1812                  0                  0                  0                  0                  0                  0                  0
1813                  0                  0                  0                  0                  0                  0                  0
1814                  0                  0                  0                  0                  0                  0                  0
1815                  0                  0                  0                  0                  0                  0                  0
1816                  0                  0                  0                  0                  0                  0                  0
1817                  0                  0                  0                  0                  0                  0                  0
     CT_CW.QA.RWL.H1A4Z CT_CW.QA.RWL.H1A5Y CT_CW.QA.RWL.H1A5Z CT_CW.QA.RWL.H1A6Y CT_CW.QA.RWL.H1A6Z CT_CW.QA.RWL.H1A7Y CT_CW.QA.RWL.H1A7Z
1812                  0                  0                  1                  0                  0                  0                  0
1813                  0                  0                  2                  0                  0                  0                  0
1814                  0                  0                  3                  0                  1                  0                  0
1815                  0                  0                  4                  0                  2                  0                  0
1816                  0                  0                  5                  0                  3                  0                  0
1817                  0                  0                  6                  0                  4                  0                  0
     CT_CW.QA.RWL.H1A8Y
1812                  0
1813                  0
1814                  0
1815                  0
1816                  0
1817                  0
....
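
As a minimal sketch of why the [] matters, the same idea can be tried on a small data frame. The data frame df and its values are invented for illustration and are not taken from the question's data.

```r
# Hypothetical toy data (values invented for illustration)
df <- data.frame(a = c(NA, NA, 0.5, 0.7),
                 b = c(NA, 1.2, 0.3, NA),
                 row.names = c("1812", "1813", "1814", "1815"))

# With []: the list returned by lapply is written into the existing
# data frame, so class, column names and row names are kept
kept <- df
kept[] <- lapply(df, function(x) cumsum(!is.na(x)))

# Without []: the result is just the plain list returned by lapply
dropped <- lapply(df, function(x) cumsum(!is.na(x)))

class(kept)     # "data.frame"
class(dropped)  # "list"
print(kept)
```

Each column of kept holds the running count of non-NA values down that column, and the original row names are still attached.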

There is also colCumsums from the matrixStats package:

library(matrixStats)
test1[] <- colCumsums(!is.na(test))
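
If installing matrixStats is not an option, a similar result can probably be obtained in base R with apply. This is a swapped-in alternative, not part of the answer above, and the toy data frame df is invented for illustration. Note that apply only returns a matrix of the right shape when the data frame has more than one row.

```r
# Hypothetical toy data (values invented for illustration)
df <- data.frame(a = c(NA, NA, 0.5, 0.7),
                 b = c(NA, 1.2, 0.3, NA))

# is.na() on a data frame gives a logical matrix; apply(..., 2, cumsum)
# runs cumsum down each column and returns an integer matrix
counts <- apply(!is.na(df), 2, cumsum)

# Write the matrix back into a copy so the data frame structure is kept
out <- df
out[] <- counts
print(out)
```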

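The problem with the original attempt is that colSums returns a vector with a single value per column (the total non-NA count), so cumsum on that vector returns the cumulative sum of the column totals across columns, not the cumulative sum within each column.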
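
The difference can be sketched on a small example. The data frame df and the variable names totals, across and within are made up for illustration.

```r
# Hypothetical toy data (values invented for illustration)
df <- data.frame(a = c(NA, NA, 0.5, 0.7),
                 b = c(NA, 1.2, 0.3, NA))

# colSums collapses each column to one number: the count of non-NA values
totals <- colSums(!is.na(df))   # a = 2, b = 2

# cumsum over that vector accumulates from one column to the next
across <- cumsum(totals)        # a = 2, b = 4

# cumsum applied inside each column gives the running count per row
within <- lapply(df, function(x) cumsum(!is.na(x)))

print(totals)
print(across)
print(within)
```

Here across has one entry per column, while within has one entry per row of each column, which is the shape the question asks for.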

