Expand a data.table, add new rows, and replace NA values by group

Problem description

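I am trying to expand a data table based on a column (Month in the example below) and fill the missing values by group (Group). Take this example dt: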

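library(data.table)
set.seed(0)
dt <- data.table(ID = 1:10, Month = sample(1:10, replace = FALSE), Group = c("A", "B", "C", "A", "B", "C", "A", "B", "A", "A"))
# Only the first four rows get Income, Tax and Birth; the remaining rows stay NA
dt[1:4, `:=`(Income = rnorm(4), Tax = rnorm(4), Birth = sample(seq(as.POSIXct('2000/01/01'), as.POSIXct('2002/05/01'), by = "day"), 4))]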

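I want to expand the table so that each Group has 10 rows, with Month running from 1 to 10. The remaining columns (Income, Tax, Birth) should be filled from the existing rows. NAs should take the value of the nearest "Month". So, for Group A, the data table should have 10 rows as shown below (that is, the final data table should have 10 rows per group in total):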

dt_desired<-data.table(
  ID=rep(1:10),
  Group=rep("A",10),
  Income=c(rep(dt[Group=='A'&Month==1]$Income,8),rep(dt[Group=='A'&Month==9]$Income,2)),
  Tax=c(rep(dt[Group=='A'&Month==1]$Tax,8),rep(dt[Group=='A'&Month==9]$Tax,2)),
  Birth=c(rep(dt[Group=='A'&Month==1]$Birth,8),rep(dt[Group=='A'&Month==9]$Birth,2))
  )
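
Judging from dt_desired, the fill rule appears to be "last observation carried forward" (LOCF), with any leading gap filled from the next observation (NOCB). The following is a minimal sketch of that rule on a plain numeric vector, using data.table::nafill(); the vector x and the name filled are hypothetical and only serve as an illustration.

```r
library(data.table)

# Hypothetical numeric vector with a leading gap and interior gaps
x <- c(NA, 1, NA, NA, 5, NA)

# Step 1: carry the last observation forward (the leading NA stays NA)
locf <- nafill(x, type = "locf")

# Step 2: fill what is left with the next observation carried backward
filled <- nafill(locf, type = "nocb")

print(locf)
print(filled)
```

Applying LOCF first and NOCB second means NOCB only affects values that come before the first observation.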

Tags: r, datatable

Solution


As far as I know, data.table::nafill() cannot handle non-numeric columns (yet?), so I had to use zoo::na.locf() instead.

library( data.table )

# First, build the full Group x Month grid with CJ(), then pull in the observed values via an update join
ans <- CJ(Group = dt$Group, Month = dt$Month, unique = TRUE)[
  dt,
  `:=`(Income = i.Income, Tax = i.Tax, Birth = i.Birth),
  on = .(Group, Month)
]
# Columns in which to fill NAs (everything except Group and Month)
cols <- names(ans)[-(1:2)]
# LOCF: carry the last observation forward within each group
ans[, (cols) := lapply(.SD, zoo::na.locf, na.rm = FALSE), by = Group, .SDcols = cols]
# NOCB: fill any remaining leading NAs with the next observation
ans[, (cols) := lapply(.SD, zoo::na.locf, na.rm = FALSE, fromLast = TRUE), by = Group, .SDcols = cols][]
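
If you would rather avoid the zoo dependency, a rolling join may give the same result, since it carries whole rows forward regardless of column type. The sketch below is an alternative technique and not part of the approach above. It uses a small hypothetical table obs (made-up values, observed rows only) instead of dt, and it assumes that the rows being joined from contain no NAs in the value columns.

```r
library(data.table)

# Hypothetical observed rows: only months that actually have data
obs <- data.table(
  Group  = c("A", "A", "B"),
  Month  = c(1L, 9L, 4L),
  Income = c(1.5, -0.3, 2.0),
  Birth  = as.POSIXct(c("2000-03-01", "2001-07-15", "2000-11-30"), tz = "UTC")
)

# Full Group x Month grid: 10 rows per group
grid <- CJ(Group = c("A", "B"), Month = 1:10)

# Rolling join: Group is matched exactly, Month is rolled.
# roll = TRUE carries the last observation forward (LOCF);
# rollends = c(TRUE, TRUE) also rolls the first observation backward
# and the last observation forward past the observed range.
filled <- obs[grid, on = .(Group, Month), roll = TRUE, rollends = c(TRUE, TRUE)]

print(filled)
```

For the original data, the equivalent call would start from the rows of dt that have values, for example dt[!is.na(Income)]. That only works if all value columns are missing on the same rows, as they are in the example.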
