Using dcast and then melt does not reproduce the table I started with (data.table approach)

Problem description

I have a data.table (sorry, it looks ugly):

library(data.table)

sample_start <- data.table(
  b001 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, 86802296, 0, NA), 
  b002 = c(521566495, NA, 0, 515381816, NA, NA, 0, 502929725, NA, NA, 0, 501976304, NA, NA, 0, 1001600997, NA, 48172014, 1053789723, NA), 
  b003 = c(21632941, NA, 0, 24179514, NA, NA, 0, 23526136, NA, NA, 0, 23840002, NA, NA, 0, 136221414, NA, 90857983, 136974712, NA), 
  b004 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 62190678, NA, 299000, 55708960, NA), 
  b005 = c(21079801, NA, 0, 23467074, NA, NA, 0, 22694996, NA, NA, 0, 23082002, NA, NA, 0, 3435190, NA, 0, 3011353, NA), 
  b006 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 25431844, NA, -382404, 26127224, NA), 
  b007 = c(229500, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 20327, NA, 10224057, 34791, NA), 
  b008 = c(323640, NA, 0, 712440, NA, NA, 0, 831140, NA, NA, 0, 758000, NA, NA, 0, 33739621, NA, 2991979, 40685611, NA), 
  b009 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 11403754, NA, 23861043, 11406773, NA), 
  b010 = c(499168717, NA, 0, 490437465, NA, NA, 0, 478638752, NA, NA, 0, 477371465, NA, NA, 0, 765852353, NA, -79679644, 808923138, NA), 
  ticker = c("ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ADPL", "ADPL", "ADPL", "ADPL", "ADPL"), 
  year = c(2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2019, 2019), 
  quarter = c("1Y", "1Y", "1Y", "1Q", "1Q", "1Q", "1Q", "1H", "1H", "1H", "1H", "3Q", "3Q", "3Q", "3Q", "1Y", "1Y", "1Y", "1Q", "1Q"), 
  rev = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L), 
  cons = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), 
  country = "HR", 
  report_type = c("bilanca", "rdg", "nt", "bilanca", "rdg", "rdg", "nt", "bilanca", "rdg", "rdg", "nt", "bilanca", "rdg", "rdg", "nt", "bilanca", "rdg", "nt", "bilanca", "rdg"), 
  report_year = c(2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2019, 2019), 
  report_version = "firma_2018", 
  insurance_type = NA_character_, 
  cumulative = c(NA, 1, 1, NA, 1, NA, 1, NA, 1, NA, 1, NA, 1, NA, 1, NA, 1, 1, NA, 1), 
  quarter_date = c("2018-10-01", "2018-10-01", "2018-10-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-04-01", "2019-04-01", "2019-04-01", "2019-04-01", "2019-07-01", "2019-07-01", "2019-07-01", "2019-07-01", "2018-10-01", "2018-10-01", "2018-10-01", "2019-01-01", "2019-01-01"), 
  ttm = NA_real_, 
  annual_dummy = 0
)

If I use the dcast function from the data.table package and then melt the result to get back to the original sample data, I don't get the same result:

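library(data.table)
library(stringr)  # provides str_pad()

# Names of the value columns: "b001", "b002", ..., "b010"
colTest <- paste0("b", str_pad(1:10, 3, "left", "0"))

# Long -> wide: one set of b-columns per report_type
sample <- data.table::dcast(sample_start, ... ~ report_type, value.var = colTest)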


# Wide -> long: one value column per entry in colTest
sample_end <- data.table::melt(sample, measure.vars = patterns(colTest),
                               variable.name = "gfi_aop",
                               value.name = colTest, na.rm = FALSE)

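As you can see, the two data tables (sample_start and sample_end) have different numbers of rows. How should I change the melt call to get back the same table I started with?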

Tags: r, data.table

Solution
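
One likely cause, sketched here on toy data rather than on the table above: dcast builds one wide row per unique combination of the left-hand-side columns and fills every report_type that is missing for that combination with NA. melt then turns each filled cell back into a row, so the long table comes back with rows that never existed. Setting na.rm = TRUE would probably not be enough, because the original data also contains real rows whose b-columns are all NA. The sketch below carries a marker column through the reshape and keeps only the rows where the marker survived. The names toy_start, toy_wide, toy_long, toy_back, types and present are made up for illustration, and the code assumes each left-hand-side/report_type combination occurs at most once.

```r
library(data.table)

# Toy stand-in for sample_start (hypothetical values, not the question's data).
toy_start <- data.table(
  ticker      = c("ACI", "ACI", "ADPL"),
  report_type = c("bilanca", "rdg", "bilanca"),
  b001        = c(1, NA, 3),
  b002        = c(10, NA, 30)
)

# Marker column (hypothetical name): TRUE for every row that really exists.
toy_start[, present := TRUE]

# Long -> wide. ADPL has no "rdg" row, so its *_rdg cells are filled with NA.
toy_wide <- dcast(toy_start, ticker ~ report_type,
                  value.var = c("b001", "b002", "present"))

# Wide -> long. Every ticker now gets one row per report_type: 2 x 2 = 4 rows.
toy_long <- melt(toy_wide, id.vars = "ticker",
                 measure.vars = patterns("^b001_", "^b002_", "^present_"),
                 variable.name = "report_type",
                 value.name = c("b001", "b002", "present"))

# With patterns(), report_type holds the position within each column group,
# so recover the labels from the wide column names.
types <- sub("^b001_", "", grep("^b001_", names(toy_wide), value = TRUE))

# Keep only rows whose marker survived the round trip, then tidy up.
toy_back <- toy_long[!is.na(present)]
toy_back[, report_type := types[as.integer(report_type)]]
toy_back[, present := NULL]
setorder(toy_back, ticker, report_type)

nrow(toy_long)  # includes the row created by the NA fill
nrow(toy_back)  # matches nrow(toy_start)
```

Applied to the question's table, the same idea would mean adding the marker to sample_start, appending it to value.var, and passing one more pattern and value name to the melt call. That has not been run against the full data here, so treat it as a starting point.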

