r - Using dcast and then melt does not reproduce the table I started with (data.table approach)
Problem description
I have a data.frame (sorry, it looks ugly):
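library(data.table)

# Named sample_start because the reshaping code refers to it by that name
sample_start <- data.table(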
b001 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, 86802296, 0, NA),
b002 = c(521566495, NA, 0, 515381816, NA, NA, 0, 502929725, NA, NA, 0, 501976304, NA, NA, 0, 1001600997, NA, 48172014, 1053789723, NA),
b003 = c(21632941, NA, 0, 24179514, NA, NA, 0, 23526136, NA, NA, 0, 23840002, NA, NA, 0, 136221414, NA, 90857983, 136974712, NA),
b004 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 62190678, NA, 299000, 55708960, NA),
b005 = c(21079801, NA, 0, 23467074, NA, NA, 0, 22694996, NA, NA, 0, 23082002, NA, NA, 0, 3435190, NA, 0, 3011353, NA),
b006 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 25431844, NA, -382404, 26127224, NA),
b007 = c(229500, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 20327, NA, 10224057, 34791, NA),
b008 = c(323640, NA, 0, 712440, NA, NA, 0, 831140, NA, NA, 0, 758000, NA, NA, 0, 33739621, NA, 2991979, 40685611, NA),
b009 = c(0, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 0, NA, NA, 0, 11403754, NA, 23861043, 11406773, NA),
b010 = c(499168717, NA, 0, 490437465, NA, NA, 0, 478638752, NA, NA, 0, 477371465, NA, NA, 0, 765852353, NA, -79679644, 808923138, NA),
ticker = c("ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ACI", "ADPL", "ADPL", "ADPL", "ADPL", "ADPL"),
year = c(2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2019, 2019),
quarter = c("1Y", "1Y", "1Y", "1Q", "1Q", "1Q", "1Q", "1H", "1H", "1H", "1H", "3Q", "3Q", "3Q", "3Q", "1Y", "1Y", "1Y", "1Q", "1Q"),
rev = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L),
cons = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L),
country = "HR",
report_type = c("bilanca", "rdg", "nt", "bilanca", "rdg", "rdg", "nt", "bilanca", "rdg", "rdg", "nt", "bilanca", "rdg", "rdg", "nt", "bilanca", "rdg", "nt", "bilanca", "rdg"),
report_year = c(2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018, 2019, 2019),
report_version = "firma_2018",
insurance_type = NA_character_,
cumulative = c(NA, 1, 1, NA, 1, NA, 1, NA, 1, NA, 1, NA, 1, NA, 1, NA, 1, 1, NA, 1),
quarter_date = c("2018-10-01", "2018-10-01", "2018-10-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-04-01", "2019-04-01", "2019-04-01", "2019-04-01", "2019-07-01", "2019-07-01", "2019-07-01", "2019-07-01", "2018-10-01", "2018-10-01", "2018-10-01", "2019-01-01", "2019-01-01"),
ttm = NA_real_,
annual_dummy = 0
)
If I use the dcast function from the data.table package and then melt the result to get back to the original sample data, I do not get the same result:
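library(data.table)
library(stringr)  # str_pad() comes from stringr
# Value columns: "b001", "b002", ..., "b010"
colTest <- paste0("b", str_pad(1:10, 3, "left", "0"))
# Long to wide: one set of b-columns per report_type
sample <- data.table::dcast(sample_start, ... ~ report_type, value.var = colTest)
# Wide back to long: one value column per b-variable
sample_end <- data.table::melt(sample, measure.vars = patterns(colTest), variable.name = "gfi_aop",
                               value.name = c(colTest), na.rm = FALSE)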
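You can see that the two data frames (sample_start and sample_end) have different numbers of rows. How should I change the melt call to get back the same df I started with?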
Solution
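The original answer is not reproduced here, so what follows is only a hedged sketch of one possible approach. dcast creates a cell for every left-hand-side group and every report_type, filling combinations that never existed with NA. melt then returns one row per group per report type, so the row count grows. na.rm = TRUE cannot undo this when some original rows are themselves entirely NA, as several rows in the question's data are. The toy example below uses a hypothetical table `orig` and a hypothetical marker column `present` (neither is from the question). It carries the marker through the round trip and uses it to drop the filler rows. It assumes that melt returns the matched columns in the order dcast created them, that is, sorted by report_type.

```r
library(data.table)

# Hypothetical toy stand-in for sample_start: id 2 has no "rdg" row,
# and the existing "rdg" row holds only NA values.
orig <- data.table(
  id          = c(1, 1, 2),
  report_type = c("bilanca", "rdg", "bilanca"),
  b001        = c(10, NA, 30),
  b002        = c(1, NA, 3)
)

# Marker column (an assumption of this sketch): TRUE for every real row.
marked <- copy(orig)[, present := TRUE]

# Long to wide; combinations that never existed are filled with NA,
# including their present_* cells.
wide <- dcast(marked, id ~ report_type, value.var = c("b001", "b002", "present"))

# Wide to long: one output column per pattern, one row per id and report type.
long <- melt(
  wide,
  measure.vars  = patterns("^b001_", "^b002_", "^present_"),
  variable.name = "report_type",
  value.name    = c("b001", "b002", "present")
)

# melt() returns report_type as a factor of positions; map them back to names.
# dcast() builds its columns in sorted order of report_type.
types <- sort(unique(orig$report_type))
long[, report_type := types[as.integer(report_type)]]

# Keep only rows that existed before dcast(), then drop the marker.
restored <- long[!is.na(present)]
restored[, present := NULL]
setcolorder(restored, names(orig))
setorder(restored, id, report_type)

nrow(long)                   # 4 rows: includes a filler row for id 2 / "rdg"
all.equal(restored, orig)    # compare the round trip with the starting table
```

For the question's table the same idea would mean adding the marker to sample_start and using value.var = c(colTest, "present") in dcast. The row order after melt generally differs from the input, so sort both tables the same way before comparing them.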