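r - Combine data and fill in zeros for abbreviations that are not present, in R
Problem description
How can I combine data in this way? Take this dataset: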
forest=structure(list(ADR.N.14.0 = c(8140010250001, 8140010250002, 8140010250005
), Соста.C.254 = structure(c(3L, 1L, 2L), .Label = c("10WB",
"6AS 4WB", "7AS 3WB"), class = "factor"), PLSVYD.N.16.6 = c(3,
2, 36), PRBPOR.C.254 = structure(c(1L, 2L, 1L), .Label = c("AS",
"WB"), class = "factor"), NOMYAR.N.16.6 = c(1, 1, 1), KOFPOR1.N.16.6 = c(7,
10, 6), POR1.C.254 = structure(c(1L, 2L, 1L), .Label = c("AS",
"WB"), class = "factor"), VOZPOR1.N.16.6 = c(80, 45, 50), VYSPOR1.N.16.6 = c(24,
17, 19), DEMPOR1.N.16.6 = c(36, 16, 24), POLNOT1.N.16.6 = c(1,
0.9, 0.8), ZAPZAH1.N.16.6 = c(210, 160, 170), NOMYAR2.N.16.6 = c(1,
1, 1), KOFSAST2.N.16.6 = c(3, 0, 4), POR2.C.254 = structure(c(2L,
1L, 2L), .Label = c("AS", "WB"), class = "factor"), VOZPOR2.N.16.6 = c(70,
45, 40), VYSPOR2.N.16.6 = c(22, 17, 16), DEMPOR2.N.16.6 = c(26,
22, 16), POLNOT2.N.16.6 = c(0, 0, 0), ZAPZAH2.N.16.6 = c(0, 0,
0), NOMYAR3.N.16.6 = c(1, 0, 0), KOFSAST3.N.16.6 = c(0, 0, 0),
POR3.C.254 = structure(c(2L, 1L, 1L), .Label = c("", "Д"), class = "factor"),
VOZPOR3.N.16.6 = c(140, 0, 0), VYSPOR3.N.16.6 = c(20, 0,
0), DEMPOR3.N.16.6 = c(40, 0, 0), POLNOT3.N.16.6 = c(0, 0,
0), ZAPZAH3.N.16.6 = c(0, 0, 0), NOMYAR4.N.16.6 = c(1, 0,
0), KOFSAST4.N.16.6 = c(0, 0, 0), POR4.C.254 = structure(c(2L,
1L, 1L), .Label = c("", "ЛИП"), class = "factor"), VOZPOR4.N.16.6 = c(130,
0, 0), VYSPOR4.N.16.6 = c(20, 0, 0), DEMPOR4.N.16.6 = c(36,
0, 0), POLNOT4.N.16.6 = c(0, 0, 0), ZAPZAH4.N.16.6 = c(0,
0, 0), KOFSAST5.N.16.6 = c(0L, NA, NA), POR5.C.255 = structure(c(2L,
1L, 1L), .Label = c("", "oak"), class = "factor"), VOZPOR5.N.16.6 = c(0L,
NA, NA), VYSPOR5.N.16.6 = c(0L, NA, NA), DEMPOR5.N.16.6 = c(0L,
NA, NA), POLNOT5.N.16.6 = c(0L, NA, NA), ZAPZAH5.N.16.6 = c(0L,
NA, NA)), class = "data.frame", row.names = c(NA, -3L))
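For example, some variables, such as Соста,C,254 and PRBPOR,C,254, contain abbreviations such as AS and WB.
Here is the tree dictionary, which holds the meanings of these abbreviations: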
tree_dict=structure(list(AS = structure(1L, .Label = "WB", class = "factor"),
aspen = structure(1L, .Label = "warty birch", class = "factor")), class = "data.frame", row.names = c(NA,
-1L))
But the list of abbreviations can be long. For example:
td1=structure(list(О = structure(1:2, .Label = c("H", "M"), class = "factor"),
Oak = structure(1:2, .Label = c("Hornbeam", "Maple"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
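Note that both dictionaries keep their first entry in the column names (AS/aspen in tree_dict, О/Oak in td1) and the remaining entries as rows. A minimal sketch of flattening such a table into one named lookup vector is shown here. The helper name dict_to_lookup is hypothetical, the sketch assumes every dictionary has exactly two columns laid out this way, and the first key of td1 is typed as a Latin "O" for portability.

```r
# Hypothetical helper: turn a two-column dictionary whose header row is
# itself an entry into a named character vector (abbreviation -> full name).
dict_to_lookup <- function(d) {
  keys   <- c(names(d)[1], as.character(d[[1]]))
  values <- c(names(d)[2], as.character(d[[2]]))
  setNames(values, keys)
}

# Simplified copies of the two dictionaries from the question.
tree_dict <- data.frame(AS = "WB", aspen = "warty birch")
td1       <- data.frame(O = c("H", "M"), Oak = c("Hornbeam", "Maple"))

# One combined lookup covering both dictionaries.
lookup <- c(dict_to_lookup(tree_dict), dict_to_lookup(td1))
lookup[["WB"]]  # full name stored for the abbreviation WB
names(lookup)   # all known abbreviations
```

A named vector like this can then be used for direct lookups, without caring which dictionary an abbreviation came from.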
In each row of the data frame forest, for these variables
KOFPOR,N,16,6
POR,C,254
VOZPOR,N,16,6
VYSPOR,N,16,6
DEMPOR,N,16,6
POLNOT,N,16,6
ZAPZAH,N,16,6
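how can I enter zero for every abbreviation that is absent from that row but present in tree_dict,
and give it the next number (in this sample data the numeric index runs from 1 to 4)? For oak, for example, it would be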
KOFPOR5,N,16,6
POR5,C,254
VOZPOR5,N,16,6
VYSPOR5,N,16,6
DEMPOR5,N,16,6
POLNOT5,N,16,6
ZAPZAH5,N,16,6
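and the variable POR,C,254 is set to oak, i.e. POR5,C,254 would contain oak.
In addition, any abbreviation in any column where it appears should be replaced with its real name from tree_dict.
For example,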
7AS 3WB
becomes 7 aspen, 3 warty birch
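To make the intended replacement concrete, here is a minimal sketch that expands one composition string token by token. It assumes each token is a numeric coefficient followed directly by an abbreviation, as in the sample data; the function name expand_composition and the comma-separated output format are illustrative choices, not something the dataset prescribes.

```r
# Abbreviation -> full name, taken from tree_dict in the question.
lookup <- c(AS = "aspen", WB = "warty birch")

# Hypothetical helper: expand strings such as "7AS 3WB".
expand_composition <- function(x, lookup) {
  vapply(as.character(x), function(s) {
    tokens <- strsplit(s, "[[:space:]]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    coef   <- sub("^([0-9]*).*$", "\\1", tokens)  # leading digits
    abbr   <- sub("^[0-9]*", "", tokens)          # remainder of the token
    known  <- abbr %in% names(lookup)
    full   <- ifelse(known, lookup[abbr], abbr)   # unknown codes stay as-is
    paste(trimws(paste(coef, full)), collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

expand_composition(c("7AS 3WB", "10WB", "6AS 4WB"), lookup)
```

Splitting on the coefficient first avoids replacing an abbreviation that happens to occur inside another word.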
So the ideal output for oak would be
output=structure(list(Соста.C.254 = structure(1L, .Label = "7Aspen 3warty birch", class = "factor"),
PLSVYD.N.16.6 = 3L, PRBPOR.C.254 = structure(1L, .Label = "Aspen", class = "factor"),
NOMYAR.N.16.6 = 1L, KOFPOR1.N.16.6 = 7L, POR1.C.254 = structure(1L, .Label = "Aspen", class = "factor"),
VOZPOR1.N.16.6 = 80L, VYSPOR1.N.16.6 = 24L, DEMPOR1.N.16.6 = 36L,
POLNOT1.N.16.6 = 1L, ZAPZAH1.N.16.6 = 210L, NOMYAR2.N.16.6 = 1L,
KOFSOCT2.N.16.6 = 3L, POR2.C.254 = structure(1L, .Label = "warty birch", class = "factor"),
VOZPOR2.N.16.6 = 70L, VYSPOR2.N.16.6 = 22L, DEMPOR2.N.16.6 = 26L,
POLNOT2.N.16.6 = 0L, ZAPZAH2.N.16.6 = 0L, NOMYAR3.N.16.6 = 1L,
KOFSOCT3.N.16.6 = 0L, POR3.C.254 = structure(1L, .Label = "elm", class = "factor"),
VOZPOR3.N.16.6 = 140L, VYSPOR3.N.16.6 = 20L, DEMPOR3.N.16.6 = 40L,
POLNOT3.N.16.6 = 0L, ZAPZAH3.N.16.6 = 0L, NOMYAR4.N.16.6 = 1L,
KOFSOCT4.N.16.6 = 0L, POR4.C.254 = structure(1L, .Label = "Linden", class = "factor"),
VOZPOR4.N.16.6 = 130L, VYSPOR4.N.16.6 = 20L, DEMPOR4.N.16.6 = 36L,
POLNOT4.N.16.6 = 0L, ZAPZAH4.N.16.6 = 0L, NOMYAR5.N.16.6 = 1L,
KOFSOCT5.N.16.6 = 0L, POR5.C.255 = structure(1L, .Label = "oak", class = "factor"),
VOZPOR5.N.16.6 = 0L, VYSPOR5.N.16.6 = 0L, DEMPOR5.N.16.6 = 0L,
POLNOT5.N.16.6 = 0L, ZAPZAH5.N.16.6 = 0L), class = "data.frame", row.names = c(NA,
-1L))
and for maple it would be the sixth:
KOFPOR6,N,16,6
POR6,C,254
VOZPOR6,N,16,6
VYSPOR6,N,16,6
DEMPOR6,N,16,6
POLNOT6,N,16,6
ZAPZAH6,N,16,6
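How can I perform such a complicated combination?
Solution
I'm not sure I understood all of your post, especially the part about maple. Also, your tree_dict is only partial and does not list the "elm" or "Linden" that appear in your example output. However, based on your data and that same output example, here is some code that may help you at least to some extent: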
# Install once if needed:
# install.packages("data.table")
# install.packages("hash")

# Your tree_dict structure is not ideally conditioned. Its column names look
# like data that are part of the translation hash, so we must integrate them
# as row data, not just as name labels, and row-bind:
TD0 <- data.frame(AS = "AS", aspen = "aspen", stringsAsFactors = FALSE)
TD  <- rbind(TD0, data.frame(lapply(tree_dict, as.character),
                             stringsAsFactors = FALSE))

# Use a hash (giving up on table merges, as your strings
# may contain several translation tokens at a time)
h <- hash::hash(TD[[1]], TD[[2]])
forest <- data.table::as.data.table(forest)

# Replace every known abbreviation in a vector (literal match, not regex)
g <- function(y) {
  for (x in hash::keys(h)) y <- gsub(x, h[[x]], y, fixed = TRUE)
  y
}

# Now for the expected output, apply g column-wise to the text columns only,
# so that the numeric columns keep their type:
txt <- names(forest)[sapply(forest, function(col) is.factor(col) || is.character(col))]
forest[, (txt) := lapply(.SD, g), .SDcols = txt]
forest
# Your structure `output` is the first row of the resulting table; the
# following rows should be OK if you use the complete version of `tree_dict`,
# which is cut down in your post.
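The code above covers only the abbreviation replacement. For the zero-filling part of the question, one possible approach is to work in long format (one row per plot and species), cross every plot with every dictionary species, and set the measurements of the missing combinations to zero. The sketch below uses a small toy table; the column names plot, species, coef and age are hypothetical stand-ins for ADR, POR<k>, KOFPOR<k> and VOZPOR<k>, and reshaping the wide forest table to this layout and back is not shown. Note also that it numbers species in dictionary order, which may differ from the "next free number" scheme described in the question.

```r
# Toy long-format data: one row per (plot, species) actually recorded.
long <- data.frame(
  plot    = c(1, 1, 2),
  species = c("AS", "WB", "WB"),
  coef    = c(7, 3, 10),
  age     = c(80, 70, 45),
  stringsAsFactors = FALSE
)

# All species known to the dictionary, whether present in the data or not.
all_species <- c("AS", "WB", "oak")

# Every plot x species combination, left-joined onto the measurements.
grid <- expand.grid(plot = unique(long$plot), species = all_species,
                    stringsAsFactors = FALSE)
full <- merge(grid, long, by = c("plot", "species"), all.x = TRUE)

# Species absent from a plot have NA measurements: set those to zero.
num_cols <- c("coef", "age")
full[num_cols] <- lapply(full[num_cols], function(v) replace(v, is.na(v), 0))

# Number the species within each plot; k plays the role of the numeric
# index in POR1, POR2, ...
full <- full[order(full$plot, match(full$species, all_species)), ]
full$k <- ave(seq_along(full$plot), full$plot, FUN = seq_along)
rownames(full) <- NULL
full
```

Each plot then carries one block per dictionary species, with zeros wherever the species was not recorded.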