r - 使用多个分组变量按组进行插值
问题描述
我正在尝试用不规则间隔测量一年半的树木生长值进行线性插值。
我想按树、块和基因型信息对木材体积组进行每日线性插值。但是,我的代码中有些东西是不对的。我尝试了参数“do”和“mutate”,但没有一个工作。有人可以帮我吗?
bio2 <- read.xlsx("Cres_biomassa.xlsx", h=T, sheetName = "Original")
str(bio2)
bio2$Block <- as.factor(bio2$Block)
bio2$Tree <- as.factor(bio2$Tree)
dput(bio2[1:10, ])
# structure(list(Date = structure(c(17537, 17593, 17628, 17656,
# 17695, 17730, 17761, 17782, 17817, 17836), class = "Date"), Block = # structure(c(1L,
# 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "2", "3"
# ), class = "factor"), Gen = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
# 1L, 1L, 1L, 1L), .Label = c("G1", "G10"), class = "factor"),
# Tree = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
# ), .Label = c("1", "2"), class = "factor"), Volume = c(12.0152502828382,
# 121.168369070794, 324.280007440298, 522.317155691492, 684.262691983242,
# 742.921025749914, 775.35053835085, 804.747031488978, 996.719631625931,
# 1358.37974592578)), row.names = c(NA, 10L), class = "data.frame")
library(lubridate)
#Dates for daily interpolation:
Dates <- seq.Date(ymd("2018-01-06"), ymd("2019-04-06"), by = 1)
test1 <- bio2 %>%
group_by(Block, Gen, Tree) %>%
mutate(ApproxFun <- approxfun(x = bio2$Date, y = bio2$Volume)
LinearFit <- ApproxFun(Dates))
这个样本有两个基因型(Gen),两个树(Tree)和三个块(Blocks)。
解决方案
The main problem in your code is that approxfun()
returns a function, you can't directly store a function in a data frame. But there is a workaround: you can store your functions in a list-column of the data frame.
(also, inside mutate()
you should use =
and not <-
, you don't need to refer back to the bio2
object, and you need a comma between the two mutate statements)
What you want to do can be accomplished using nesting to subset the data by group, and using map()
to return lists.
bio2 %>%
group_by(Block, Gen, Tree) %>%
nest(data = c(Date, Volume)) %>%
mutate(ApproxFun = map(data, approxfun),
LinearFit = map2(ApproxFun, data, ~.x(.y$Date)))
We can test the result by introducing some NA values:
bio2_na <- bio2
bio2_na[c(3,7),"Volume"] <- NA_real_
bio2_na %>%
group_by(Block, Gen, Tree) %>%
nest(data = c(Date, Volume)) %>%
mutate(ApproxFun = map(data, approxfun),
LinearFit = map2(ApproxFun, data, ~.x(.y$Date))) %>%
unnest(c(data, LinearFit))
# A tibble: 10 x 7
# Groups: Block, Gen, Tree [1]
# Block Gen Tree Date Volume ApproxFun LinearFit
# <fct> <fct> <fct> <date> <dbl> <list> <dbl>
# 1 1 G1 1 2018-01-06 12.0 <fn> 12.0
# 2 1 G1 1 2018-03-03 121. <fn> 121.
# 3 1 G1 1 2018-04-07 NA <fn> 344.
# 4 1 G1 1 2018-05-05 522. <fn> 522.
# 5 1 G1 1 2018-06-13 684. <fn> 684.
# 6 1 G1 1 2018-07-18 743. <fn> 743.
# 7 1 G1 1 2018-08-18 NA <fn> 780.
# 8 1 G1 1 2018-09-08 805. <fn> 805.
# 9 1 G1 1 2018-10-13 997. <fn> 997.
#10 1 G1 1 2018-11-01 1358. <fn> 1358.
推荐阅读
- python - 最快的方式求和基于数组的部分条件
- python-3.x - VPython:为什么 scene.camera.pos 和 scene.camera.axis 永远不会改变?
- command-line - 无法在命令行上使用 zsh 作为 shell 传递 Python 列表
- excel - 加载到表时,如何让 Excel 保留电源查询中的数据类型?
- javascript - 如何在 JavaScript 测验中包含图像?
- pyqt5 - QGraphicsPixmapItem 没有正确旋转
- linux - 无法理解错误:mov rax, qword [rbp-24 - rbx * 8] 上的有效地址无效
- python - 改进搜索回文的脚本
- oracle - Oracle - 将插入/更新/删除/执行授予公共角色是一种糟糕的安全做法吗?
- arrays - 链表与数组:物理内存中哪个更连续?