首页 > 解决方案 > 使用多个分组变量按组进行插值

问题描述

我正在尝试用不规则间隔测量一年半的树木生长值进行线性插值。

我想按树、块和基因型信息对木材体积组进行每日线性插值。但是,我的代码中有些东西是不对的。我尝试了参数“do”和“mutate”,但没有一个工作。有人可以帮我吗?

bio2 <- read.xlsx("Cres_biomassa.xlsx", h=T, sheetName = "Original") 
str(bio2)
bio2$Block <- as.factor(bio2$Block)
bio2$Tree <- as.factor(bio2$Tree)
dput(bio2[1:10, ])

# structure(list(Date = structure(c(17537, 17593, 17628, 17656, 
# 17695, 17730, 17761, 17782, 17817, 17836), class = "Date"), Block = # structure(c(1L, 
# 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "2", "3"
# ), class = "factor"), Gen = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
# 1L, 1L, 1L, 1L), .Label = c("G1", "G10"), class = "factor"), 
#    Tree = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
#    ), .Label = c("1", "2"), class = "factor"), Volume = c(12.0152502828382, 
#    121.168369070794, 324.280007440298, 522.317155691492, 684.262691983242, 
#    742.921025749914, 775.35053835085, 804.747031488978, 996.719631625931, 
#    1358.37974592578)), row.names = c(NA, 10L), class = "data.frame")

library(lubridate)
#Dates for daily interpolation:
Dates <- seq.Date(ymd("2018-01-06"), ymd("2019-04-06"), by = 1)

test1 <- bio2 %>%
  group_by(Block, Gen, Tree) %>%
  mutate(ApproxFun <- approxfun(x = bio2$Date, y = bio2$Volume)
         LinearFit <- ApproxFun(Dates))

这个样本有两个基因型(Gen),两个树(Tree)和三个块(Blocks)。

标签: rdplyrinterpolationlinear-programming

解决方案


The main problem in your code is that approxfun() returns a function, you can't directly store a function in a data frame. But there is a workaround: you can store your functions in a list-column of the data frame.

(also, inside mutate() you should use = and not <-, you don't need to refer back to the bio2 object, and you need a comma between the two mutate statements)

What you want to do can be accomplished using nesting to subset the data by group, and using map() to return lists.

bio2 %>%
  group_by(Block, Gen, Tree) %>%
  nest(data = c(Date, Volume)) %>%
  mutate(ApproxFun = map(data, approxfun),
         LinearFit = map2(ApproxFun, data, ~.x(.y$Date)))

We can test the result by introducing some NA values:

bio2_na <- bio2
bio2_na[c(3,7),"Volume"] <- NA_real_


bio2_na %>%
  group_by(Block, Gen, Tree) %>%
  nest(data = c(Date, Volume)) %>%
  mutate(ApproxFun = map(data, approxfun),
         LinearFit = map2(ApproxFun, data, ~.x(.y$Date))) %>%
  unnest(c(data, LinearFit))

# A tibble: 10 x 7
# Groups:   Block, Gen, Tree [1]
#   Block Gen   Tree  Date       Volume ApproxFun LinearFit
#   <fct> <fct> <fct> <date>      <dbl> <list>        <dbl>
# 1 1     G1    1     2018-01-06   12.0 <fn>           12.0
# 2 1     G1    1     2018-03-03  121.  <fn>          121. 
# 3 1     G1    1     2018-04-07   NA   <fn>          344. 
# 4 1     G1    1     2018-05-05  522.  <fn>          522. 
# 5 1     G1    1     2018-06-13  684.  <fn>          684. 
# 6 1     G1    1     2018-07-18  743.  <fn>          743. 
# 7 1     G1    1     2018-08-18   NA   <fn>          780. 
# 8 1     G1    1     2018-09-08  805.  <fn>          805. 
# 9 1     G1    1     2018-10-13  997.  <fn>          997. 
#10 1     G1    1     2018-11-01 1358.  <fn>         1358. 

推荐阅读