How do I add a set of values to an existing data frame?

Problem description

I have a data frame with three columns: ID, year, and growth. The last column holds the annual growth in millimetres.

Example:

df <- data.frame(ID=rep(c("CHC01", "CHC02", "CHC03"), each=4), 
                 year=rep(2015:2018, 3), 
                 growth=c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))

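In another data frame I have three more columns: ID, missing_length, and missing_years. missing_length is the estimated length missing from the measurements, and missing_years is the number of years missing in df.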

estimate <- data.frame(ID=c("CHC01", "CHC02", "CHC03"), 
                       missing_length=c(1.0, 4.4, 3.5), 
                       missing_years=c(1,3,2))

To calculate the growth for each missing year, I tried:

missing <- rep(estimate$missing_length / estimate$missing_years, estimate$missing_years)

Does anyone know how to approach this?

Many thanks!

Tags: r, dataframe, vector, dplyr

Solution

We can do a join and then use replace to fill the NA values with the computed value:

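library(dplyr)
df %>% 
   # bring missing_length and missing_years onto every row of df
   left_join(estimate, by = "ID") %>% 
   group_by(ID) %>% 
   # fill each NA with that ID's per-year estimate
   transmute(year, growth  = replace(growth, is.na(growth), 
                 missing_length[1]/missing_years[1]))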
# A tibble: 12 x 3
# Groups:   ID [3]
#   ID     year growth
#   <fct> <int>  <dbl>
# 1 CHC01  2015   1   
# 2 CHC01  2016   2.3 
# 3 CHC01  2017   2.1 
# 4 CHC01  2018   3   
# 5 CHC02  2015   1.47
# 6 CHC02  2016   1.47
# 7 CHC02  2017   1.47
# 8 CHC02  2018   3.2 
# 9 CHC03  2015   1.75
#10 CHC03  2016   1.75
#11 CHC03  2017   2.1 
#12 CHC03  2018   1.2 

Or with coalesce:

df %>%
   mutate(growth = coalesce(growth,  with(estimate, 
        setNames(missing_length/missing_years, ID))[as.character(ID)])) %>%
   as_tibble
# A tibble: 12 x 3
#   ID     year growth
#   <fct> <int>  <dbl>
# 1 CHC01  2015   1   
# 2 CHC01  2016   2.3 
# 3 CHC01  2017   2.1 
# 4 CHC01  2018   3   
# 5 CHC02  2015   1.47
# 6 CHC02  2016   1.47
# 7 CHC02  2017   1.47
# 8 CHC02  2018   3.2 
# 9 CHC03  2015   1.75
#10 CHC03  2016   1.75
#11 CHC03  2017   2.1 
#12 CHC03  2018   1.2 

Or a similar option with data.table:

library(data.table)
setDT(df)[estimate, growth := fcoalesce(growth, 
           missing_length/missing_years), on = .(ID)]
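
For comparison, here is a minimal base R sketch of the same fill without any packages. It is not part of the original answer and assumes only that every ID in df appears exactly once in estimate; the names per_year, idx, and na_rows are chosen here for illustration.

```r
# Recreate the example data from the question
df <- data.frame(ID = rep(c("CHC01", "CHC02", "CHC03"), each = 4),
                 year = rep(2015:2018, 3),
                 growth = c(NA, 2.3, 2.1, 3.0, NA, NA, NA, 3.2, NA, NA, 2.1, 1.2))

estimate <- data.frame(ID = c("CHC01", "CHC02", "CHC03"),
                       missing_length = c(1.0, 4.4, 3.5),
                       missing_years = c(1, 3, 2))

# Estimated growth per missing year, one value per row of estimate
per_year <- estimate$missing_length / estimate$missing_years

# For each row of df, the position of its ID in estimate
idx <- match(df$ID, estimate$ID)

# Overwrite only the rows where growth is missing
na_rows <- is.na(df$growth)
df$growth[na_rows] <- per_year[idx[na_rows]]

df
```

Because each gap is filled with missing_length / missing_years, the filled values for an ID sum back to that ID's missing_length whenever its number of NA rows equals missing_years, as it does in the example data.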
