Replace NA values of a variable in a data frame with the non-NA value from a previous row, conditional on the value of another variable

Question

I have the following data frame:

weird_data <- 
  data.frame("ID" = 1:8, 
             "API" = c("01-01", 
                       "01-02", 
                       "02-01", 
                       "02-02", 
                       "02-03", 
                       "03-01", 
                       "03-02", 
                       "03-03"),  
             "Final" = c("no", 
                         "yes", 
                         "no",
                         "no", 
                         "yes", 
                         "no", 
                         "no",
                         "yes"), 
             "Month" = c("May", 
                         NA, 
                         NA, 
                         "June", 
                         "July", 
                         "April", 
                         "June",
                         NA), 
             stringsAsFactors = FALSE
  )

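In the API column, the number before the hyphen is the well number and the number after the hyphen is the activity code; the larger the activity code, the later the activity. I only want to keep the row corresponding to the latest activity code for each well. However, for some wells, Month is recorded only for earlier activity codes. So, for each well, if the last activity code has NA for Month, I would like to replace that NA with the Month from the most recent activity code that has one recorded. Ideally, my output would look like this: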

desired_output <- 
  data.frame("ID" = 1:8, 
             "API" = c("01-01", 
                       "01-02", 
                       "02-01", 
                       "02-02", 
                       "02-03", 
                       "03-01", 
                       "03-02", 
                       "03-03"),  
             "Final" = c("no", 
                         "yes", 
                         "no",
                         "no", 
                         "yes", 
                         "no", 
                         "no",
                         "yes"), 
             "Month" = c("May", 
                         "May", 
                         NA, 
                         "June", 
                         "July", 
                         "April", 
                         "June",
                         "June"), 
             stringsAsFactors = FALSE
  )

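The wells are arranged in that order, and the Final column does reliably mark (with yes) the rows I ultimately want to keep, if that helps. However, the actual data has about 8,000 rows and maybe 2,800 wells.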

Tags: r

Answer


Here is one approach using the tidyverse packages:

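library(tidyverse)

output <- weird_data %>%
  # Split API into well number and activity code, keeping the original column
  separate(API, into = c("well", "act"), sep = "-", remove = FALSE) %>%
  group_by(well) %>%
  # Within each well, carry the last non-NA Month down into the rows below it
  fill(Month) %>%
  ungroup() %>%
  # Drop the helper columns again
  select(-well, -act)

# output is a tibble, so compare it as a plain data frame
all.equal(as.data.frame(output), desired_output)
#[1] TRUE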
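
The question also mentions eventually keeping only the row for the latest activity code of each well. As a rough sketch of how that follow-up step might look (not part of the original answer), the filled data can be filtered either on the Final flag or on the highest activity code per well. The names `filled`, `latest_by_flag`, and `latest_by_code` are made up for illustration, the `arrange()` call is only a safeguard in case real data is not already sorted, and `slice_max()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question, written compactly
weird_data <- data.frame(
  ID    = 1:8,
  API   = c("01-01", "01-02", "02-01", "02-02",
            "02-03", "03-01", "03-02", "03-03"),
  Final = c("no", "yes", "no", "no", "yes", "no", "no", "yes"),
  Month = c("May", NA, NA, "June", "July", "April", "June", NA),
  stringsAsFactors = FALSE
)

# Fill Month downward within each well, keeping the helper columns for now
filled <- weird_data %>%
  separate(API, into = c("well", "act"), sep = "-", remove = FALSE) %>%
  # Safeguard (assumption): sort by well and numeric activity code,
  # because fill() depends on row order
  arrange(well, as.integer(act)) %>%
  group_by(well) %>%
  fill(Month) %>%
  ungroup()

# Option 1: rely on the Final flag, which the question says is reliable
latest_by_flag <- filled %>%
  filter(Final == "yes") %>%
  select(-well, -act)

# Option 2: ignore Final and take the highest activity code per well
latest_by_code <- filled %>%
  group_by(well) %>%
  slice_max(as.integer(act), n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-well, -act)
```

On the sample data both options keep the rows with ID 2, 5, and 8. Option 1 is simpler when the flag can be trusted; option 2 may be safer if Final is ever missing or inconsistent.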
