Identify which natural numbers are missing from a data frame variable and fill them in

Problem description

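Suppose I have a data frame with two columns: one holds natural numbers that number the rows (originally complete, so it simply counts the rows), and the other holds different text strings. Now imagine that I performed an operation that deleted some rows. What I want to do is identify which rows have disappeared, fill them back in with their original number, and insert NA for the former text value. I hope it becomes clear from the example. Thanks for any help or hints.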

    names <- c(1:12)

    posts <- c("blabla", "blubla", "wabaluba", "blap", "blub", "jibberish", "hmmm", "lol", "there", "noowwayy", "inded", "thanks")

    before <- data.frame(names, posts)

    current <- before[-c(4,7),]

    desiredoutcome <- data.frame(c(1:12), c("blabla", "blubla", "wabaluba", NA, "blub", "jibberish", NA, "lol", "there", "noowwayy", "inded", "thanks"))

Tags: r, dataframe, vector, data-science

Solution


One option is to use tidyr::complete

    library(dplyr)
    library(tidyr)

    # Turn names into a factor whose levels span the full range, so that
    # complete() adds a row (with NA in posts) for every unused level.
    current %>%
        mutate(names = factor(names, levels = seq(min(names), max(names)))) %>%
        complete(names)
    # A tibble: 12 x 2
    #   names posts
    #   <fct> <fct>
    # 1 1     blabla
    # 2 2     blubla
    # 3 3     wabaluba
    # 4 4     NA
    # 5 5     blub
    # 6 6     jibberish
    # 7 7     NA
    # 8 8     lol
    # 9 9     there
    #10 10    noowwayy
    #11 11    inded
    #12 12    thanks

A data.table "join" approach

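    library(data.table)

    # setDT() converts current to a data.table by reference. Joining it to the
    # full sequence built by CJ() returns one row per number, with NA in posts
    # wherever no matching row exists.
    setDT(current)[CJ(names = seq(min(names), max(names))), on = "names"]
    #    names     posts
    #1:     1    blabla
    #2:     2    blubla
    #3:     3  wabaluba
    #4:     4      <NA>
    #5:     5      blub
    #6:     6 jibberish
    #7:     7      <NA>
    #8:     8       lol
    #9:     9     there
    #10:    10  noowwayy
    #11:    11     inded
    #12:    12    thanks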
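
Both answers build the full sequence of numbers and match the remaining rows against it. The same idea can be sketched in base R without extra packages. This is not from the original answers, only an illustration. The helper names `full_range`, `missing_names`, and `filled` are made up for this example, and the sketch assumes that the first and last rows were not among the deleted ones (otherwise replace `min()`/`max()` with the known original bounds).

```r
# Rebuild the example data from the question.
before <- data.frame(
  names = 1:12,
  posts = c("blabla", "blubla", "wabaluba", "blap", "blub", "jibberish",
            "hmmm", "lol", "there", "noowwayy", "inded", "thanks"),
  stringsAsFactors = FALSE
)
current <- before[-c(4, 7), ]

# Full range of row numbers expected in the data
# (assumption: the smallest and largest numbers survived the deletion).
full_range <- seq(min(current$names), max(current$names))

# Step 1: identify which numbers have disappeared.
missing_names <- setdiff(full_range, current$names)
print(missing_names)
# 4 7

# Step 2: left-join the remaining rows onto the full sequence;
# all.x = TRUE keeps every number and fills posts with NA where no row matches.
filled <- merge(data.frame(names = full_range), current,
                by = "names", all.x = TRUE)
print(filled)
```

`merge()` sorts the result by the key column by default, so `filled` comes back ordered by `names`, with `NA` in `posts` at the numbers reported in `missing_names`.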
