R: arguments imply differing number of rows

Problem description

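I generated a data frame (df) in R (see below). If I build the data frame with column "x2" instead of "x2a", everything works. As soon as I use "x2a" instead of "x2", however, I get an error, because the entries in "x2a" have different lengths. Do you know how to change the code so that it works with column "x2a"?

Error message with "x2a":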

Error in data.frame(Id = rep(df$Id), Noise = unlist(split_it), Start = rep(df$Start),  : 
  arguments imply differing number of rows: 3, 16

Code to reproduce the data frame and the error

x1 <- c("A", "B", "C")
x2 <- c("[1,3,5,6,7]","[5,7,8,9,10]","[3,4,5,8,9]")
x2a <- c("[1,3,5]","[5,7,8,9,10, 20, 30, 24]","[3,4,5,8,9]")
x3 <- c(8000, 74555, 623334)
x4 <- c(9000, 76000, 623500)

df <- data.frame(cbind(x1, x2a, x3, x4))
colnames(df) <- c("Id", "Noise", "Start", "End")
df$Start <- as.numeric(as.character(df$Start))
df$End <- as.numeric(as.character(df$End))

# remove square brackets
df$Noise <- gsub("\\[|\\]", "", df$Noise)

# split 
split_it <- strsplit(df$Noise, split = ",")
df_2 <- data.frame(Id = rep(df$Id), Noise = unlist(split_it), Start = rep(df$Start), End = rep(df$End))
df_2 <- df_2[order(df_2$Id),]
rownames(df_2) <- NULL

Tags: r, dataframe

Solution


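Base R

I'm inferring that what you want is not something R can "intuit" for you: you want it to repeat the Id values according to the number of elements that strsplit finds in each row. (How should R know to look at one object and arbitrarily repeat another?) Called without times, rep(df$Id) returns its input unchanged, so Id, Start and End each keep 3 values while unlist(split_it) has 16, which is exactly the "3, 16" mismatch in the error message.

Try using rep(., times=.) to specify how many times each element of Id (and of Start and End) should be repeated, so that they line up with Noise.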
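The length mismatch and its fix can be seen in isolation with a small sketch. The vectors ids and parts here are illustrative stand-ins (not objects from the question) that mirror the shape of df$Id and the result of strsplit.

```r
# Illustrative stand-ins for df$Id and the strsplit() result (hypothetical names)
ids   <- c("A", "B", "C")
parts <- list(c("1", "3", "5"),
              c("5", "7", "8", "9", "10", "20", "30", "24"),
              c("3", "4", "5", "8", "9"))

# Without `times`, rep() returns its input unchanged: still 3 elements
length(rep(ids))

# unlist() flattens every piece into one vector: 3 + 8 + 5 elements
length(unlist(parts))

# lengths() counts the pieces per row; rep() then repeats each id that often
n <- lengths(parts)
ids_long <- rep(ids, times = n)
length(ids_long)
```

Once every column passed to data.frame() has been repeated with the same times vector, all columns have the same length and the error goes away.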

# split 
split_it <- strsplit(df$Noise, split = ",")
n <- lengths(split_it)
print(n)
# [1] 3 8 5

df_2 <- data.frame(Id = rep(df$Id, times=n),
                   Noise = unlist(split_it),
                   Start = rep(df$Start, times=n),
                   End = rep(df$End, times=n))
df_2 <- df_2[order(df_2$Id),]
rownames(df_2) <- NULL
df_2
#    Id Noise  Start    End
# 1   A     1   8000   9000
# 2   A     3   8000   9000
# 3   A     5   8000   9000
# 4   B     5  74555  76000
# 5   B     7  74555  76000
# 6   B     8  74555  76000
# 7   B     9  74555  76000
# 8   B    10  74555  76000
# 9   B    20  74555  76000
# 10  B    30  74555  76000
# 11  B    24  74555  76000
# 12  C     3 623334 623500
# 13  C     4 623334 623500
# 14  C     5 623334 623500
# 15  C     8 623334 623500
# 16  C     9 623334 623500
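Note that the Noise values produced this way are still character strings, and some carry a leading space (for example " 20"), because the input has spaces after a few commas. If numeric values are wanted (an assumption, since the question does not say), a minimal sketch of the conversion could look like this; noise, parts and values are illustrative names.

```r
# Illustrative input: the "x2a" strings after the square brackets are removed
noise <- c("1,3,5", "5,7,8,9,10, 20, 30, 24", "3,4,5,8,9")

# Split on commas, then flatten into one character vector
parts <- strsplit(noise, split = ",")

# trimws() drops the stray spaces; as.integer() converts the text to integers
values <- as.integer(trimws(unlist(parts)))
values
```

In the answer's code the same idea would be applied as as.integer(trimws(unlist(split_it))) when building the Noise column.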

dplyr

library(dplyr)
library(tidyr)   # unnest() is provided by tidyr, not dplyr
df %>%
  mutate(Noise = strsplit(Noise, split = ",")) %>%
  unnest(Noise) %>%
  mutate(Noise = as.integer(Noise))   # I'm inferring this is desired, not required
# # A tibble: 16 x 4
#    Id    Noise  Start    End
#    <chr> <int>  <dbl>  <dbl>
#  1 A         1   8000   9000
#  2 A         3   8000   9000
#  3 A         5   8000   9000
#  4 B         5  74555  76000
#  5 B         7  74555  76000
#  6 B         8  74555  76000
#  7 B         9  74555  76000
#  8 B        10  74555  76000
#  9 B        20  74555  76000
# 10 B        30  74555  76000
# 11 B        24  74555  76000
# 12 C         3 623334 623500
# 13 C         4 623334 623500
# 14 C         5 623334 623500
# 15 C         8 623334 623500
# 16 C         9 623334 623500
