Subsetting Data in R not working through a vector

Problem description

Scenario.

In the DataCamp course "Cleaning Data with R: Case Studies", there is an exercise at the very end where the dataset "att5" has 5 columns (say: 1, 2, 3, 4, 5). Only column 1 holds genuine character data; columns 2:5 hold numbers but are stored as type character. The exercise asks me to create a vector cols containing the indices of those columns (2, 3, 4, 5) and use sapply to apply the as.numeric function to them.

My solution is not working although it seems to make sense. I'm sharing their solution first and then mine. Please help me understand what is going on.

DataCamp solution (working)

# Define vector containing numerical columns: cols
cols <- -1

# Use sapply to coerce cols to numeric
att5[, cols] <- sapply(att5[, cols], as.numeric)

My solution (not working)

# Define vector containing numerical columns: cols
cols <- c(2:5)

# Use sapply to coerce cols to numeric
att5[, cols] <- sapply(att5[, cols], as.numeric)

I'm getting this error: invalid subscript type 'list'

Help me understand. Newbie in R.

Tags: r

Answer


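Your solution works perfectly on my machine. The only difference I can see is that cols <- -1 has class "numeric", whereas cols <- c(2:5) has class "integer". If you want to know the difference between the two, see "What's the difference between integer class and numeric class in R".

So one way to reverse-engineer their solution is to generate cols with class numeric, and seq can help with that.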
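The class difference can be checked directly. The sketch below uses a small hypothetical data frame `df` (a stand-in, not the course data) and suggests that, at least in this simple case, negative, integer, and double indices all select the same columns:

```r
# Hypothetical data frame standing in for att5 (not the course data)
df <- data.frame(id = c("A", "B", "C"),
                 x  = c("1", "2", "3"),
                 y  = c("4", "5", "6"),
                 stringsAsFactors = FALSE)

class(-1)       # "numeric"
class(c(2:3))   # "integer"

# Negative, integer, and double indices pick the same columns here
same_neg_int <- identical(df[, -1], df[, 2:3])
same_int_dbl <- identical(df[, 2:3], df[, c(2, 3)])
print(c(same_neg_int, same_int_dbl))
```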

cols <- seq(2,5,1)
#class(cols)
#[1] "numeric"
att5[, cols] <- sapply(att5[, cols], as.numeric)
# str(att5)
# 'data.frame': 5 obs. of  5 variables:
# $ att1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
# $ att2: num  1 2 3 4 5
# $ att3: num  1 2 3 4 5
# $ att4: num  1 2 3 4 5
# $ att5: num  1 2 3 4 5
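If the error still appears, it may help to inspect what cols actually holds in the session. As a hedged illustration only (this may not be what happened in the course environment), the same error message can be reproduced when a list is used as a subscript; `x`, `bad_cols`, and `good_cols` below are hypothetical names:

```r
# Hypothetical illustration: a list used as a subscript triggers the error
x <- c(10, 20, 30)
bad_cols <- list(2, 3)   # a list, not an atomic vector
msg <- tryCatch(x[bad_cols], error = function(e) conditionMessage(e))
print(msg)

# Checks worth running on the real objects before subsetting
good_cols <- 2:3
is.atomic(good_cols)     # TRUE: safe to use as an index
is.list(bad_cols)        # TRUE: this is what the error complains about
```

Running class(cols) and str(cols) right before the failing line is a cheap way to rule this out.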

Data

# Output of dput(att5), assigned back so the example is reproducible
att5 <- structure(list(att1 = structure(1:5, .Label = c("A", "B", "C", 
"D", "E"), class = "factor"), att2 = structure(1:5, .Label = c("1", 
"2", "3", "4", "5"), class = "factor"), att3 = structure(1:5, .Label = c("1", 
"2", "3", "4", "5"), class = "factor"), att4 = structure(1:5, .Label = c("1", 
"2", "3", "4", "5"), class = "factor"), att5 = structure(1:5, .Label = c("1", 
"2", "3", "4", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))
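One caveat about this sample data: columns 2 to 5 are factors whose labels happen to equal their integer codes, so as.numeric returns the expected values. For factors in general, as.numeric returns the internal codes rather than the labels. A minimal sketch, assuming a hypothetical data frame `demo`, of converting through as.character first:

```r
# Hypothetical data: factor labels differ from their internal codes
demo <- data.frame(id = c("A", "B", "C"),
                   x  = factor(c("10", "20", "30")),
                   y  = factor(c("7", "8", "9")))
cols <- 2:3

codes <- as.numeric(demo$x)   # internal codes 1, 2, 3, not 10, 20, 30
print(codes)

# Convert via character so the labels, not the codes, become numbers
demo[cols] <- lapply(demo[cols], function(v) as.numeric(as.character(v)))
str(demo)
```

lapply is used here instead of sapply so the result stays a list of columns rather than being simplified to a matrix; either works for this assignment.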

Hope it works for you.

