r - rowSums error 'x' must be numeric
Problem description
I am attempting to add several columns together with rowSums, but I am having some issues. Here is the list of column names:
colnames(No_Low_No_Intergenic_snpeff)
"CHROM" "POS" "REF" "ALT" "QUAL" "ANN.ALLELE" "ANN.EFFECT"
"ANN.IMPACT" "ANN.GENE" "ANN.GENEID" "ANN.FEATURE" "ANN.FEATUREID"
"ANN.HGVS_C" "ANN.HGVS_P" "ANN.ERRORS" "GEN.C02141.GT" "GEN.C00611.GT"
"GEN.C00633.GT" "GEN.C00634.GT" "GEN.C00644.GT" "GEN.C00647.GT" "GEN.C00648.GT"
"GEN.C00649.GT" "GEN.C00650.GT" "GEN.C00653.GT" "GEN.C00655.GT" "GEN.C00656.GT"
"GEN.C00657.GT" "GEN.C00659.GT" "GEN.C00682.GT" "GEN.C00705.GT" "GEN.C00707.GT"
"GEN.C00720.GT" "GEN.C00783.GT" "GEN.C01431.GT" "GEN.C01944.GT" "GEN.C01943.GT"
"GEN.C01403.GT" "GEN.C01158.GT" "GEN.C01157.GT" "GEN.C01156.GT" "GEN.C01033.GT"
"GEN.C00736.GT" "GEN.C00639.GT" "GEN.C99686.GT"
All of the columns that I am working with are labeled GEN.Cxxxxx.GT, and all the values in those columns range from 0 to 2. I am trying to sum columns 20:29 and column 45 and then put the values in a new column called controls:
No_Low_No_Intergenic_snpeff.scores$controls <- rowSums(No_Low_No_Intergenic_snpeff.scores[,20:29,45])
but when I try running that command I get the following error:
Error in rowSums(No_Low_No_Intergenic_snpeff.scores[, 20:29, 45]) : 'x' must be numeric
Data
str(No_Low_No_Intergenic_snpeff.scores)
'data.frame': 1000 obs. of 11 variables:
$ GEN.C00644.GT: Factor w/ 3 levels "0","1","2": 3 1 1 3 3 3 2 1 3 1 ...
$ GEN.C00647.GT: Factor w/ 3 levels "0","1","2": 3 1 3 3 2 2 2 1 2 1 ...
$ GEN.C00648.GT: Factor w/ 3 levels "0","1","2": 3 1 1 3 3 3 1 1 2 1 ...
$ GEN.C00649.GT: Factor w/ 3 levels "0","1","2": 3 1 1 3 2 2 2 1 2 1 ...
...
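Solution
You're getting this error because the values are not numeric. Look at your output from str:
GEN.C00650.GT: Factor w/ 3 levels "0","1","2": 3 1 3 3 3 3 1 1 3 1 ...
These are class factor, not class numeric. To work with them as numbers, you need to convert them, going through as.character first: as.numeric(as.character(x)). Calling as.numeric directly on a factor returns the internal level codes (1, 2, 3 here), not the values 0, 1, 2.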
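To make the difference between level codes and values concrete, here is a minimal sketch. The vector gt is a made-up stand-in for one genotype column, not data from the question:

```r
# Hypothetical genotype column stored as a factor, like the str() output above
gt <- factor(c("2", "0", "0", "2", "1"), levels = c("0", "1", "2"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(gt)                  # 3 1 1 3 2

# Converting through character recovers the labels as numbers
values <- as.numeric(as.character(gt))   # 2 0 0 2 1

# Equivalent idiom that converts the levels once and then indexes by code
values_alt <- as.numeric(levels(gt))[gt] # 2 0 0 2 1
```

The codes are off by one here only because the levels happen to be "0", "1", "2"; with other labels they could differ arbitrarily from the values.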
If you can import your data again:
If you can import your data from the file again, do so with the stringsAsFactors = FALSE argument. Before R 4.0.0 you should almost always use this argument, since without it all strings (and number columns that get read as text, as seems to have happened here) are converted into factors, creating all kinds of annoying problems until you change them back.
As of R 4.0.0, this is no longer necessary, as the default value of stringsAsFactors has been changed to FALSE. This will hopefully make this common mistake a thing of the past.
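As a small illustration of what the argument changes, the sketch below reads the same CSV text twice. The inline data and the names csv_text, with_factors and without_factors are invented for the example; with a real file you would pass its path instead of text =.

```r
# Made-up CSV content standing in for the real file
csv_text <- "CHROM,GEN.C00644.GT,GEN.C00647.GT
chr1,2,2
chr2,0,0
chr3,1,2"

# Pre-4.0.0 default behaviour, requested explicitly
with_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Behaviour recommended above (and the default since R 4.0.0)
without_factors <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(with_factors$CHROM)            # "factor"
class(without_factors$CHROM)         # "character"
class(without_factors$GEN.C00644.GT) # "integer"
```

Columns that contain only numbers are read as numeric either way; the argument only affects columns that R reads as text.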
Otherwise, to change from a Factor back to a Number:
Base R
The simplest way to do this is to use sapply:
rowSums(sapply(No_Low_No_Intergenic_snpeff.scores[, c(20:29, 45)],
function(x) as.numeric(as.character(x))))
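This subsets your data.frame, converts each selected column to numeric (by way of as.character, so that you get the values rather than the factor level codes), and then calculates rowSums on the result. Note the column index c(20:29, 45): in your original call, [, 20:29, 45] passes 45 as the third argument of [ (drop) rather than as a column, so column 45 was never selected.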
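Here is the same approach on a toy data frame, as a sketch you can run without the original data. The data frame scores and its values are invented, and the column positions c(2:3, 4) merely mirror the c(20:29, 45) pattern:

```r
# Invented data: one annotation column plus three factor genotype columns
lv <- c("0", "1", "2")
scores <- data.frame(
  CHROM         = c("chr1", "chr1", "chr2"),
  GEN.C00644.GT = factor(c("2", "0", "1"), levels = lv),
  GEN.C00647.GT = factor(c("2", "0", "2"), levels = lv),
  GEN.C00648.GT = factor(c("1", "0", "2"), levels = lv)
)

# Combine a range and a single position with c(), as in c(20:29, 45)
gt_cols <- c(2:3, 4)

# sapply() works column by column and returns a numeric matrix
numeric_gt <- sapply(scores[, gt_cols],
                     function(x) as.numeric(as.character(x)))

# Row-wise totals stored in a new column
scores$controls <- rowSums(numeric_gt)
scores$controls   # 5 0 5
```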
tidyverse
You can also use the mutate_if function from dplyr to convert all factor variables to numeric.
library(dplyr)
No_Low_No_Intergenic_snpeff.scores <- No_Low_No_Intergenic_snpeff.scores %>%
mutate_if(is.factor, ~as.numeric(as.character(.)))
rowSums(No_Low_No_Intergenic_snpeff.scores[, c(20:29, 45)])
Alternatively, you could use mutate_at to select columns by position or name. Read ?select to see all the different ways you can select columns. You can even use a regular expression with matches, as below:
No_Low_No_Intergenic_snpeff.scores <- No_Low_No_Intergenic_snpeff.scores %>%
  mutate_at(vars(matches('GEN\\.C\\d{5}\\.GT')), ~as.numeric(as.character(.)))
This converts all columns whose names match the regular expression GEN\\.C\\d{5}\\.GT, where \\d{5} represents 5 numeric digits and \\. matches a literal dot. As before, the conversion goes through as.character: applying as.numeric directly to a factor would return the level codes 1 to 3 instead of the values 0 to 2.
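In dplyr 1.0.0 and later, mutate_if and mutate_at are superseded by across(). A rough equivalent is sketched below, assuming dplyr >= 1.0.0 is installed. The data frame df and its values are invented for illustration:

```r
library(dplyr)

# Invented data with two factor genotype columns
df <- data.frame(
  id            = c("a", "b"),
  GEN.C00001.GT = factor(c("0", "2")),
  GEN.C00002.GT = factor(c("1", "2"))
)

# Convert every column whose name matches the pattern, via as.character
df <- df %>%
  mutate(across(matches("^GEN\\.C\\d{5}\\.GT$"),
                ~ as.numeric(as.character(.x))))

# Sum the converted columns row by row, selecting them by name
gt_names <- grep("^GEN\\.C\\d{5}\\.GT$", names(df), value = TRUE)
df$controls <- rowSums(df[, gt_names])
```

Selecting by name instead of by position keeps the code working if columns are later added or reordered.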