rowSums error 'x' must be numeric

Problem description

I am trying to add several columns together with rowSums, but I am having some issues. Here is the list of column names:

colnames(No_Low_No_Intergenic_snpeff)

"CHROM" "POS"   "REF"   "ALT"   "QUAL"  "ANN.ALLELE"    "ANN.EFFECT"
"ANN.IMPACT"    "ANN.GENE"  "ANN.GENEID"    "ANN.FEATURE"   "ANN.FEATUREID"
"ANN.HGVS_C"    "ANN.HGVS_P"    "ANN.ERRORS"    "GEN.C02141.GT" "GEN.C00611.GT"
"GEN.C00633.GT" "GEN.C00634.GT" "GEN.C00644.GT" "GEN.C00647.GT" "GEN.C00648.GT"
"GEN.C00649.GT" "GEN.C00650.GT" "GEN.C00653.GT" "GEN.C00655.GT" "GEN.C00656.GT"
"GEN.C00657.GT" "GEN.C00659.GT" "GEN.C00682.GT" "GEN.C00705.GT" "GEN.C00707.GT"
"GEN.C00720.GT" "GEN.C00783.GT" "GEN.C01431.GT" "GEN.C01944.GT" "GEN.C01943.GT"
"GEN.C01403.GT" "GEN.C01158.GT" "GEN.C01157.GT" "GEN.C01156.GT" "GEN.C01033.GT"
"GEN.C00736.GT" "GEN.C00639.GT" "GEN.C99686.GT"

All of the columns that I am working with are labeled GEN.Cxxxxx.GT, and all the values in those columns range from 0 to 2. I am trying to sum columns 20:29 and column 45 and then put the values in a new column called controls:

No_Low_No_Intergenic_snpeff.scores$controls <- rowSums(No_Low_No_Intergenic_snpeff.scores[,20:29,45])

but when I try running that command I get the following error:

Error in rowSums(No_Low_No_Intergenic_snpeff.scores[, 20:29, 45]) : 'x' must be numeric

Data

str(No_Low_No_Intergenic_snpeff.scores)

'data.frame':   1000 obs. of 11 variables:
$ GEN.C00644.GT: Factor w/ 3 levels "0","1","2": 3 1 1 3 3 3 2 1 3 1 ...
$ GEN.C00647.GT: Factor w/ 3 levels "0","1","2": 3 1 3 3 2 2 2 1 2 1 ...
$ GEN.C00648.GT: Factor w/ 3 levels "0","1","2": 3 1 1 3 3 3 1 1 2 1 ...
$ GEN.C00649.GT: Factor w/ 3 levels "0","1","2": 3 1 1 3 2 2 2 1 2 1 ...
...

Tags: r

Solution


You're getting this error because the values are not numeric. Look at your output from str:

GEN.C00650.GT: Factor w/ 3 levels "0","1","2": 3 1 3 3 3 3 1 1 3 1 ... 

These are class factor, not class numeric. To work with them as numbers, you need to convert them to numbers with as.numeric(as.character(x)).
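One caveat when converting: calling as.numeric directly on a factor returns the internal level codes rather than the labels you see printed. The following is a small sketch of the difference, using a made-up factor (the vector gt is hypothetical and not taken from the question's data):

```r
# Hypothetical genotype column stored as a factor, similar to the str() output
gt <- factor(c("2", "0", "0", "2", "1"), levels = c("0", "1", "2"))

# as.numeric() alone returns the internal level codes (1-based positions)
codes <- as.numeric(gt)

# Going through as.character() first recovers the values stored as labels
values <- as.numeric(as.character(gt))

print(codes)   # level codes: 3 1 1 3 2
print(values)  # actual values: 2 0 0 2 1
```

With levels "0", "1", "2", the codes are each value plus one, so a row sum over the codes would be silently wrong rather than raise an error.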

If you can import your data again:

If you can import your data from the file again, you can do so with the stringsAsFactors = FALSE argument. In versions of R before 4.0.0 you should almost always use this argument, since without it all strings (and any numeric-looking column that is read as text, as you see here) are converted into factors, creating all kinds of annoying problems until you change them back.

As of R 4.0.0, this is no longer necessary, as the default value of stringsAsFactors has been changed to FALSE. This will hopefully make this common mistake a thing of the past.
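A re-import might look roughly like the sketch below. The inline CSV text, the column names GEN.C00001.GT and GEN.C00002.GT, and the object name snps are all made up for illustration; the real file and reader call (read.csv, read.table, read.delim) will differ.

```r
# Hypothetical two-sample extract; the real file and its columns will differ
csv_text <- "REF,GEN.C00001.GT,GEN.C00002.GT
A,0,2
G,1,1
T,2,2"

# stringsAsFactors = FALSE keeps text columns as character vectors
snps <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the column classes after import
str(snps)

# The genotype columns are read as integers here, so rowSums() works directly
snps$controls <- rowSums(snps[, c("GEN.C00001.GT", "GEN.C00002.GT")])
print(snps$controls)
```

This only helps when a column contains nothing but numbers. If a genotype column holds any non-numeric entry, it is read as character and still needs converting.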

Otherwise, to change from a Factor back to a Number:

Base R

The simplest way to do this is to use sapply:

rowSums(sapply(No_Low_No_Intergenic_snpeff.scores[, c(20:29, 45)],
               function(x) as.numeric(as.character(x))))

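This subsets your data.frame, applies as.numeric(as.character(x)) to each column, and then calculates rowSums. Note the c(20:29, 45) in the subset: in the original call, [, 20:29, 45], the 45 is passed as a separate third argument to [ (the drop argument) rather than as a column index, so column 45 would not have been included.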
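To store the result in a new column, as the question intends, the same idea can be sketched end to end on a small made-up data frame. The object scores and its column names are hypothetical stand-ins for the real data:

```r
# Hypothetical data frame with three factor-coded genotype columns
scores <- data.frame(
  GEN.A.GT = factor(c("2", "0", "1"), levels = c("0", "1", "2")),
  GEN.B.GT = factor(c("2", "0", "2"), levels = c("0", "1", "2")),
  GEN.C.GT = factor(c("1", "0", "0"), levels = c("0", "1", "2"))
)

# Columns to sum, given by position; c() combines a range with a single index
cols <- c(1:2, 3)

# Convert each selected column from factor to numeric, giving a numeric matrix
numeric_matrix <- sapply(scores[, cols],
                         function(x) as.numeric(as.character(x)))

# Sum across each row and store the result in a new column
scores$controls <- rowSums(numeric_matrix)
print(scores$controls)
```

The original factor columns are left unchanged here; only the new controls column is numeric.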

tidyverse

You can also use the mutate_if function from dplyr to convert all factor variables to numeric.

library(dplyr)

No_Low_No_Intergenic_snpeff.scores <- No_Low_No_Intergenic_snpeff.scores %>%
    mutate_if(is.factor, ~as.numeric(as.character(.)))

rowSums(No_Low_No_Intergenic_snpeff.scores[, c(20:29, 45)])

Alternatively, you could use mutate_at to select columns by position or name. Read ?select to see all the different ways you can select columns. You can even use a regular expression with matches, as below:

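No_Low_No_Intergenic_snpeff.scores <- No_Low_No_Intergenic_snpeff.scores %>%
    mutate_at(vars(matches('GEN\\.C\\d{5}\\.GT')), ~as.numeric(as.character(.)))

This applies as.numeric(as.character(.)) to all columns whose names match the regular expression GEN\\.C\\d{5}\\.GT, where \\d{5} represents 5 numeric digits and \\. matches a literal dot. The as.character step is needed for the same reason as above: as.numeric alone would return the factor's level codes.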
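In dplyr 1.0.0 and later, the scoped verbs mutate_if and mutate_at are superseded by across(). Here is a minimal sketch of the same conversion in that style, assuming dplyr >= 1.0.0 is installed and using a made-up data frame (scores and its columns are hypothetical):

```r
library(dplyr)

# Hypothetical data: one annotation column and two factor-coded genotype columns
scores <- data.frame(
  CHROM = factor(c("chr1", "chr2")),
  GEN.C00001.GT = factor(c("2", "0"), levels = c("0", "1", "2")),
  GEN.C00002.GT = factor(c("1", "2"), levels = c("0", "1", "2"))
)

# Convert only the genotype columns, selected by a regular expression
scores <- scores %>%
  mutate(across(matches("^GEN\\.C\\d{5}\\.GT$"),
                ~ as.numeric(as.character(.x))))

# Sum the converted columns row by row into a new column
scores$controls <- rowSums(select(scores, matches("^GEN\\.C\\d{5}\\.GT$")))
print(scores$controls)
```

Selecting by name pattern leaves other factor columns such as CHROM untouched, which mutate_if(is.factor, ...) would not.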

