Warning message in `[<-.factor`(`*tmp*`, iseq, value = foo): invalid factor level, NA generated when trying to add a vector to a row subset of a data.frame

Question

I am writing a function that tries to add values to a single row of a data.frame, across several columns at once:

require(stringr)

addPointsToKeyRow = function(df, keyRowNum, searchStringForPointColNames, pointsVector){
  colsWithMatchingSearchResults = str_match(colnames(df), searchStringForPointColNames)
  pointColNums = (which(!is.na(colsWithMatchingSearchResults)))
  pointsVectorCleaned = pointsVector[!is.na(pointsVector)]
  print(is.vector(pointsVectorCleaned)) #Returns TRUE
  print(is.data.frame(pointsVectorCleaned)) #Returns FALSE
  print(pointsVectorCleaned)
  if(length(pointsVectorCleaned) == length(pointColNums)){
    newDf = data.frame(df, stringsAsFactors = FALSE)
    newDf[keyRowNum, pointColNums] = as.character(pointsVectorCleaned)
    #for(i in 1:length(pointColNums)){
    #  newDf[keyRowNum,pointColNums[i]]=as.character(pointsVectorCleaned[i])
    #}
    print(newDf[keyRowNum,])
  }
}

When I apply the function to my data (addPointsToKeyRow(finalDf, which(finalDf[,1]=="key"), "points_q", pointVals)), I get the following warning:

In `[<-.factor`(`*tmp*`, iseq, value = "2") : invalid factor level, NA generated

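I looked up the warning on SO and other sites, and the advice always seems to be to make sure the data.frame was created with stringsAsFactors = FALSE.

I think my problem might be that when I subset the data.frame (newDf[keyRowNum, pointColNums]), it no longer keeps stringsAsFactors = FALSE.

Whether or not that is the cause, I would very much welcome help with this strange problem. Thanks in advance!

As an example, suppose df is: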

df = structure(list(first = structure(c(7L, 9L, 5L, 4L, 10L, 2L, 3L, 
6L, 1L, 8L), .Label = c("autumn", "spring", "summer", "winter", 
"july", "betty", "november", "echo", "victor", "tango"), class = "factor"), 
    last = structure(c(6L, 2L, 4L, 5L, 1L, 8L, 3L, 9L, 10L, 7L
    ), .Label = c("brummett1", "do", "drorbaugh", "galeno", "gerber", 
    "key", "lyons", "pecsok", "perezfranco", "swatt"), class = "factor"), 
    question1 = structure(c(1L, 1L, 1L, 4L, 6L, 2L, 5L, 3L, 5L, 
    5L), .Label = c("0", "0.25", "1:02:01", "1:2 50%", "2-Jan", 
    "50%"), class = "factor"), points_q1 = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question2 = structure(c(8L, 10L, 6L, 5L, 2L, 3L, 7L, 1L, 
    4L, 9L), .Label = c("        a    |     b; A|    Aa  |  Ab; b|    ab   |  bb; the possibility that the offspring will be heterozygous is about 25%. The same goes for the homozygous recessive it is a 1:1:1:1", 
    "1/4 heterozygous for \xf1a\xee and 0 recessive for \xf1b\xee", 
    "16-Mar", "2-Jan", "3:1 25%", "4-Jan", "Male=aabb Female=AAbb Heterozygous is going to be 1/2. Homozygous is going to be 1/4.", 
    "possible offspring genotypes (each with probability of 0.25): AABb AaBb AAbb Aabb. Question is asking about probability of Aabb_ which is 0.25.", 
    "The square shows Ab Ab_ Bb Bb so 50% or 1/2.  ", "Xa Yb (father) crossed with XA Xb (mother)  = 1/2 "
    ), class = "factor"), points_q2 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question3 = structure(c(4L, 5L, 3L, 5L, 5L, 5L, 7L, 2L, 6L, 
    1L), .Label = c("Codominance", "coheritance", "incomplete dominance", 
    "Incomplete dominance", "Incomplete dominance ", "Incomplete dominance. ", 
    "Independent Assortment"), class = "factor"), points_q3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question4 = structure(c(3L, 4L, 2L, 3L, 6L, 3L, 7L, 1L, 5L, 
    4L), .Label = c("", "co-dominance", "Codominance", "Codominance ", 
    "Codominance. ", "Codominant ", "Independent Assortment? (Wrong)"
    ), class = "factor"), points_q4 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question5 = structure(c(2L, 10L, 6L, 4L, 5L, 3L, 8L, 1L, 
    7L, 9L), .Label = c("      X   |    Y; X|  XX |  XY; x|  Xx  |  xY; the percentage will be 25 % or 1/4 the same applies to the son ", 
    "0 for daughter_ because male can only give non-colorblind X chromosome (because he's not colorblind an only has one X chromosome).  0.25 for both son and colorblind.", 
    "0.25", "25% for son and 25% for daughter", "25% for the son and 25% for the daughter ", 
    "4-Jan", "50%", "Father=XY Mother=X2Y Therefore_ by using the punnet square_ I was able to show/understand that the probability of them having a son AND him being colorblind is 1/4.", 
    "To have a son or daughter is 50/50.  To have a colorblind daughter is .25 whereas to have a colorblind son is .75 because it is carried on the X chromosome and the son is much more likely to inherit this because he has less x to work with", 
    "XcY (father) XC Xc (mother) Daughter is 1/4 son 1/4"), class = "factor"), 
    points_q5 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = "", class = "factor"), question6 = structure(c(3L, 
    6L, 7L, 8L, 5L, 2L, 10L, 9L, 4L, 1L), .Label = c("Chromatids ", 
    "Chromosomes (diploids)", "homologous chromosome pairs", 
    "Homologous chromosome pairs are being separated. ", "Homologous chromosomes ", 
    "Homologous pairs ", "homologous pairs of chromosomes", "Homologus Chromosomes ", 
    "sister chromatids ", "Sister Chromatids?"), class = "factor"), 
    points_q6 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = "", class = "factor"), question7 = structure(c(6L, 
    8L, 5L, 7L, 8L, 2L, 3L, 1L, 9L, 4L), .Label = c("", "Chromatids (haploids)", 
    "Daughter Chromosomes?", "One cell to 2", "sister chromatids", 
    "Sister chromatids", "Sister Chromatids", "Sister chromatids ", 
    "Sister chromatids within daughter cells are separating. "
    ), class = "factor"), points_q7 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question8 = structure(c(1L, 4L, 1L, 2L, 4L, 2L, 3L, 6L, 5L, 
    3L), .Label = c("sister chromatids", "Sister chromatids", 
    "Sister Chromatids", "Sister chromatids ", "Sister chromatids are held together by the centromeres. In prophase chromosomes become visible. During metaphase chromosomes attach to spindles. During Anaphase the chromosomes are split apart and in telophase the cells start to create cleavage.  ", 
    "sisters chromatides"), class = "factor"), points_q8 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question9 = structure(c(2L, 4L, 1L, 3L, 4L, 3L, 3L, 2L, 5L, 
    3L), .Label = c("prohase ", "prophase", "Prophase", "Prophase ", 
    "They condense during prophase before the rest of the phases. "
    ), class = "factor"), points_q9 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question10 = structure(c(1L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 
    4L, 2L), .Label = c("anaphase", "Anaphase", "Anaphase ", 
    "During anaphase. "), class = "factor"), points_q10 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question11 = structure(c(3L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 
    1L, 2L), .Label = c("During prophase. ", "Telephase ", "telophase", 
    "Telophase"), class = "factor"), points_q11 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question12 = structure(c(1L, 3L, 1L, 2L, 3L, 2L, 3L, 1L, 
    4L, 2L), .Label = c("metaphase", "Metaphase", "Metaphase ", 
    "Metaphase. "), class = "factor"), points_q12 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"), 
    question13 = structure(c(1L, 4L, 1L, 4L, 2L, 4L, 2L, 5L, 
    3L, 6L), .Label = c("centromere", "Centromere", "Centromere. ", 
    "Centromeres", "centromeres ", "Cleavage"), class = "factor"), 
    points_q13 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = "", class = "factor")), .Names = c("first", 
"last", "question1", "points_q1", "question2", "points_q2", "question3", 
"points_q3", "question4", "points_q4", "question5", "points_q5", 
"question6", "points_q6", "question7", "points_q7", "question8", 
"points_q8", "question9", "points_q9", "question10", "points_q10", 
"question11", "points_q11", "question12", "points_q12", "question13", 
"points_q13"), row.names = c(NA, -10L), class = "data.frame")

which(finalDf[,1]=="key") is 1.

pointVals is c(NA, "2", "2", "2", "2", "2", "2", "2", "1", "1", "1", "1", "1", "1")

To clarify, I would like the final table to look like:

First    Last    question1    points_q1    question2    points_q2    etc.

key    key    0    2    "possible_offspring_genotypes..."    1    etc.

Tags: r, dataframe

Answer


Based on my understanding, I have simplified your function. Let me know if it gives you what you want, or if I have misunderstood something.

addPointsToKeyRow = function(df, keyRowNum, searchString, pointsVector) {

    # Logical vector marking the columns whose names contain searchString
    cols <- grepl(searchString, colnames(df))

    # Assign only if the number of matching columns equals length(pointsVector)
    if (sum(cols) == length(pointsVector)) {
        # Write the values into the key row, one value per matching column
        df[keyRowNum, cols] <- pointsVector
    }
    # Return the (possibly updated) data frame
    df
}

# Convert every column from factor to character
df[] <- lapply(df, as.character)

# Define the replacement values (one per points_q column, no NA)
pointVals <- c("2", "2", "2", "2", "2", "2", "2", "1", "1", "1", "1", "1", "1")

# Call the function
df <- addPointsToKeyRow(df, 1, "points_q", pointVals)

#Check the dataframe
df
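
As for why the warning appears in the first place, a likely explanation is that wrapping an existing data frame in data.frame(..., stringsAsFactors = FALSE) does not convert columns that are already factors, so the points_q columns still only accept their existing level (""). The sketch below illustrates this on a small made-up data frame; the names toy, rewrapped, and fixed are hypothetical and do not come from the question.

```r
# Toy data (hypothetical), built with factors on purpose to mimic the
# question's df, whose columns are all factors.
toy <- data.frame(
  last      = c("key", "do"),
  points_q1 = c("", ""),
  points_q2 = c("", ""),
  stringsAsFactors = TRUE
)

# Re-wrapping leaves columns that are already factors unchanged.
rewrapped <- data.frame(toy, stringsAsFactors = FALSE)
print(sapply(rewrapped, class))

# "2" is not an existing level of points_q1, so R warns about an invalid
# factor level and stores NA. The warning is suppressed here only so the
# script runs quietly.
suppressWarnings(rewrapped[1, "points_q1"] <- "2")
print(is.na(rewrapped[1, "points_q1"]))

# Converting the columns to character first lets the assignment succeed.
fixed <- toy
fixed[] <- lapply(fixed, as.character)
fixed[1, c("points_q1", "points_q2")] <- c("2", "1")
print(fixed[1, ])
```

If only the points columns need to change, the same lapply(..., as.character) conversion could be applied to just those columns rather than to the whole data frame.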
