Apply a match function to add a column to a list of named dataframes

Question

I am trying to add a column to each dataframe in a list, based on the name of that dataframe.

The idea is that each dataframe holds the test results for one analyte. The results are numeric; however, my analyzer also reports values below what it can reliably measure (the limit of detection, LOD), which fills my data with meaningless numbers. I want to filter out results below the LOD, but each analyte (each dataframe in the list) has a different LOD. Here's a reproducible example:

#A list of two dataframes representing analytes called "A" and "C" with results from 2 analyzers (X and Y)
df.list <- list(A=data.frame(id=c("P1", "P2", "P3", "P4"), X1=c(1,2,5, 3), Y1=c(4,5,8,6)),
                C=data.frame(id=c("P1", "P2", "P3", "P4"),  X1=c(4,NA,6, 7), Y1=c(4,NA,7,6)))


#A third dataframe (not in the list) containing the identity and LoD of each analyte
lod.input <- data.frame(Analyte = c("A","C"), LoD = c(1,10))

What I want to do is match the values in the Analyte column of lod.input against names(df.list), and add a column to each dataframe in df.list that holds the LOD for that analyte.
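The lookup itself can be done once, outside any loop: match() returns, for each name in names(df.list), the row of lod.input whose Analyte is the same, and indexing LoD with that gives each analyte's LOD in list order. A minimal sketch with the example data; the names idx and lods are only illustrative.

```r
# Example data from the question
df.list <- list(A = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(1, 2, 5, 3), Y1 = c(4, 5, 8, 6)),
                C = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(4, NA, 6, 7), Y1 = c(4, NA, 7, 6)))
lod.input <- data.frame(Analyte = c("A", "C"), LoD = c(1, 10))

# Row of lod.input matching each list name (NA if an analyte is missing)
idx <- match(names(df.list), lod.input$Analyte)

# One LOD per dataframe, in the same order as df.list: c(A = 1, C = 10)
lods <- setNames(lod.input$LoD[idx], names(df.list))
lods
```

Because match() returns NA for names that have no entry in lod.input, a missing analyte shows up as an NA LOD rather than silently taking another analyte's value.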

So far, I can add a new column, but every dataframe gets the LOD of the same analyte (names(df.list[2]) is always "C", so every row gets 10):

junk <- lapply(df.list, function(x) data.frame(x, LOD = lod.input$LoD[match(names(df.list[2]), lod.input$Analyte)]))
junk
$A
  id X1 Y1 LOD
1 P1  1  4  10
2 P2  2  5  10
3 P3  5  8  10
4 P4  3  6  10

$C
  id X1 Y1 LOD
1 P1  4  4  10
2 P2 NA NA  10
3 P3  6  7  10
4 P4  7  6  10

But I can't figure out how to match each dataframe to its own LOD. Using the dataframe inside the function instead gives NA for every LOD:

junk <- lapply(df.list, function(x) data.frame(x, LOD = lod.input$LoD[match(names(x[1]), lod.input$Analyte)]))
junk

$A
  id X1 Y1 LOD
1 P1  1  4  NA
2 P2  2  5  NA
3 P3  5  8  NA
4 P4  3  6  NA

$C
  id X1 Y1 LOD
1 P1  4  4  NA
2 P2 NA NA  NA
3 P3  6  7  NA
4 P4  7  6  NA
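Both results follow from what the expressions evaluate to. In the first attempt, df.list[2] is always the second element, so its name is always "C". In the second, x inside lapply() is the dataframe itself with no list name attached, so x[1] is its first column and names(x[1]) is "id", which is not an analyte. A quick check with the example data (x and first_col_name are just illustrative names):

```r
df.list <- list(A = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(1, 2, 5, 3), Y1 = c(4, 5, 8, 6)),
                C = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(4, NA, 6, 7), Y1 = c(4, NA, 7, 6)))
lod.input <- data.frame(Analyte = c("A", "C"), LoD = c(1, 10))

# First attempt: the name of df.list[2] is "C" no matter which dataframe
# lapply() is processing, so every dataframe gets the LoD of "C" (10)
names(df.list[2])                          # "C"

# Second attempt: inside lapply(), x is a bare dataframe such as df.list$A
x <- df.list[["A"]]
first_col_name <- names(x[1])              # "id": the first column's name, not "A"
match(first_col_name, lod.input$Analyte)   # NA, so the LoD lookup is NA too
```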

I suspect I need a nested lapply, but I can't work it out. Any help is appreciated!

Tags: r

Answer


Building on what you have, you can iterate over the names of the data.frames instead:

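res <- lapply(names(df.list), function(i) {
  cbind(df.list[[i]], LOD = lod.input$LoD[match(i, lod.input$Analyte)])  # i is a name such as "A"
})
names(res) <- names(df.list)  # lapply() over a character vector returns an unnamed list
res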
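Another way to get at the names (not part of the original answer) is base R's Map(), which calls a function on the dataframes and their names in parallel, so no nested lapply() is needed. Map() keeps the list names because it takes them from its first argument. A sketch under that approach; with_lod is an illustrative name.

```r
df.list <- list(A = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(1, 2, 5, 3), Y1 = c(4, 5, 8, 6)),
                C = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(4, NA, 6, 7), Y1 = c(4, NA, 7, 6)))
lod.input <- data.frame(Analyte = c("A", "C"), LoD = c(1, 10))

# Map() pairs each dataframe (df) with its list name (nm), one call per element
with_lod <- Map(function(df, nm) {
  # Look up this analyte's LoD by name; NA if nm is not in lod.input$Analyte
  df$LOD <- lod.input$LoD[match(nm, lod.input$Analyte)]
  df
}, df.list, names(df.list))

with_lod$C
```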

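As @42- pointed out (see the comments below), what you had above doesn't work because when you lapply() over a list (or any named vector), the list's names are not passed to the function being applied; the function only receives the elements.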
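Once each dataframe carries its LOD, the original goal of discarding results below the LOD can be applied per dataframe. The question doesn't say whether a below-LOD result should remove the whole row or just that value, so this sketch assumes the latter: values in X1 and Y1 below the LOD are set to NA and the rows are kept. result_cols and censored are hypothetical names, and the LOD column is built as in the answer above.

```r
df.list <- list(A = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(1, 2, 5, 3), Y1 = c(4, 5, 8, 6)),
                C = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(4, NA, 6, 7), Y1 = c(4, NA, 7, 6)))
lod.input <- data.frame(Analyte = c("A", "C"), LoD = c(1, 10))

# Attach the LOD column by name, as in the answer
res <- lapply(names(df.list), function(i) {
  cbind(df.list[[i]], LOD = lod.input$LoD[match(i, lod.input$Analyte)])
})
names(res) <- names(df.list)

result_cols <- c("X1", "Y1")  # hypothetical: the measurement columns to censor

censored <- lapply(res, function(df) {
  for (col in result_cols) {
    # Values below this analyte's LOD become NA; values that are already NA stay NA
    below <- !is.na(df[[col]]) & df[[col]] < df$LOD
    df[[col]][below] <- NA
  }
  df
})
censored$C
```

To drop whole rows instead, the same below-LOD comparisons can be used to subset each dataframe.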

