r - Apply a match function to add a column to a list of named dataframes
Problem description
I am trying to add a variable column to a list of dataframes based on the name of each dataframe.
The idea here is that each dataframe represents a specific test result for a particular analyte. The test results are numerical; however, my analyzer will report numbers below what it is capable of reliably reading (the limit of detection; LOD), which makes for messy nonsense in my data. I want to filter out the results that are less than the LOD, but each analyte (dataframe in the list) has a different LOD. Here's a reproducible example:
#A list of two dataframes representing analytes called "A" and "C" with results from 2 analyzers (X and Y)
df.list <- list(A=data.frame(id=c("P1", "P2", "P3", "P4"), X1=c(1,2,5, 3), Y1=c(4,5,8,6)),
C=data.frame(id=c("P1", "P2", "P3", "P4"), X1=c(4,NA,6, 7), Y1=c(4,NA,7,6)))
#A third dataframe (not in the list) containing the identity and LoD of each analyte
lod.input <- data.frame(Analyte = c("A","C"), LoD = c(1,10))
Now what I want to do is match the "Analyte" column in the lod.input dataframe to names(df.list), and add a column to each dataframe in the list that holds the LOD associated with that analyte.
So far, I can add a new column containing a single LOD, hard-coded here to the second dataframe in the list:
junk <- lapply(df.list, function(x) data.frame(x, LOD = lod.input$LoD[match(names(df.list[2]), lod.input$Analyte)]))
junk
$A
  id X1 Y1 LOD
1 P1  1  4  10
2 P2  2  5  10
3 P3  5  8  10
4 P4  3  6  10

$C
  id X1 Y1 LOD
1 P1  4  4  10
2 P2 NA NA  10
3 P3  6  7  10
4 P4  7  6  10
But I can't seem to figure out how to match them or incorporate more LODs. My attempt below returns NA for every row:
junk <- lapply(df.list, function(x) data.frame(x, LOD = lod.input$LoD[match(names(x[1]), lod.input$Analyte)]))
junk
$A
  id X1 Y1 LOD
1 P1  1  4  NA
2 P2  2  5  NA
3 P3  5  8  NA
4 P4  3  6  NA

$C
  id X1 Y1 LOD
1 P1  4  4  NA
2 P2 NA NA  NA
3 P3  6  7  NA
4 P4  7  6  NA
I think I might need a nested lapply function, but I can't seem to figure it out. Help is appreciated!
Solution
Based on what you have, you could iterate over the names of the data.frames:
lapply(names(df.list), function(i) {
  cbind(df.list[[i]], LOD = lod.input$LoD[match(i, lod.input$Analyte)])
})
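As @42- pointed out (see the comments below), what you have above doesn't work because when you lapply over a list (or any named vector), it doesn't pass the names of the list to the function; the function only sees each element.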
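To make that point concrete, here is a minimal sketch using the data from the question. It first shows what the function inside lapply actually sees, then uses Map as one possible alternative that walks the dataframes and their names in parallel and keeps the list names on the result. The object names seen, with.lod and masked are made up for illustration, and the final masking step is only an assumption about how the "filter out results below the LOD" goal might be done, not something taken from the original answer.

```r
# Example data copied from the question
df.list <- list(
  A = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(1, 2, 5, 3), Y1 = c(4, 5, 8, 6)),
  C = data.frame(id = c("P1", "P2", "P3", "P4"), X1 = c(4, NA, 6, 7), Y1 = c(4, NA, 7, 6))
)
lod.input <- data.frame(Analyte = c("A", "C"), LoD = c(1, 10))

# Inside lapply(), x is only the list element. names(x) returns the column
# names of the dataframe, not "A" or "C", so match() against Analyte finds
# nothing and yields NA.
seen <- lapply(df.list, function(x) names(x)[1])

# Map() passes each dataframe together with its name, and the result
# inherits the names of df.list.
with.lod <- Map(function(df, analyte) {
  cbind(df, LOD = lod.input$LoD[match(analyte, lod.input$Analyte)])
}, df.list, names(df.list))

# Hypothetical follow-up (assumption): set readings below the LOD to NA.
# The result columns X1 and Y1 are named explicitly here.
masked <- lapply(with.lod, function(df) {
  cols <- c("X1", "Y1")
  df[cols] <- lapply(df[cols], function(v) replace(v, !is.na(v) & v < df$LOD, NA))
  df
})
```

Compared with looping over names(df.list), the Map version returns a list that is still named by analyte, so with.lod$A and with.lod$C can be used directly in later steps.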